Open jrohnerx opened 3 years ago
There are two separate things that I want to address here: the feature request and the .tif files not being captured in the search results that you mentioned at the end.
Tif files not being captured in the search results: It sounds like you added the .tif file format to the list of indexedFileNameExtensions, but you still aren't seeing those files in your search results. Here's what I suspect happened. If you have already run the indexer once on your blob storage, the indexer keeps track of a high water mark (a timestamp of when it last ran, so it knows that everything created before then has already been processed and is in the index). I'm guessing that the high water mark is already past the time when all of your tifs were added to blob storage, and that is why the indexer isn't picking them up. So, there are two ways to deal with this:
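The high water mark behavior described above can be pictured with a small toy model. This is only a sketch for intuition, not the actual indexer implementation: the `Blob` class, the `pick_up` function, and all the dates are hypothetical, and the real service tracks its state internally.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Blob:
    name: str
    last_modified: datetime


def pick_up(blobs, extensions, high_water_mark):
    """Return the names a toy indexer run would process.

    A blob is processed only if its extension is in the allowed list AND
    it was modified after the high water mark (None means "never ran").
    """
    return [
        b.name
        for b in blobs
        if b.name.lower().endswith(tuple(extensions))
        and (high_water_mark is None or b.last_modified > high_water_mark)
    ]


blobs = [
    Blob("report.pdf", datetime(2021, 1, 1)),
    Blob("scan.tif", datetime(2021, 1, 2)),
]

# First run: only .pdf is allowed, and there is no high water mark yet.
first_run = pick_up(blobs, [".pdf"], None)
high_water_mark = datetime(2021, 1, 5)  # hypothetical time of that first run

# Second run: .tif is now allowed, but scan.tif predates the mark, so it is skipped.
second_run = pick_up(blobs, [".pdf", ".tif"], high_water_mark)

# A .tif added after the mark is picked up as expected.
blobs.append(Blob("new_scan.tif", datetime(2021, 1, 6)))
third_run = pick_up(blobs, [".pdf", ".tif"], high_water_mark)

print(first_run, second_run, third_run)
```

In this toy model, adding `.tif` to the extension list changes nothing for files that were already in storage before the mark, which matches the symptom described in the issue.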
Feature request: I'm not sure if this is doable, and it will require a little work either way. I won't have time to get to this in the short term, so let me explain how I would do it and you can take a stab at it.
Now, the part that I need to check with the QnA Maker team on is whether that output can be processed by QnA Maker correctly. Their old model needed structure around the question and answer pairs, and I'm not sure if the OCR skill output would have that structure. I think they have a newer model where less structure is needed, but let me verify.
Thank you, @jennifermarsman, for the response and information. I think given the circumstances, my best bet is to run scripts to convert our TIF files to OCR'd PDFs (done via our document management system) and output them to a share where the indexer can pick them up.
Right now I have the legwork for this done, but unfortunately the files are in an Azure File Share, which isn't a publicly supported index location at this time. I've heard that support is in some form of closed beta/test. Do you know who is in charge of allowing customers to participate in that?
@jrohnerx yes, it's in preview. Drop me an email at jennmar@microsoft.com and I can hook you up.
This issue is for a: (mark with an x)
Minimal steps to reproduce
Any log messages given by the failure
Expected/desired behavior
OS and Version?
Versions
Mention any other details that might be useful
I was hoping to add .tif support to this, as we have thousands of TIF files with text that could easily be OCR'd and searched on. Is this possible? I added the .tif file format to the indexer JSON as mentioned in the guide/instructions, but it doesn't seem to capture those files in the search results after waiting a while.