Hi @WraithNath,
The best way to get your training data in the format you need is to use the Labeling tool provided by the Form Recognizer Service. Here you can find more information about how to use it: https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/label-tool?tabs=v2-1
Hello,
Thanks for the reply, I have used the training tool and it's very good. I'm using it and comparing the JSON files it generates with the ones I am generating from my own application. The fields.json and labels.json are now a perfect match; it's just the ocr.json file that is a problem now (I have the page collection that is returned from StartRecognizeContent, but this is not enough to populate the ocr.json file with).
I am creating a customer-facing Windows application (well, it's part of another application). I can't really expect users to use an external web application to label the documents when they should be able to do it all from within the one application.
I have created the WinForms UI for the labelling, which works in the same way as the React web app. I just need a way of generating the ocr.json file.
I have had a look through the source code of the SDK and it looks like the conversion to a FormPageCollection happens in RecognizeContentOperation.WaitForCompletion.
Would it be possible to somehow create my own RecognizeContentOperation that returns the correct type, and I can worry about converting to a page collection? Most of the classes it looks like I need have 'internal' modifiers on them, so I can't get access to them (e.g. the REST service class that does the actual calls to the web service).
Happy to try out some ideas if you have any; it's this ocr.json file generation that is stopping progress now. I may be able to craft a lot of the file from the page collection, but I'm not going to be able to set properties such as the version it was created from, for example.
Thanks for the explanation. It definitely looks like the UI tool won't be enough.
What about getting the raw response from the service and parsing out the information you need from there, so you can use that as the ocr.json file?
Something like this (though you would want something more robust for a production environment):

RecognizeContentOperation operation = await client.StartRecognizeContentAsync(stream);
Response<FormPageCollection> response = await operation.WaitForCompletionAsync();

// The raw service payload is available on the underlying HTTP response.
Stream contentStream = response.GetRawResponse().ContentStream;
contentStream.Position = 0;
string stringResult = new StreamReader(contentStream).ReadToEnd();
// get the information you need from stringResult ?
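If that gives you what you need, one option (just a sketch; "invoice1.pdf" is a made-up document name) would be to persist that raw payload next to the labels and fields files, using the naming the labeling tool expects:

// Sketch only: the labeling tool looks for the OCR output as <documentName>.ocr.json.
File.WriteAllText("invoice1.pdf.ocr.json", stringResult);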
Hi @maririos,
Thanks, I'll try that out later and see whether the original result is in the response or not.
Fingers crossed!
@maririos,
I think that is going to work! I'll need to create some models to deserialize the JSON into in order to return it, but the output seems to match the ocr file that the training tool generates.
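For anyone else doing the same, this is roughly the kind of model I mean (only a sketch; the property names assume the v2.0 analyzeResult/readResults shape and may need adjusting against the actual response):

using System.Collections.Generic;
using System.Text.Json.Serialization;

// Rough sketch of models for the raw layout response (v2.0 shape assumed).
public class AnalyzeOperationResult
{
    [JsonPropertyName("status")] public string Status { get; set; }
    [JsonPropertyName("analyzeResult")] public AnalyzeResult AnalyzeResult { get; set; }
}

public class AnalyzeResult
{
    [JsonPropertyName("version")] public string Version { get; set; }
    [JsonPropertyName("readResults")] public List<ReadResult> ReadResults { get; set; }
}

public class ReadResult
{
    [JsonPropertyName("page")] public int Page { get; set; }
    [JsonPropertyName("angle")] public double Angle { get; set; }
    [JsonPropertyName("width")] public double Width { get; set; }
    [JsonPropertyName("height")] public double Height { get; set; }
    [JsonPropertyName("unit")] public string Unit { get; set; }
    [JsonPropertyName("lines")] public List<TextLine> Lines { get; set; }
}

public class TextLine
{
    [JsonPropertyName("boundingBox")] public List<double> BoundingBox { get; set; }
    [JsonPropertyName("text")] public string Text { get; set; }
    [JsonPropertyName("words")] public List<TextWord> Words { get; set; }
}

public class TextWord
{
    [JsonPropertyName("boundingBox")] public List<double> BoundingBox { get; set; }
    [JsonPropertyName("text")] public string Text { get; set; }
    [JsonPropertyName("confidence")] public double Confidence { get; set; }
}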
Thanks so much for your help. I have only been using the Form Recognizer for a couple of weeks and am really impressed with it so far.
Kind regards,
Nathan
Yay!! I am so glad that this approach is going to unblock you. Let me know if you have more questions or feedback.
I have got this one working now. I save the raw analyze layout result JSON to a file as you suggested, along with the labels and fields JSON files, in the blob container together with another 4 files labelled with the FOTT, and the model trained successfully :).
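For completeness, this is roughly what the training call looked like on my side (the endpoint, key and container SAS URI are placeholders):

var credential = new AzureKeyCredential("<apiKey>");
var trainingClient = new FormTrainingClient(new Uri("https://<endpoint>"), credential);

// The SAS URI points at the blob container holding the documents plus the
// <doc>.ocr.json, <doc>.labels.json and fields.json files.
Uri trainingFilesUri = new Uri("<container SAS URI>");

TrainingOperation operation = await trainingClient.StartTrainingAsync(trainingFilesUri, useTrainingLabels: true);
Response<CustomFormModel> modelResponse = await operation.WaitForCompletionAsync();
Console.WriteLine($"Trained model id: {modelResponse.Value.ModelId}");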
Thanks again for your help, and great work on the SDK!
When using FormRecognizerClient.StartRecognizeContent it returns a FormPageCollection object, which is sufficient for document labelling; however, to train a model you need the doc.labels.json, doc.ocr.json and fields.json files.
The page collection can be serialized to JSON, but it is not in the same format required by the Form Recognizer to train a model.
Looking at the source code for the SDK, the ReadResults are converted to a page collection, so a lot of the information is lost or in a different format. There needs to be an SDK method to get the read results back from https://{endpoint}/formrecognizer/v2.0/layout/analyze so the JSON can be stored in the blob as the ocr.json and a model can be trained against it.
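Until such a method exists, one possible workaround (only a sketch; the endpoint, key, file name and polling interval are assumptions based on the documented v2.0 REST API) is to call the layout endpoint directly and keep the raw JSON:

using System;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json;
using System.Threading.Tasks;

// Sketch: POST the document to the analyze endpoint, then poll the
// Operation-Location URL until the analysis finishes and keep the raw JSON.
using var http = new HttpClient();
http.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "<apiKey>");

using var content = new ByteArrayContent(File.ReadAllBytes("invoice1.pdf"));
content.Headers.ContentType = new MediaTypeHeaderValue("application/pdf");

HttpResponseMessage post = await http.PostAsync("https://<endpoint>/formrecognizer/v2.0/layout/analyze", content);
string operationLocation = post.Headers.GetValues("Operation-Location").First();

string json;
while (true)
{
    json = await http.GetStringAsync(operationLocation);
    using JsonDocument doc = JsonDocument.Parse(json);
    string status = doc.RootElement.GetProperty("status").GetString();
    if (status == "succeeded" || status == "failed") break;
    await Task.Delay(1000);
}

// 'json' now contains the full analyze result (including readResults),
// which is the content the ocr.json file needs.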