MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

Read Batch Text: the bounding box is returned in inches instead of pixels, which causes the exception JsonReaderException: Input string '5.6703' is not a valid integer. Path 'recognitionResults[0].lines[0].boundingBox[0]', line 1, position 156. #27365

Closed ytthuan closed 5 years ago

ytthuan commented 5 years ago

The article at https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/quickstarts/csharp-hand-text says that the output returns the bounding box in pixels, which is fine, but the output from Batch Read Text actually returns double values in inches.

"page": 2, "clockwiseOrientation": 0.01, "width": 8.2633, "height": 11.68, "unit": "inch", "lines": [ { "boundingBox": [ 0.4519, 0.5475, 1.4418, 0.5399, 1.4525, 0.6235, 0.4627, 0.6311 ],

Is there any way to choose which unit is returned?



AmanGarg-MSFT commented 5 years ago

@ytthuan Thank you for the feedback! We are investigating this and will get back to you shortly.

RohitMungi-MSFT commented 5 years ago

@ytthuan An interesting observation about the unit. Could you please let us know which type of file you are using for your testing? The supported file types as per the API reference are:

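Input requirements:

  • Supported image formats: JPEG, PNG, BMP, PDF and TIFF.
  • For PDF and TIFF, up to 200 pages are processed.
  • For free tier subscribers, only the first 2 pages are processed.
  • Image file size must be less than 20 MB.
  • Image dimensions must be at least 50 x 50 pixels and at most 4200 x 4200 pixels. PDF dimensions must be at most 17 x 17 inches, corresponding to Legal or A3 paper sizes and smaller.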

I used this sample image and ran the solution in this quickstart which gave me the correct output.

prescription

Output: vision_output.txt

I also tested a .pdf file, and it returned the unit as "inch". So I believe the cause is your input file format. My output is attached: vision_output_pdf.txt

@PatrickFarley Could you please let us know if the default unit returned for .pdf files is "inch" as per the input requirements? I couldn't find a parameter that could be passed to change the unit in the API reference page.

Could you please update the document if the unit value returned is different for different file types?

ytthuan commented 5 years ago

@ytthuan An interesting observation about the unit. Could you please let us know which type of file you are using for your testing? The supported file types as per the API reference are:

Input requirements:

  • Supported image formats: JPEG, PNG, BMP, PDF and TIFF.
  • For PDF and TIFF, up to 200 pages are processed.
  • For free tier subscribers, only the first 2 pages are processed.
  • Image file size must be less than 20 MB.
  • Image dimensions must be at least 50 x 50 pixels and at most 4200 x 4200 pixels. PDF dimensions must be at most 17 x 17 inches, corresponding to Legal or A3 paper sizes and smaller.


Hi Rohit, yes, it was a PDF file. Because the unit is inch, the values are doubles, and that throws an exception if you are using the ComputerVisionClient libraries. The result class is ReadOperationResult, which I believe requires integer bounding box values, so the exception JsonReaderException: Input string '5.6703' is not a valid integer. Path ... is thrown on the first line of the ReadOperationResult.

I would expect Microsoft either to offer an option to choose the unit returned, or at least to make the ReadOperationResult class accept double values so that no exception is thrown.
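To illustrate the failure mode outside the SDK, here is a minimal Python sketch. It is not the SDK's code: the `Page` and `Line` types and the `parse_page` helper are hypothetical, and the sample is the fragment from the report above, closed off by hand so that it parses. It shows that a model with floating-point bounding box values reads the PDF response, while an integer-only parse of the same kind of value fails.

```python
import json
from dataclasses import dataclass
from typing import List

# Fragment modeled on the response quoted earlier in this thread.
# The closing brackets were added by hand so the JSON is complete.
SAMPLE = """
{"page": 2, "clockwiseOrientation": 0.01, "width": 8.2633, "height": 11.68,
 "unit": "inch",
 "lines": [{"boundingBox": [0.4519, 0.5475, 1.4418, 0.5399,
                            1.4525, 0.6235, 0.4627, 0.6311]}]}
"""


@dataclass
class Line:
    # Floats cover both cases: fractional inches and whole pixels.
    bounding_box: List[float]


@dataclass
class Page:
    unit: str
    width: float
    height: float
    lines: List[Line]


def parse_page(raw: str) -> Page:
    """Hypothetical helper: build a Page from one page object of the response."""
    doc = json.loads(raw)
    lines = [Line([float(v) for v in ln["boundingBox"]]) for ln in doc["lines"]]
    return Page(doc["unit"], float(doc["width"]), float(doc["height"]), lines)


page = parse_page(SAMPLE)
print(page.unit, page.lines[0].bounding_box)

# An integer-only model rejects the same kind of value,
# which is analogous to the JsonReaderException reported here.
try:
    int("5.6703")
except ValueError as exc:
    print("integer parse failed:", exc)
```

Reading the raw JSON this way is one possible workaround while the typed result class only accepts integers.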

RohitMungi-MSFT commented 5 years ago

@PatrickFarley Could you help address this issue with the product group? It looks like the bounding box is a list of integers, and for PDF documents the API returns the unit as inches, which throws an exception.

PatrickFarley commented 5 years ago

Let me first loop in the product team; I believe the intent was for the units to depend on file type and not to be configurable, so if PDFs throw an exception because of the units they use, that is something the team will need to sort out.

ytthuan commented 5 years ago

Hi.

Is there any update on this, please?

AggressiveWaffle commented 5 years ago

I have the same problem. Is there a fix for this? Or do I just use the HTTP client?

Cpcrook commented 5 years ago

@PatrickFarley Any update on this? Seems like a pretty glaring oversight that could have been solved by a unit test.

Trying this implementation for OCRing since your documentation says the RecognizeText API is going to be deprecated in favor of the Read API, which is seemingly broken using the .NET SDK at the moment.

Edit: if anyone is looking for a quick fix to this like @AggressiveWaffle my fork of the repo contains a mediocre hack fix until the MSFT product team gets their act together. Swaps the int fields causing the issue out for dynamics. Like I said, hacky, but functional for trial purposes. Would advise against production use, obviously.

abhiseky93 commented 5 years ago

I am using https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/read/core/asyncBatchAnalyze and want to confirm: does boundingBox give { X top left, Y top left, X top right, Y top right, X bottom right, Y bottom right, X bottom left, Y bottom left } in the response? I need to find x, y, height and width. Please suggest how.

Cpcrook commented 5 years ago

@abhiseky93 x,y, height and width of what?

If you're talking about the bboxes, you should be able to calculate that with:

x = x top left
y = y top left
height = MAX((y bottom left - y top left), (y bottom right - y top right))
width = MAX((x top right - x top left), (x bottom right - x bottom left))

Not sure if I'm understanding the question. That should get you rough coords and dimensions for bounding boxes, assuming top-left indexing of coordinates. The max is used to get the largest outer dimensions in the cases where a bounding box is not necessarily rectangular in shape, but more of a trapezoid.
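As a rough sketch of that calculation in Python (the `bbox_to_rect` helper and `Rect` type are hypothetical names, and the corner order top-left, top-right, bottom-right, bottom-left is the one assumed in the question above, not something confirmed in this thread): height is taken as the longer of the left and right edges, and width as the longer of the top and bottom edges.

```python
from typing import NamedTuple, Sequence


class Rect(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def bbox_to_rect(bbox: Sequence[float]) -> Rect:
    """Approximate an 8-value bounding box as x, y, width, height.

    Assumes the order x/y top left, x/y top right, x/y bottom right,
    x/y bottom left, with the origin at the top-left of the page.
    """
    if len(bbox) != 8:
        raise ValueError("expected 8 values (4 corner points)")
    x_tl, y_tl, x_tr, y_tr, x_br, y_br, x_bl, y_bl = bbox
    # Use the larger of the two opposite edges so a slightly
    # trapezoidal box is not under-measured.
    width = max(x_tr - x_tl, x_br - x_bl)
    height = max(y_bl - y_tl, y_br - y_tr)
    return Rect(x_tl, y_tl, width, height)


# Values from the response fragment quoted at the top of this thread (inches).
rect = bbox_to_rect([0.4519, 0.5475, 1.4418, 0.5399,
                     1.4525, 0.6235, 0.4627, 0.6311])
print(rect)
```

The result is in whatever unit the response uses, so for a PDF it is in inches. For a box that is noticeably rotated, taking the min and max over all four corners may give a better enclosing rectangle.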

KellyDF commented 5 years ago

Thanks for reaching out. PDFs return the bounding box in inches while images return the bounding box in pixels. This can be found in response section here: https://westus.dev.cognitive.microsoft.com/docs/services/5adf991815e1060e6355ad44/operations/5be108e7498a4f9ed20bf96d
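For callers who need pixel coordinates in both cases, one possible approach is to branch on the `unit` field and scale inch values by a resolution. The Python sketch below is only an illustration: the `to_pixels` helper is hypothetical, the `"pixel"` unit string for images is assumed (only `"inch"` appears in this thread), and the DPI is an assumption that should match whatever resolution the PDF page is rendered at on the caller's side.

```python
from typing import List, Sequence


def to_pixels(bounding_box: Sequence[float], unit: str, dpi: float = 96.0) -> List[int]:
    """Return bounding box values as whole pixels.

    unit: the "unit" field of the page in the response.
    dpi:  assumed rendering resolution, used only when unit is "inch".
    """
    if unit == "pixel":
        # Assumed unit string for image inputs; values are already pixels.
        return [int(round(v)) for v in bounding_box]
    if unit == "inch":
        return [int(round(v * dpi)) for v in bounding_box]
    raise ValueError(f"unexpected unit: {unit!r}")


# Values from the PDF response fragment quoted at the top of this thread.
inches = [0.4519, 0.5475, 1.4418, 0.5399, 1.4525, 0.6235, 0.4627, 0.6311]
print(to_pixels(inches, "inch", dpi=100.0))
```

Rounding to integers loses precision, so keeping the original floating-point values alongside the converted ones is probably safer.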

tchristiani commented 5 years ago

Resolution provided.

please-close

Cpcrook commented 5 years ago

@KellyDF this isn't resolved with the C# SDK at all - getting results of a multi-page PDF, submitted to the API using the BatchRead C# SDK methods still causes a failure. See my forked repo here:

https://github.com/Cpcrook/azure-sdk-for-net/commit/532ec7b4d5e27bac856f4aed4e9532044eeca206#diff-e14906034a70263b65b3edcef3507da9L38

The underlying objects are strongly typed integer properties, causing a failure on deserialization. My hack-fix is changing those to dynamics to handle the fractional inches returned from PDFs. Objects would also suffice.

@tchristiani please re-open -- pointing to the documentation that is incompatible with your SDK is not a resolution - the method is still broken in the C# SDK due to the strongly-typed properties I described and linked to above.