Azure-Samples / cognitive-services-quickstart-code

Code Examples used by the Quickstarts in the Cognitive Services Documentation
MIT License

OCR API doesn't return correct bounding box values. #260

Closed DavidLafond closed 3 years ago

DavidLafond commented 3 years ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [X] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

When I send an image that is not upright to the Read API, it correctly returns bounding-box coordinates in the same orientation as the source image. The OCR API, however, rotates the image to the correct orientation before running OCR, so the bounding-box coordinates it returns do not match the source image. Our AI algorithm needs to match its own bounding boxes to the OCR bounding boxes. This works with the Read API but not with the OCR API. Applying a rotation matrix to the image coordinates also does not reproduce the coordinates the OCR API extracts.
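To make the coordinate problem concrete, here is a minimal Python sketch of the kind of rotation mapping described above. It takes a box measured on an image that was rotated by a whole number of clockwise quarter turns and maps it back onto the original, unrotated image. The function names, the `quarter_turns_cw` parameter, and the assumption that the OCR API's `boundingBox` is a `"left,top,width,height"` string are illustrative assumptions, not documented service behavior. As the report says, a plain rotation like this did not reproduce the OCR API's coordinates in practice.

```python
def point_to_original(x, y, width, height, quarter_turns_cw):
    """Map (x, y) on the rotated image back to the original image.

    width, height: size of the ORIGINAL (unrotated) image.
    quarter_turns_cw: hypothetical number of 90-degree clockwise turns
    that were applied to the original image to make it upright.
    """
    k = quarter_turns_cw % 4
    if k == 0:
        return x, y
    if k == 1:  # forward rotation was (x, y) -> (height - y, x)
        return y, height - x
    if k == 2:  # forward rotation was (x, y) -> (width - x, height - y)
        return width - x, height - y
    # k == 3: forward rotation was (x, y) -> (y, width - x)
    return width - y, x


def parse_box(box):
    """Parse an assumed "left,top,width,height" string into floats."""
    left, top, w, h = (float(v) for v in box.split(","))
    return left, top, w, h


def box_to_original(box, width, height, quarter_turns_cw):
    """Map all four corners back and return the enclosing (left, top, width, height)."""
    left, top, w, h = parse_box(box)
    corners = [(left, top), (left + w, top), (left, top + h), (left + w, top + h)]
    mapped = [point_to_original(cx, cy, width, height, quarter_turns_cw)
              for cx, cy in corners]
    xs = [p[0] for p in mapped]
    ys = [p[1] for p in mapped]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


# A box on a 200 x 100 original that was turned once clockwise.
print(box_to_original("70,5,20,10", 200, 100, 1))  # (5.0, 10.0, 10.0, 20.0)
```

Mapping all four corners and taking their extent keeps the result an axis-aligned box, which is only exact for multiples of 90 degrees; any additional skew correction by the service would not be captured.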

Any log messages given by the failure

No

Expected/desired behavior

Same orientation coordinate as the input image.

OS and Version?

Any OS

Versions

Mention any other details that might be useful


Thanks! We'll be in touch soon.

fsharpn00b commented 3 years ago

Hello @DavidLafond ,

Thank you for letting us know about this issue. I raised it with the service team. Their reply was that the OCR API (that is, the recognize_printed_text/recognize_printed_text_in_stream methods) is being deprecated in favor of the Read API (read/read_in_stream). The documentation is being updated to reflect this (that is, "OCR" will point to the Read API).
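For anyone migrating, here is a rough Python sketch of how Read API line boxes could be reduced to the axis-aligned form the older OCR API used. It assumes the Read API v3.x JSON shape as I understand it (`analyzeResult.readResults[*].lines[*].boundingBox` as eight numbers `[x1, y1, x2, y2, x3, y3, x4, y4]`); please check the field names against the current API reference before relying on it. The helper names are hypothetical, and the sample dict is hand-written data, not real service output.

```python
def polygon_to_box(polygon):
    """Reduce an 8-number quadrilateral to (left, top, width, height)."""
    xs = polygon[0::2]  # x1, x2, x3, x4
    ys = polygon[1::2]  # y1, y2, y3, y4
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


def line_boxes(read_result):
    """Yield (page, text, box) for each line in an assumed Read API v3.x result dict."""
    for page in read_result["analyzeResult"]["readResults"]:
        for line in page.get("lines", []):
            yield page["page"], line["text"], polygon_to_box(line["boundingBox"])


# Hypothetical sample in the assumed shape (not real service output).
sample = {
    "analyzeResult": {
        "readResults": [
            {"page": 1, "angle": 0, "width": 200, "height": 100, "unit": "pixel",
             "lines": [{"text": "Hello",
                        "boundingBox": [10, 20, 60, 20, 60, 40, 10, 40]}]}
        ]
    }
}

for page_number, text, box in line_boxes(sample):
    print(page_number, text, box)  # 1 Hello (10, 20, 50, 20)
```

Keeping the original eight-number polygon is usually preferable when the text is skewed; the axis-aligned reduction is only for comparison with code written against the older format.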

I'm not sure what will happen to the image-orientation correction in the deprecated methods, so I have asked about that. I will let you know what I hear.

fsharpn00b commented 3 years ago

Hello @DavidLafond ,

I heard back from the service team: it looks like the Read API will not change the image orientation. I will now close this issue, but please feel free to reopen it if I can answer any other questions. Thank you! fsharpn00b

DavidLafond commented 3 years ago

Awesome! One last question about the Read API that isn't clear to me, if you can help. I have read that the limits are: maximum dimensions of 10,240 x 10,240 pixels, maximum file size of 100 MB, and a maximum of 10 megapixels.

Can you officially confirm those numbers, please? They may be outdated, and I really need the right values. :) Thanks!

fsharpn00b commented 3 years ago

Hello @DavidLafond,

I sincerely apologize for not replying sooner; I was not aware of your reply. I have passed your question on to the service team.

According to this page: https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/overview-ocr the file size must be less than 50 MB (6 MB for the free tier), and the image dimensions must be at least 50 x 50 pixels and at most 10000 x 10000 pixels.

I will let you know what I hear from the service team and I will update that page if needed. If you get a chance, can you tell me where you read about the values you mentioned? If those are from one of our doc pages, it might need updating as well.

Thank you! fsharpn00b

fsharpn00b commented 3 years ago

Reopening until new question is answered.

fsharpn00b commented 3 years ago

Hello @DavidLafond ,

I apologize for the slow reply. I just heard from the service team, and they confirm that the image requirements listed at https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/overview-ocr#input-requirements are correct:

- Supported file formats: JPEG, PNG, BMP, PDF, and TIFF.
- For PDF and TIFF files, up to 2000 pages are processed (only the first two pages for the free tier).
- The file size must be less than 50 MB (6 MB for the free tier), and the image dimensions must be at least 50 x 50 pixels and at most 10000 x 10000 pixels.
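The confirmed limits above can be turned into a simple pre-upload check. This is a minimal Python sketch, assuming "MB" means 1024 x 1024 bytes and that the extensions `jpg` and `tif` count as JPEG and TIFF. The function names are hypothetical, and the service itself remains the authority on what it accepts.

```python
SUPPORTED_FORMATS = {"jpeg", "jpg", "png", "bmp", "pdf", "tiff", "tif"}
MB = 1024 * 1024  # assumption: "MB" read as mebibytes


def check_read_input(file_format, size_bytes, width, height, free_tier=False):
    """Return a list of problems; an empty list means these checks pass."""
    problems = []
    if file_format.lower() not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {file_format}")
    limit = (6 if free_tier else 50) * MB
    if size_bytes >= limit:  # the file size "must be less than" the limit
        problems.append(f"file too large: {size_bytes} bytes (limit {limit})")
    if width < 50 or height < 50:
        problems.append(f"image too small: {width} x {height}")
    if width > 10000 or height > 10000:
        problems.append(f"image too large: {width} x {height}")
    return problems


def pages_processed(page_count, free_tier=False):
    """Number of PDF/TIFF pages that would be processed under the stated limits."""
    return min(page_count, 2 if free_tier else 2000)


print(check_read_input("png", 7 * MB, 800, 600, free_tier=True))
# ['file too large: 7340032 bytes (limit 6291456)']
```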

I will close this issue again for now, but please feel free to reopen it if you have other questions. Thank you!