Update the Azure APIs to their latest versions

dkotter commented 1 year ago

Is your enhancement related to a problem? Please describe.

Ideally we should be looking to update any APIs we use to their latest versions on a regular basis. This issue is focused on the Azure APIs we use. The following is a list of those APIs and the version of each we are using.

Analyze Image v3.0
OCR v3.2
Read v3.2
Generate Thumbnail v3.1
Personalizer v1.0
TTS cognitiveservices/v1

For the Personalizer API, v1.0 is the latest (though there is a v1.1 in preview) so nothing needed there. Same for our Text to Speech API, we are currently using the latest version.

The Analyze Image, OCR, Read and Generate Thumbnail APIs are all under the same service (previously known as Cognitive Services Computer Vision, since renamed to Azure AI Vision). The latest released version of this API is v3.2, while there is a v4.0 public preview API.
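As a rough way to see which of the APIs above are behind, the versions in use can be compared against the latest released versions mentioned in this issue. This is only a sketch in Python, not ClassifAI's own code; the `APIS` table and the `outdated` helper are hypothetical names, and the version numbers are copied from this issue rather than queried from Azure. Text to Speech is left out because its version string (`cognitiveservices/v1`) is not a dotted number.

```python
# Hypothetical inventory: (API name, version in use, latest released version).
# Values are copied from this issue, not fetched from Azure.
APIS = [
    ("Analyze Image", "3.0", "3.2"),
    ("OCR", "3.2", "3.2"),
    ("Read", "3.2", "3.2"),
    ("Generate Thumbnail", "3.1", "3.2"),
    ("Personalizer", "1.0", "1.0"),
]


def parse_version(version):
    """Turn '3.2' into (3, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def outdated(apis):
    """Return the names of APIs whose version in use is behind the latest release."""
    return [
        name
        for name, current, latest in apis
        if parse_version(current) < parse_version(latest)
    ]


print(outdated(APIS))  # ['Analyze Image', 'Generate Thumbnail']
```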

Azure is pushing for everyone to use the new v4.0 public preview API, but in researching this, I found some limitations that may hold us back. For instance, image captioning and smart cropping are only available in a small set of regions in v4.0 (East US, France Central, Korea Central, North Europe, Southeast Asia, West Europe, West US, and East Asia).
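If we go with v4.0, one option would be to check a site's configured region before offering captions or smart cropping. A minimal sketch in Python, assuming the region is available as a string; `V4_CAPTION_REGIONS` and `supports_v4_captions` are hypothetical names, the lowercase identifiers are an assumed spelling of the regions listed above, and the list itself is only as current as this issue.

```python
# Regions listed in this issue as supporting captions and smart cropping in v4.0.
# The lowercase, space-free identifiers are an assumed spelling of those names.
V4_CAPTION_REGIONS = {
    "eastus",
    "francecentral",
    "koreacentral",
    "northeurope",
    "southeastasia",
    "westeurope",
    "westus",
    "eastasia",
}


def normalize_region(region):
    """Accept either 'East US' or 'eastus' style input."""
    return region.replace(" ", "").lower()


def supports_v4_captions(region):
    """Return True if the region is in the list quoted in this issue."""
    return normalize_region(region) in V4_CAPTION_REGIONS


print(supports_v4_captions("East US"))   # True
print(supports_v4_captions("UK South"))  # False
```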

There have also been quite a few changes to these APIs in v4.0, so it will take some refactoring if we pursue these updates. For instance, all existing features we use, outside of reading content from PDFs, are now under a single Analyze API in v4.0. This will require some changes to how our code works.
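To illustrate the kind of refactoring this implies, here is a sketch in Python of routing our existing features to the API that would serve them under v4.0. The feature keys, the `FEATURE_TO_API` table, and `group_by_api` are hypothetical names for illustration; they are not ClassifAI identifiers or Azure parameter names.

```python
# Labels for the two backends described in this issue.
V4_ANALYZE = "v4.0 Analyze"
V3_READ = "v3.2 Read"

# Hypothetical feature keys mapped to the API that would serve them under v4.0.
# Everything except reading PDFs moves under the single Analyze API.
FEATURE_TO_API = {
    "image_analysis": V4_ANALYZE,
    "ocr": V4_ANALYZE,
    "thumbnail": V4_ANALYZE,
    "pdf_read": V3_READ,
}


def group_by_api(features):
    """Group requested features by the API that would handle them."""
    grouped = {}
    for feature in features:
        grouped.setdefault(FEATURE_TO_API[feature], []).append(feature)
    return grouped


print(group_by_api(["image_analysis", "ocr", "pdf_read"]))
```

Grouping this way would let the code that currently talks to three separate endpoints share one client for the Analyze features, while PDF reading keeps its own path.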

That said, assuming we're okay with the region limitations, I'd like to pursue updating all of those to v4.0. If we're not okay with that, I think it would be ideal to get all of those on v3.2 (so just Analyze Image and Generate Thumbnail).

I tried updating to v3.2 of the Analyze Image API and, while the results we get seem good, the confidence scores, at least for image captions, are lower, so we would need to determine how best to handle that (going by the Vision Studio tool, this seems to have been fixed in v4.0). Their docs even mention:

In general, we advise a confidence threshold of 0.4 for the Image Analysis 3.2 API and of 0.0 for the Image Analysis 4.0 API (preview).
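One way to handle this would be to pick the threshold based on the API version in use. A minimal sketch in Python, using the two thresholds from the quote above; `ADVISED_THRESHOLDS` and `filter_captions` are hypothetical names, and the caption dictionaries are a simplified stand-in for the real API response.

```python
# Thresholds advised in the Azure docs quoted above, keyed by API version.
ADVISED_THRESHOLDS = {"3.2": 0.4, "4.0": 0.0}


def filter_captions(captions, api_version):
    """Keep caption texts whose confidence meets the advised threshold.

    `captions` is assumed to be a list of dicts with 'text' and 'confidence'
    keys, a simplified stand-in for the API response.
    """
    threshold = ADVISED_THRESHOLDS[api_version]
    return [c["text"] for c in captions if c["confidence"] >= threshold]


captions = [
    {"text": "a dog on a beach", "confidence": 0.45},
    {"text": "a blurry object", "confidence": 0.30},
]
print(filter_captions(captions, "3.2"))  # ['a dog on a beach']
print(filter_captions(captions, "4.0"))  # ['a dog on a beach', 'a blurry object']
```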

If we decide to update to v4.0, here are the tasks as I see them:

[ ] Update Analyze Image API to v4.0 and address any issues there. I believe we'll need to update how we send data and how we parse the received response
[ ] Update how we handle OCR to use this new API
[ ] Update how we handle generating thumbnails to use this new API
[ ] Investigate the Read API. It seems like this functionality moved to a new API (Document Intelligence). We should investigate what it would mean to use that API instead. We may find it's not worth the effort and we leave this on the current v3.2 API

If we stick with v3.2, here's what we'll want to do:

[ ] Update the Analyze Image API to v3.2 and modify how we handle error responses (this changed in v3.2).
[ ] Update how we deal with confidence scores to account for lower scores in v3.2
[ ] Update the Generate Thumbnail API to v3.2 and address any issues there
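This issue does not spell out how error responses changed in v3.2. Purely as an assumption to be checked against the Azure docs, the sketch below (in Python) supposes the difference is between a flat body with a top-level `message` and one nested under an `error` key, and reads either. `extract_error_message` is a hypothetical helper.

```python
def extract_error_message(body):
    """Return an error message from either assumed response shape, or None.

    Both shapes are assumptions made for illustration; verify them against
    the actual API responses before relying on this.
    """
    if not isinstance(body, dict):
        return None
    error = body.get("error")
    if isinstance(error, dict):
        # Assumed nested shape: {"error": {"code": ..., "message": ...}}
        return error.get("message")
    # Assumed flat shape: {"code": ..., "message": ...}
    return body.get("message")


print(extract_error_message({"error": {"code": "InvalidRequest", "message": "Bad image"}}))
print(extract_error_message({"code": "InvalidImageUrl", "message": "Bad image"}))
```

Accepting both shapes would let the same handler work during the transition, whichever version a site is pointed at.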

Designs

No response

Describe alternatives you've considered

No response

Code of Conduct

[X] I agree to follow this project's Code of Conduct

kmgalanakis commented 1 year ago

@jeffpaul what should be our decision here? Move to v4.0 or stick to v3.2?

cc @dkotter

kmgalanakis commented 1 year ago

I've created a draft PR for this at https://github.com/10up/classifai/pull/559.

I verified that the confidence scores are lower in v3.2. Judging by the tests I did, what worked best for me was a threshold between 0.5 and 0.55. As far as handling the lower confidence scores is concerned, I mostly see it as a matter of personal preference.

As a consequence, I would suggest that we leave the default value for the confidence threshold as is and display a dismissible notification when we detect that the API version is greater than or equal to 3.2 and the selected confidence threshold is above 0.5-0.55.
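The check behind that notification could look roughly like this. It is a sketch in Python rather than the plugin's code; `should_show_notice` and `NOTICE_THRESHOLD` are hypothetical names, and using the upper end of the 0.5-0.55 range as the cutoff is my assumption.

```python
# Upper end of the 0.5-0.55 range suggested above (assumed cutoff).
NOTICE_THRESHOLD = 0.55


def parse_version(version):
    """Turn '3.2' or 'v3.2' into (3, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def should_show_notice(api_version, threshold, dismissed=False):
    """Show the notice on v3.2+ when the chosen threshold is above the cutoff."""
    if dismissed:
        return False
    return parse_version(api_version) >= (3, 2) and threshold > NOTICE_THRESHOLD


print(should_show_notice("3.2", 0.7))  # True
print(should_show_notice("3.0", 0.7))  # False
print(should_show_notice("3.2", 0.5))  # False
```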

I tried to create another PR updating the APIs to version 4.0, but I found it too difficult, since I'm not that familiar with the codebase and, from what I saw, the endpoints have changed.

jeffpaul commented 1 year ago

I received an email from Microsoft Azure saying that the Computer Vision 3.1 API will be retired on 13 September 2026 and that we should migrate our computer vision workloads to the Computer Vision 3.2 API, which has these benefits:

Improved image captioning, image tagging and object detection
164 language support for OCR including handwritten support for 9 Languages: English, Simplified Chinese, French, German, Italian, Japanese, Korean, Portuguese, and Spanish
Up-to-date documentation and better customer support

Seems like we're well along on that path, but we should continue to stay on top of the APIs we use in ClassifAI and update their versions more regularly so we stay as current as feasible.
