Shared-Reality-Lab / IMAGE-server

IMAGE project server components

Investigate Azure computer vision 4.0 preview #528

Open jeffbl opened 1 year ago

jeffbl commented 1 year ago

Azure has version 4.0 of their computer vision service (including tagging, OCR, etc.) in public preview. I don't know when this will become the default.

@rianadutta it will be interesting to see how things change, which we should be able to find using the test scripts!

@dafsgit pinging you since this may impact OCR...

@rohanakut the blurb on the MS website implies there might be a more unified API. Do you know if this will impact our use of Azure, i.e., will we need to change our preprocessors to take advantage of any of the new functionality? https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/whats-new

@rohanakut it looks like they are requiring applications for more and more of their computer vision functionality, e.g. celebrity recognition. Do you think we should be applying for access to any of these? https://learn.microsoft.com/en-us/legal/cognitive-services/computer-vision/limited-acc[…]%2Fcognitive-services%2Fcomputer-vision%2Fcontext%2Fcontext

dafsgit commented 1 year ago

I checked, and the OCR results are indeed different. The JSON response from the API now returns everything separately rather than nested inside each line.

The only additional information is the spans and the style array. The version we are using (v3.2) gives just the array of lines and, inside each line, the style variable (for which "handwriting" is one possible value) and the array of words with their confidence.
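
To make the difference concrete, here is a minimal Python sketch of how a preprocessor might normalize the two response shapes into one common structure. It is an illustration only: the field names (`lines`, `words`, `text`, `content`, `style`, `styles`, `isHandwritten`, `span`, `spans`, `offset`, `length`) and both function names are assumptions made for this sketch, not taken from the Azure documentation or from the IMAGE code, so they should be checked against a real response before use.

```python
from typing import Any


def lines_from_nested(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Nested shape (assumed, v3.2-like): each line carries its own style and words."""
    return [
        {
            "text": line["text"],
            "handwritten": line.get("style") == "handwriting",
            "words": [(w["text"], w["confidence"]) for w in line["words"]],
        }
        for line in result["lines"]
    ]


def _inside(span: dict[str, int], outer: list[dict[str, int]]) -> bool:
    """True if `span` lies entirely within at least one span in `outer`."""
    start, end = span["offset"], span["offset"] + span["length"]
    return any(o["offset"] <= start and end <= o["offset"] + o["length"] for o in outer)


def lines_from_separated(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Separated shape (assumed, v4-preview-like): lines, words, and styles are
    sibling arrays, tied together by character spans into the full text."""
    # Collect every span that any style entry marks as handwritten.
    handwritten_spans = [
        s
        for style in result.get("styles", [])
        if style.get("isHandwritten")
        for s in style["spans"]
    ]
    normalized = []
    for line in result["lines"]:
        # A word belongs to a line when its span falls inside one of the line's spans.
        words = [
            (w["content"], w["confidence"])
            for w in result["words"]
            if _inside(w["span"], line["spans"])
        ]
        normalized.append(
            {
                "text": line["content"],
                "handwritten": any(_inside(s, handwritten_spans) for s in line["spans"]),
                "words": words,
            }
        )
    return normalized
```

With a normalizing layer like this, downstream code would only ever see one shape, so switching API versions would mean changing a single function rather than every consumer of the OCR output.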

[UPDATE] I ran some tests on the images with Riana's scripts against the v4 preview. The text recognition is definitely better: it reads less noise, and in some cases where it previously recognized nothing, it now does. I believe the improvement has been gradual. Comparing the new outputs with a couple of past tests, I noticed that the results were already steadily changing for the better even before v4.

rohanakut commented 1 year ago

@jeffbl The new API has a new output format. The documentation mentions that the output format changes in version 4.0.

They haven't mentioned that they are deprecating V3.0, so I don't think we need to make any changes to the code right now. As far as I know, Azure still supports V2.0 as well, so I believe V3.0 will remain supported for the foreseeable future.

rohanakut commented 1 year ago

After discussion, we decided to apply to Azure for special access to their emotion recognition API, since we fall under the assistive technology criteria. Reassigning this to myself to ensure the application is done in a timely manner.

rohanakut commented 1 year ago

@jeffbl do you have the link to the form? I am not able to find the link that you mentioned in the meeting.

jeffbl commented 1 year ago

The form is linked from the last link in the original issue above: https://aka.ms/facerecognition

Different pages seem to say different things, but hopefully this is the right starting point....

jeffbl commented 1 year ago

Note there are apparently new captioning ML tools coming online soon, for generating better alt-text: https://techcrunch.com/2023/03/07/microsofts-computer-vision-model-will-generate-alt-text-for-reddit-images/

JRegimbal commented 1 year ago

It seems that several features we use are only listed for version 3.2...is this likely to cause a problem for us?

jeffbl commented 12 months ago

@rohanakut Do you have any documentation or response from the request you made to Azure?

jeffbl commented 11 months ago

@jaydeepsingh25 I'm moving this to backlog. We should definitely investigate upgrading to newer Azure versions, on the assumption that functionality should improve, but this is not high priority vs. the 2diy work.