Implement Azure-based content tagging scene description preprocessor

gp1702 commented 2 years ago

The scene recognition model was removed due to problematic results (for more info, see #167). An alternative to scene recognition is a scene description model. We can decide whether to implement a separate preprocessor for this after careful discussion with the UX team.

Cybernide commented 2 years ago

That might be good. That'll come down to testing and details, though. Are we talking about this?: https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/concept-describing-images

Cybernide commented 2 years ago

Assigning @Sabrina-Knappe for tracking purposes as this might be good for your parallel thesis work

Cybernide commented 2 years ago

Determined today in meeting with @gp1702 that scene description might not be the best route but we're looking at content tags as an alternative.

Cybernide commented 2 years ago

@gp1702 What's the current status on obtaining content tags? Making you the assignee until we know what we're working with, then feel free to add me back

Cybernide commented 2 years ago

Changed the title to reflect the current state of work here. Since an overview of a photograph is going to be helpful to our users, @gp1702 and I believe that this will be a good alternative to the "scene recognizer" preprocessor, which was prone to producing erroneous or problematic output.

Tasks include

[x] @gp1702 to consult with @JRegimbal on how it would fit in with current architecture
[x] Find out what content tags Azure provides @gp1702 (as of last conversation, may need to get Jeremy to create a ticket with Microsoft)
[ ] Figure out which tags would be most useful from UX standpoint @Cybernide @Sabrina-Knappe
[ ] Combine that with confidence measures to create full-photo context description @Cybernide @gp1702 (as per our last conversation, this would involve filtering for tags that are most useful, described in task above)
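A minimal sketch of the last two tasks, to make the filtering idea concrete. It assumes each tag arrives as a dict with `name` and `confidence` keys; the allowlist `USEFUL_TAGS`, the `min_confidence` threshold, and the function names are all hypothetical placeholders, not decisions made in this issue.

```python
from typing import Dict, List, Set

# Hypothetical allowlist: the real set would come from the UX review task above.
USEFUL_TAGS: Set[str] = {"outdoor", "indoor", "person", "food", "text"}


def filter_tags(
    tags: List[Dict],
    useful: Set[str] = USEFUL_TAGS,
    min_confidence: float = 0.8,  # assumed threshold, to be tuned
) -> List[Dict]:
    """Keep tags that are on the allowlist and meet the confidence threshold,
    ordered from most to least confident."""
    kept = [
        t for t in tags
        if t["name"] in useful and t["confidence"] >= min_confidence
    ]
    return sorted(kept, key=lambda t: t["confidence"], reverse=True)


def describe(tags: List[Dict]) -> str:
    """Build a short full-photo context string from already-filtered tags.
    Returns an empty string when no tag survives filtering."""
    names = [t["name"] for t in tags]
    if not names:
        return ""
    return "Tags for this photo: " + ", ".join(names) + "."


# Example input, assumed to mirror a list of name/confidence pairs.
sample = [
    {"name": "outdoor", "confidence": 0.99},
    {"name": "person", "confidence": 0.85},
    {"name": "grass", "confidence": 0.95},   # not on the allowlist
    {"name": "food", "confidence": 0.40},    # below the threshold
]
print(describe(filter_tags(sample)))
```

Returning an empty string when nothing passes the filter lets a downstream handler skip the description entirely rather than present low-confidence output.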

Closing #225 as the contents of that issue are now within the scope of this one.

gp1702 commented 2 years ago

Assigning this to @Cybernide to get information about most relevant tags for the users.

Cybernide commented 2 years ago

Thanks, @gp1702. I've got the tag retrieval script running on Colab and will get back to you shortly about the tags

rohanakut commented 2 years ago

@jeffbl Could this issue be moved out of the backlog, as this would be part of my thesis? For clarification: the Azure tags mentioned in the link would be used to create specialized ML-based models. For more details on the implementation, please refer to #443.

jeffbl commented 2 years ago

Assigning to @rohanakut at his request, to generate a new preprocessor that outputs these tags for his thesis work described in #443.

rohanakut commented 2 years ago

@jeffbl this issue can be moved to a future sprint. This issue would only be tackled once #455 is closed.

rohanakut commented 2 years ago

@jeffbl This has been successfully integrated into my work. However, there are a few cases where this does not seem to work. Could you let me know if this issue needs to be closed or reassigned?

jeffbl commented 2 years ago

This work item was originally about tagging the photo as a whole, in the sense of just indicating what it was about, even without the further refinement of additional detectors (celebrity, etc.). Question: does the current preprocessor for this output the raw tags, along with all of the other refinements? I am looking at the flowchart you put in Slack (which should probably be in GitHub somewhere, maybe as part of the README for the eventual handler that does the consolidation?) and it isn't clear what is being output earlier in the chain, if anything. For example: if we don't end up putting the emotion recognizer into production, can we still get the raw Azure tags to create other summaries? If it's easier to discuss, just ping me.

rohanakut commented 2 years ago

@jeffbl I am closing this issue, as #500 will be used to track progress on Azure capabilities.

Shared-Reality-Lab / IMAGE-server

Implement Azure-based content tagging scene description preprocessor #220