dobkeratops opened this issue 6 years ago
Wow, very cool - thanks a lot for sharing!
Personally, I would love to see ImageMonkey go in that direction. I think that could also be something unique that sets us apart from other services (like LabelMe). I also wouldn't mind implementing different UIs for different use cases (I think in the end it would basically just be an extension of the available annotation tools... maybe with small adjustments based on the selected mode).
At the moment, the biggest unknowns for me are:
Where do we get those images? If possible, I would really like to stick to CC0 (public domain) images.
Can we find a use case that's simple enough that the majority of people could contribute something? While I really like the "car/motorbike mechanic" use case, I think it's quite specific. In order to contribute you need to be passionate about image labeling/annotating and you need to know something about car/motorbike mechanics.
What's been on my mind for quite some time: Can we use image recognition to help blind people using their smartphones? I think there are ways to make a smartphone app more accessible to blind people, but in order to do so you often need to change the source code of the app (e.g. label the button properly, so that a screen reader can read it out loud). I guess there are quite a few apps out there that either aren't coded with those principles in mind or that are no longer maintained. I am wondering if there is a way image recognition could help here? If we could find one, I think that would be a topic where everybody could contribute something. Just take a screenshot of your favourite app and label the buttons/screens.
Not sure if that's somehow possible on Android/iOS, but what I am imagining is a global system service on the phone which intercepts all the touchscreen presses. The service takes a screenshot and then feeds the image into a (neural net based?) image segmentation service which tries to determine which button/control was pressed and then reads the button's label out loud.
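To make that a bit more concrete: on Android, I could imagine hanging this off an AccessibilityService. Just a rough sketch, not a working implementation; classifyControl below is completely made up and stands in for the image segmentation service:

```kotlin
import android.accessibilityservice.AccessibilityService
import android.graphics.Bitmap
import android.speech.tts.TextToSpeech
import android.view.Display
import android.view.accessibility.AccessibilityEvent

class ScreenReaderService : AccessibilityService() {
    private lateinit var tts: TextToSpeech

    override fun onServiceConnected() {
        tts = TextToSpeech(this) { /* init status ignored in this sketch */ }
    }

    override fun onAccessibilityEvent(event: AccessibilityEvent) {
        if (event.eventType != AccessibilityEvent.TYPE_VIEW_CLICKED) return
        // API 30+: capture what is currently on screen
        takeScreenshot(Display.DEFAULT_DISPLAY, mainExecutor,
            object : TakeScreenshotCallback {
                override fun onSuccess(screenshot: ScreenshotResult) {
                    val bmp = Bitmap.wrapHardwareBuffer(
                        screenshot.hardwareBuffer, screenshot.colorSpace) ?: return
                    // made up: map the screenshot (plus, ideally, the tap
                    // coordinates) to the label of the pressed control
                    val label = classifyControl(bmp)
                    tts.speak(label, TextToSpeech.QUEUE_FLUSH, null, "control-label")
                }
                override fun onFailure(errorCode: Int) { /* ignored in sketch */ }
            })
    }

    override fun onInterrupt() { }

    // placeholder for the actual (neural net based?) recognition step
    private fun classifyControl(screen: Bitmap): String = "unlabelled button"
}
```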
Another possibility would be to do it the other way round and let the user talk to the global system service: "Open app xy and do yz...". The service again uses image recognition to find the appropriate controls on the screen and then initiates touch presses to perform the action.
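The "initiate touch presses" part at least exists as an Android API (dispatchGesture, API 24+); again just a sketch, with the control-finding step left out:

```kotlin
import android.accessibilityservice.AccessibilityService
import android.accessibilityservice.GestureDescription
import android.graphics.Path

// tap at the (x, y) position that the image recognition step (not shown)
// found for the requested control
fun AccessibilityService.tapAt(x: Float, y: Float) {
    val path = Path().apply { moveTo(x, y) }
    val tap = GestureDescription.Builder()
        .addStroke(GestureDescription.StrokeDescription(path, 0, 50))
        .build()
    dispatchGesture(tap, null, null)
}
```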
I really don't know if that is even remotely possible...but just want to mention it :)
" Can we use image recognition to help blind people using their smartphones?"
Of course recognition would also be great in the real world as a 'digital guide dog'; I think I've seen demos a bit like that. But you appear to be talking about recognising the UI. Perhaps that is easier solved in app development (where all the information to generate the UI is available); I guess voice recognition will help with that too (circumventing the need for a button).
Where do we get those images? If possible, I would really like to stick to CC0 (public domain) images.
Agreed about the CC0 priority, and indeed with complex repairs there's a risk some might be proprietary. On the other hand, look at the number of DIY guides on YouTube (this kind of use case could almost be 'DIY guides on steroids'); maybe companies would even encourage dissemination of the knowledge to use their products well.
Perhaps that is easier solved in app development (where all the information to generate the UI is available); I guess voice recognition will help with that too (circumventing the need for a button).
Totally agreed, solving that directly in the app is for sure the best approach. I haven't talked to a blind person, but what I've read so far on the internet is that although the operating system (e.g. iOS) supports it, not that many app developers design their apps in a way that blind/visually impaired people can use them. I guess that's most probably because it's a niche sector... as there aren't that many blind people, it's probably not worth it for app developers to adapt their apps.
I think a use case that serves society could be a good way to get some traction. Not sure though if this is the best use case. One of my concerns was that a neural net might not be the best fit, as we would need a net that can differentiate between a lot of (slightly different) UIs. But there seem to also be neural nets that can detect brand logos (https://github.com/satojkovic/DeepLogo), which I guess has similar requirements (differentiating lots of slightly different visual representations of logos).
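To make the recognition side a bit more concrete, here is a sketch of what on-device inference could look like, assuming a TensorFlow Lite classifier (the model file name and label list are invented):

```kotlin
import org.tensorflow.lite.Interpreter
import java.io.File
import java.nio.ByteBuffer

// run a (preprocessed) screenshot through a classifier and return the most
// likely label -- "ui_classifier.tflite" is an invented model name
fun classifyUi(input: ByteBuffer, labels: List<String>): String {
    val interpreter = Interpreter(File("ui_classifier.tflite"))
    val scores = Array(1) { FloatArray(labels.size) }
    interpreter.run(input, scores)
    val best = scores[0].indices.maxByOrNull { scores[0][it] } ?: 0
    return labels[best]
}
```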
Agreed about the CC0 priority, and indeed with complex repairs there's a risk some might be proprietary. On the other hand, look at the number of DIY guides on YouTube (this kind of use case could almost be 'DIY guides on steroids'); maybe companies would even encourage dissemination of the knowledge to use their products well.
Totally agreed - that would be really awesome! I would set the threshold to 1000: if we find at least 1000 public domain images of something, I think we could consider that a potential use case. I think 1000 is a good number which helps us evaluate whether it's easy to find resources or not (I guess at the beginning we need to upload all the images ourselves. As soon as we reach a decent number of images, I think we could reach out to Reddit & similar sites and ask the community for help :))
(Tangentially... whilst blind-support might be something many app developers overlook, the 'hands-free' voice interface is a mainstream use case that might still drive improvements to their experience. I've always wanted (for example) a hands-free, voice/tactile interface for exploring podcasts, i.e. whilst out walking or riding a bike.)
Elsewhere, talking about AR/VR, a use for mixing that with vision tech came up in conversation.
I guess you'll have seen the 'car/motorbike mechanic' examples in AR (HoloLens)... IMO something along those lines would be the killer app for AR:
https://www.youtube.com/watch?v=TEaS0yMmP74
.. could you consider the vision side of that for labelling?

- what kind of images / use cases are there (self-assembly of furniture? dismantling/re-assembling various objects, labelling the components?)
- what labels could help (components, tools?)
- are there any additional types of markup (e.g. arrows/paths, indications of alignment/rotation) that might make sense? (would any of these be general purpose enough to consider in an application like ImageMonkey... perhaps you'd need several use cases to justify the complexity)
- would any of these considerations overlap with extensions for components, or the material/properties system? (states could be represented as properties... broken, clean/dirty, open/closed, pressed, etc.) A rough sketch of this follows after the link below.
https://www.youtube.com/watch?v=2DDjZiJuEEg
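Re: the markup and states-as-properties questions above, a rough sketch of what such a data model could look like (all names invented here; this isn't the actual ImageMonkey schema):

```kotlin
// extra markup types that a repair/assembly use case might need,
// alongside the usual label + outline annotation
sealed class Markup {
    data class Arrow(val fromX: Int, val fromY: Int,
                     val toX: Int, val toY: Int) : Markup()
    data class RotationHint(val centerX: Int, val centerY: Int,
                            val degrees: Float) : Markup()
}

data class ComponentAnnotation(
    val label: String,                      // e.g. "carburetor", "hex key"
    val polygon: List<Pair<Int, Int>>,      // outline of the component
    val properties: Map<String, String>,    // states as properties:
                                            // "condition" -> "broken",
                                            // "state" -> "open"
    val markup: List<Markup> = emptyList()  // optional arrows/rotation hints
)
```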