jeromeetienne opened this issue 7 years ago
I had a similar conversation in the past with @salamanders and, despite a confidence factor being a good idea on paper, it has a few complications when it comes to the implementation: it would be almost impossible to enforce consistency across detectors or platforms, or even across implementations on the same platform. The same situation is reflected in the venerable OpenCV implementation, e.g. (here), where there is no confidence factor per se -- it can be proxied by trying several detections with varying minSize/maxSize, but that'd be far from the purpose of the Web API.
WDYT?
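For illustration, the minSize/maxSize sweep could be loosely approximated on the web side by re-running detection on downscaled copies of a frame and counting how often the face survives. This is only a sketch: it assumes the draft FaceDetector interface, and detectionStability is a hypothetical helper, not anything in the spec.

```ts
// Minimal declaration for the draft FaceDetector interface, which is not
// yet part of lib.dom.d.ts.
declare class FaceDetector {
  constructor(options?: { maxDetectedFaces?: number; fastMode?: boolean });
  detect(image: ImageBitmapSource): Promise<{ boundingBox: DOMRectReadOnly }[]>;
}

// Re-run detection on downscaled copies of the frame and report the
// fraction of scales at which a face was still found. A crude stand-in
// for a confidence factor, not a real probability.
async function detectionStability(
  source: ImageBitmap,
  scales: number[] = [1.0, 0.75, 0.5]
): Promise<number> {
  const detector = new FaceDetector({ fastMode: true });
  let hits = 0;
  for (const scale of scales) {
    const resized = await createImageBitmap(source, {
      resizeWidth: Math.round(source.width * scale),
      resizeHeight: Math.round(source.height * scale),
    });
    if ((await detector.detect(resized)).length > 0) hits++;
    resized.close();
  }
  return hits / scales.length; // 1.0 = face found at every scale
}
```

It costs one full detection per scale, which is part of why it is a poor substitute for the detector simply reporting what the underlying model already knows.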
I'm less worried about cross-comparison, and more about "was this a good or bad OCR capture, and should I ask the user to try again?". So even a very rough "yay / meh / boo" ranking of the confidence would be great.
Where should I read up on expectations of implementers regarding speed, accuracy, confidence, models to use, etc? I'd like to know how this compares as an API to things like:
https://justadudewhohacks.github.io/face-api.js/docs/index.html
https://github.com/auduno/headtrackr
My experience with the two is that the simpler headtrackr API runs incredibly fast with its simpler logic, but does not provide landmarks, confidence, etc. The heavier face-api.js gives you much more, but it is painfully slow and uses a ton of compute, likely due to the off-the-shelf models it runs through the generic TensorFlow library. My assumption is that the sweet spot for this API wouldn't be to replace headtrackr with a web interface, but to modernize and optimize the rich functionality of a library like face-api.js so it ran as smoothly as headtrackr.
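For a sense of the gap being discussed, the sketch below contrasts the two surfaces: face-api.js's documented detectAllFaces flow returns a per-detection score, while the native detector only returns geometry. The model path and the helper name are placeholders, and FaceDetector is assumed as declared in the earlier sketch.

```ts
import * as faceapi from 'face-api.js';

async function compareConfidenceSignals(video: HTMLVideoElement) {
  // face-api.js: each detection carries a score from the underlying model.
  await faceapi.nets.ssdMobilenetv1.loadFromUri('/models'); // placeholder model path
  const libDetections = await faceapi.detectAllFaces(video);
  for (const d of libDetections) {
    console.log(d.box, d.score); // score is roughly 0..1
  }

  // Shape Detection API: bounding boxes and landmarks only, no score.
  const detector = new FaceDetector({ fastMode: true });
  const nativeFaces = await detector.detect(video);
  for (const f of nativeFaces) {
    console.log(f.boundingBox); // nothing says how sure the detector was
  }
}
```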
More on detection logic written a decade ago running on much slower hardware and software here: https://liuliu.me/eyes/javascript-face-detection-explained/
It's interesting to note that 10-11 years ago, someone on Opera's dev team put this together using web workers and CPU code and got it to run fast enough for an impressive demo. Ten years later, the web is still trying to hammer out how to do this efficiently, and all we have to show for it so far is bloaty JS libraries and a draft spec. Not throwing shade as much as I'm throwing praise at those guys for being ahead of their time.
I wonder if this would reveal too much (private) information about the underlying system. Like, if I tested a range of images across a range of hardware, it might reveal a unique signature. At the very least this would need some noise, or it would need a confidence of just "low" / "high" and that's it. (I guess similar to @salamanders' comment.)
At the same time, concerns would arise if a given set of images could reveal something about the underlying system... so there would need to be some pretty clear uses for this.
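To make the suggested mitigation concrete: if a confidence were exposed at all, the user agent could coarsen it before it crosses the API boundary. Purely illustrative; coarsenConfidence and its thresholds are hypothetical, not anything in the draft.

```ts
type CoarseConfidence = 'low' | 'high';

// Hypothetical UA-side mitigation: collapse the raw model score into two
// buckets and add a little noise near the boundary, so that a fixed set
// of test images cannot be used to read off the exact model or hardware
// behaviour.
function coarsenConfidence(rawScore: number): CoarseConfidence {
  const boundary = 0.7;                       // arbitrary cut-off
  const jitter = (Math.random() - 0.5) * 0.1; // +/- 0.05 of noise
  return rawScore + jitter >= boundary ? 'high' : 'low';
}
```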
> I wonder if this would reveal too much (private) information about the underlying system. Like, if I tested a range of images across a range of hardware, it might reveal a unique signature.
In considering the potential for this API to be used as a fingerprinting vector, I have been assuming that the behavior of a model is deterministic (not affected by hardware differences), so the only detectable signal is which version of a model the user agent is using; for a platform-provided detection library, that is equivalent to the OS version.
I haven't found any way to estimate the accuracy of the results provided by the API. It would be interesting for the application to have a confidence factor from the underlying technology.
The browser tells me there is a face here, but is it 50% sure or 99% sure? It may be useful for the application to know this.
Use case 1: for example, the app previously detected a face close to this location in a previous frame, so if the new detection is at 50%, it is good enough.
Use case 2: the application is doing an initial detection without prior knowledge, so the result has to be very reliable; for example, the app would require 90% confidence.
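To illustrate how these two use cases would consume such a value, here is a sketch that assumes a hypothetical confidence field on the detection result (no such field exists in the current draft) and applies a lower threshold when a face was already being tracked nearby.

```ts
// Hypothetical shape: the draft's DetectedFace plus a confidence field
// (not part of the current spec).
interface DetectedFaceWithConfidence {
  boundingBox: DOMRectReadOnly;
  confidence: number; // 0..1, hypothetical
}

// "Close" here just means within one box width of last frame's face.
function nearPrevious(
  face: DetectedFaceWithConfidence,
  prev: DOMRectReadOnly | null
): boolean {
  if (!prev) return false;
  const dx = face.boundingBox.x - prev.x;
  const dy = face.boundingBox.y - prev.y;
  return Math.hypot(dx, dy) < prev.width;
}

// Use case 1: tracking -- 50% is good enough near last frame's face.
// Use case 2: cold start -- require ~90% before trusting the detection.
function acceptDetection(
  face: DetectedFaceWithConfidence,
  prev: DOMRectReadOnly | null
): boolean {
  const threshold = nearPrevious(face, prev) ? 0.5 : 0.9;
  return face.confidence >= threshold;
}
```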