Detect joke/unwanted pictures

nicolas-raoul commented 8 years ago

Like the traditional Commons upload, this app is subject to "vandalism", which in this case is not destructive: it means uploading pictures that are not meant to illustrate wiki sites, but only serve as inside jokes, to repulse viewers, or result from total boredom.

Non-exhaustive list of subjects that might not be wanted:

Selfies of non-notable people
Some body parts (Commons: "Images depicting male nudity are regularly nominated for deletion")
Totally black or extremely fuzzy pictures

The difficult thing is to save Commons volunteers' time while not establishing our own censorship either. Maybe "tagging" such pictures with a particular category?

Thanks Justin for the idea!

misaochan commented 8 years ago

How would the detection work? 3rd party image recognition software?

nicolas-raoul commented 8 years ago

To stay compliant with the Wikimedia privacy policy, we can't use an external API, so probably some open source image recognition library. https://www.tensorflow.org might help.

nicolas-raoul commented 8 years ago

http://softwarerecs.stackexchange.com/questions/30075/java-library-to-detect-whether-picture-is-selfie-or-not

justinormont commented 8 years ago

OpenCV was great a decade ago and is still developed. I'm unsure of the current state of the art.

Face detection: https://blog.openshift.com/day-12-opencv-face-detection-for-java-developers/

Blur detection: Many great photos have blurry backgrounds (often DSLR shots), so you may get false positives. Perhaps skip the check for photos taken with a large aperture, as read from the photo metadata. Phone apps also emulate depth of field in software, often by taking multiple images, calculating the depth of each image segment, and artificially blurring the background.

http://www.pyimagesearch.com/2015/09/07/blur-detection-with-opencv/ http://answers.opencv.org/question/16927/detect-if-image-is-blurry/

Another method of detecting blur is calculating the FFT of the image and measuring the ratio of the high frequencies to low frequencies.
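A minimal sketch of the frequency-ratio idea, under several assumptions: the picture has already been converted to a small square grayscale thumbnail, a plain DFT stands in for a real FFT to keep the example dependency-free, and the class name, method name and `cutoff` parameter are hypothetical. It is an illustration, not code from the app.

```java
/** Hypothetical helper, not part of the app: frequency-domain sharpness score. */
final class FrequencyBlurCheck {

    /**
     * Returns (energy above cutoff) / (energy at or below cutoff) for an n x n
     * grayscale image, ignoring the DC term (overall brightness).
     */
    static double highToLowEnergyRatio(double[][] gray, int cutoff) {
        int n = gray.length;
        double[][] re = new double[n][n];
        double[][] im = new double[n][n];
        // Pass 1: 1-D DFT of every row.
        for (int y = 0; y < n; y++) {
            for (int u = 0; u < n; u++) {
                for (int x = 0; x < n; x++) {
                    double angle = -2 * Math.PI * u * x / n;
                    re[y][u] += gray[y][x] * Math.cos(angle);
                    im[y][u] += gray[y][x] * Math.sin(angle);
                }
            }
        }
        double high = 0;
        double low = 0;
        // Pass 2: 1-D DFT down every column of the row spectra.
        for (int u = 0; u < n; u++) {
            for (int v = 0; v < n; v++) {
                if (u == 0 && v == 0) {
                    continue; // skip DC
                }
                double fr = 0;
                double fi = 0;
                for (int y = 0; y < n; y++) {
                    double angle = -2 * Math.PI * v * y / n;
                    double c = Math.cos(angle);
                    double s = Math.sin(angle);
                    fr += re[y][u] * c - im[y][u] * s;
                    fi += re[y][u] * s + im[y][u] * c;
                }
                // Indices k and n - k are the same frequency, so fold them.
                int fu = Math.min(u, n - u);
                int fv = Math.min(v, n - v);
                double energy = fr * fr + fi * fi;
                if (Math.max(fu, fv) > cutoff) {
                    high += energy;
                } else {
                    low += energy;
                }
            }
        }
        if (low == 0) {
            return high == 0 ? 0 : Double.POSITIVE_INFINITY;
        }
        return high / low;
    }
}
```

A low ratio would suggest a blurry picture. Both the cutoff and the ratio below which a picture counts as blurry would have to be tuned on real uploads.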

nicolas-raoul commented 8 years ago

About blur: The thing would be to recognize whether the picture is entirely blurred, with no sharp part at all. FFT would probably work great indeed, except if noise (due to bad camera quality) introduces high frequencies.
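One possible way to test for "no sharp part at all", sketched under assumptions: the picture is split into a grid of tiles, each tile gets a sharpness score (here the variance of a 4-neighbour Laplacian, a common alternative to the FFT), and the picture counts as entirely blurred only if even the sharpest tile scores low. A portrait with a blurry background would still pass because of its sharp subject. All names, the grid size and the threshold are hypothetical.

```java
/** Hypothetical helper, not part of the app: tile-based "entirely blurred" check. */
final class BlurCheck {

    /** Variance of the 4-neighbour Laplacian over [x0, x1) x [y0, y1), skipping image borders. */
    static double laplacianVariance(int[] gray, int width, int height,
                                    int x0, int y0, int x1, int y1) {
        double sum = 0;
        double sumSq = 0;
        int n = 0;
        for (int y = Math.max(y0, 1); y < Math.min(y1, height - 1); y++) {
            for (int x = Math.max(x0, 1); x < Math.min(x1, width - 1); x++) {
                int i = y * width + x;
                int lap = gray[i - 1] + gray[i + 1] + gray[i - width] + gray[i + width]
                        - 4 * gray[i];
                sum += lap;
                sumSq += (double) lap * lap;
                n++;
            }
        }
        if (n == 0) {
            return 0;
        }
        double mean = sum / n;
        return sumSq / n - mean * mean;
    }

    /** Highest Laplacian variance found in a tiles x tiles grid. */
    static double sharpestTileScore(int[] gray, int width, int height, int tiles) {
        double best = 0;
        for (int ty = 0; ty < tiles; ty++) {
            for (int tx = 0; tx < tiles; tx++) {
                int x0 = tx * width / tiles;
                int x1 = (tx + 1) * width / tiles;
                int y0 = ty * height / tiles;
                int y1 = (ty + 1) * height / tiles;
                best = Math.max(best, laplacianVariance(gray, width, height, x0, y0, x1, y1));
            }
        }
        return best;
    }

    /** True when even the sharpest tile is below the (assumed) threshold. */
    static boolean isEntirelyBlurred(int[] gray, int width, int height,
                                     int tiles, double threshold) {
        return sharpestTileScore(gray, width, height, tiles) < threshold;
    }
}
```

Sensor noise raises the Laplacian variance just as it adds high frequencies to an FFT, so downscaling the picture before scoring may help. That is a guess to verify during testing.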

Bluesir9 commented 6 years ago

I understand that this is a fairly old discussion but I am curious. Processing the image on the device for face detection, blur detection etc. would be a fairly heavy operation right? I remember working on a small app a while back where we were advised to move all image processing related stuff to a server for exactly that reason.

nicolas-raoul commented 6 years ago

@Bluesir9 Yes image processing can be pretty heavy, but I believe that some of the checks can be implemented in a way that does not use too much power. For instance, checking whether a picture is totally black is very fast, checking whether it is totally fuzzy should be reasonably fast.

Incidentally, one of my friends is working on an embedded Java library for Android that can recognize cars/people/etc. quite fast (not open source though).

Bluesir9 commented 6 years ago

Makes sense. I will refer to the links that have been mentioned earlier in this discussion and give this a shot. Will start with black and fuzzy pictures first and see where it goes from there. How does that sound?

nicolas-raoul commented 6 years ago

Yes starting with black pictures detection sounds good (pictures taken in the pocket for instance). Thanks! :-)
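A minimal sketch of what a black-picture check could look like, assuming the pixels are available as packed ARGB integers (on Android, for instance, via `Bitmap.getPixels`). The class name, method names and threshold are hypothetical, and this is not the implementation that ended up in the app.

```java
/** Hypothetical helper, not the app's code: detects very dark pictures. */
final class DarkPictureCheck {

    /** Returns the mean luma (0-255) of packed ARGB pixels, using Rec. 601 weights. */
    static double meanLuma(int[] argbPixels) {
        if (argbPixels.length == 0) {
            throw new IllegalArgumentException("no pixels");
        }
        long sum = 0;
        for (int p : argbPixels) {
            int r = (p >> 16) & 0xFF;
            int g = (p >> 8) & 0xFF;
            int b = p & 0xFF;
            // Integer approximation of 0.299 R + 0.587 G + 0.114 B.
            sum += (299 * r + 587 * g + 114 * b) / 1000;
        }
        return (double) sum / argbPixels.length;
    }

    /** True when the average brightness is below the (assumed) threshold. */
    static boolean isTooDark(int[] argbPixels, double threshold) {
        return meanLuma(argbPixels) < threshold;
    }
}
```

This is a single pass over the pixels, so it stays cheap. Running it on a downscaled copy would make it cheaper still, and a night shot with a few bright lights might need a smarter rule than a plain average.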

nicolas-raoul commented 6 years ago

When such a black picture is detected, I suggest asking the uploader something like "This picture is totally black. Are you sure you want to upload it? Wikimedia Commons is only for pictures with encyclopedic value".

I suggest checking the picture before uploading it. The upload is done in the background anyway, while people enter metadata, so taking a few more seconds is not a problem.
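The flow described here could be sketched roughly as follows: start the check in the background as soon as the picture is chosen, then look at the result once the user has finished entering metadata. `PreUploadCheck`, `startCheck` and `warningOrNull` are hypothetical names, and the `Supplier` stands in for whatever detection is used. This is a plain-Java illustration, not the app's actual upload code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical sketch of a check that runs while the user types metadata. */
final class PreUploadCheck {

    static final String DARK_WARNING =
            "This picture is totally black. Are you sure you want to upload it? "
            + "Wikimedia Commons is only for pictures with encyclopedic value";

    /** Starts the (possibly slow) picture check on a background thread. */
    static CompletableFuture<Boolean> startCheck(Supplier<Boolean> looksUnwanted) {
        return CompletableFuture.supplyAsync(looksUnwanted);
    }

    /**
     * Called once metadata entry is done. Waits for the check if it is still
     * running and returns the warning to show, or null if there is none.
     */
    static String warningOrNull(CompletableFuture<Boolean> check) {
        return check.join() ? DARK_WARNING : null;
    }
}
```

The uploader can still proceed after seeing the warning.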

Bluesir9 commented 6 years ago

Right. The flow changes should be simple enough, the actual detection would be the crucial part. Will get back here if I have any queries related to this.

Bluesir9 commented 6 years ago

@nicolas-raoul An update on this. I am done with the detection of black/too-dark images. It works well enough but I will keep testing it on my end just to be sure.

I decided to move on to detection of blurry images. So I added the OpenCV library to the project and found that the 64K dex reference limit was exceeded (specifically 67163), so I had to enable multidexing to be able to build after that.

Wasn't sure if I should keep going further so here I am.

nicolas-raoul commented 6 years ago

@Bluesir9 Thanks for the update! Looking forward to testing superdark picture detection.

Multidexing does not seem to be a problem per se: https://stackoverflow.com/questions/39503348/disadvantages-in-multidexing-the-android-application

By the way, what is the APK size after adding OpenCV?

Bluesir9 commented 6 years ago

The beta debug build's size is 21.7 MB.

whym commented 6 years ago

Discussion on possibility of creating a microservice, from #926

whym commented an hour ago

Another idea might be to split the detection module into a web service, for example on tools.wmflabs.org. Your Java code should work on the server side JVM with little to no changes, at least in principle. It will not work when the phone is offline, though.

nicolas-raoul commented an hour ago

A web service would only work after upload, though, so probably after the user has entered a title/description.

whym commented 43 minutes ago

That's one way to do it, but the app can also send the image (or maybe a thumbnail of the image to save the bandwidth) to the detection web service directly, right? This can happen before uploading it to Commons.

nicolas-raoul commented 29 minutes ago

That would take double the bandwidth, right?
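A quick way to put numbers on this, assuming purely illustrative file sizes: the total traffic relative to a normal upload is (full image + payload sent to the detection service) / full image. The helper below is hypothetical.

```java
/** Hypothetical helper: traffic relative to uploading the full image to Commons only. */
final class BandwidthEstimate {

    static double relativeTraffic(long fullImageBytes, long checkPayloadBytes) {
        return (double) (fullImageBytes + checkPayloadBytes) / fullImageBytes;
    }
}
```

Sending the full picture to the service gives a factor of 2, whereas a thumbnail that is 1% of the original size would give a factor of about 1.01.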

By the way, I think we should discuss all of this on the issue rather than on the PR :-)

So here we are. :)

sivaraam commented 6 years ago

I'm not sure how useful this might be but I'll mention it anyway. IIRC, the Wikipedia Android app uses face recognition to correctly position lead images which have faces in them. I think they also use the primary colour of the lead image of an article to give a colour to the "Continue reading" and "Because you read" cards shown in the app's Explore feed. So, they must be using an image manipulation library. I'm not sure which library they use for this but I don't think it is OpenCV. I guess that library could be useful for this. The app devs would probably tell us if we ask them.

misaochan commented 6 years ago

So I did a bit of digging and found... https://softwarerecs.stackexchange.com/questions/30075/java-library-to-detect-whether-picture-is-selfie-or-not , which was started by Nicolas in 2016, lol. I also found a couple of open source libraries for detecting faces (edit: whoops, the library I linked is unsuitable. We have to keep looking, or check what the Wikipedia app uses as per @sivaraam 's post). There is also an interesting camera API that detects and prevents the use of the front-facing camera - https://medium.com/google-developers/detecting-camera-features-with-camera2-61675bb7d1bf , but of course that will only work if the user is using the in-app camera button.

Despite @nicolas-raoul 's thread not receiving any answers, someone made a good point about selfie detection kits showing up false positives for proper photos of notable individuals, too. But I believe we now have a couple of ways to get around this:

We now can check revert rates of a particular user. So we can consider only allowing uploads of faces by users who have a not-too-terrible revert rate (less than 20% perhaps).
For all users, it is still helpful to detect and warn them about selfies if we catch a photo that is predominantly a face. Similar to the dark detection PR, we can allow them to proceed after the warning, until we manage to fine-tune this.

Thoughts?

nicolas-raoul commented 6 years ago

Good idea! How about this?

Measure what portion of the picture the face or head takes (area).
IF the face takes more than 20% of the picture OR (the face takes more than 10% of the picture AND the user has a high revert rate), then show the warning.
Let everyone upload anyway.

That requires a library that gives us a rectangle around the face, not just the coordinates of the center of the face, nor just a face-or-not boolean.

The percentages are only a guess, and should be tuned during testing.
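The rule above can be sketched as follows, assuming a face detector has already provided the width and height of the face rectangle. The class, the method and the way "high revert rate" is passed in are hypothetical, and the 20% / 10% constants are the guesses from this comment.

```java
/** Hypothetical sketch of the proposed warning rule. */
final class SelfieWarningRule {

    static final double LARGE_FACE_FRACTION = 0.20;  // guess, to be tuned
    static final double MEDIUM_FACE_FRACTION = 0.10; // guess, to be tuned

    /** Returns true when the uploader should see a warning (the upload is never blocked). */
    static boolean shouldWarn(int faceWidth, int faceHeight,
                              int imageWidth, int imageHeight,
                              boolean highRevertRate) {
        double fraction = (double) faceWidth * faceHeight
                / ((double) imageWidth * imageHeight);
        return fraction > LARGE_FACE_FRACTION
                || (fraction > MEDIUM_FACE_FRACTION && highRevertRate);
    }
}
```

For example, a 400x300 face in a 1000x1000 picture covers 12% of the area, so it would trigger the warning only for a user with a high revert rate.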

misaochan commented 6 years ago

Sounds good to me. We can go with just warnings first, and only consider upload restrictions much later when we have an idea of how reliable/accurate the warnings are.

misaochan commented 6 years ago

Thoughts on this library? https://github.com/Qualeams/Android-Face-Recognition-with-Deep-Learning-Library

nicolas-raoul commented 6 years ago

To my understanding, this library seems to do face recognition (is this a picture of Josephine or Nicolas?) whereas we need a library that answers: Is there a face in that picture and if yes in what rectangle?

nicolas-raoul commented 5 years ago

Here is a JavaScript example that detects a face and gives its dimensions. It is very fast (on a desktop PC it is pretty much real-time): https://storage.googleapis.com/tfjs-models/demos/posenet/camera.html It uses TensorFlow, which means that with a bit of work it can be used on Android as well.

nicolas-raoul commented 5 years ago

Using machine learning, I created this code that detects whether a picture is a selfie or not with 85% accuracy: https://github.com/nicolas-raoul/selfie-or-not

Anyone feel free to integrate it in the app, but please make sure that it takes less than 500 kilobytes, by slimming down the Android TensorFlow binaries to just what is needed.

misaochan commented 5 years ago

Talked to @nicolas-raoul and @dbrant about this - the selfie-or-not library is not currently ready for Android use, but we can use the FaceDetector API in the Android SDK (not the one with Google Play Services).
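One caveat worth checking: as far as its documentation describes, `android.media.FaceDetector` reports for each face the midpoint between the eyes (`Face.getMidPoint`) and the distance between them (`Face.eyesDistance`), not a bounding rectangle. A rectangle would therefore have to be estimated. The sketch below shows one way to do that in plain Java. The helper is hypothetical, and the proportions (face about 2 eye distances wide and 3 tall) are a rough assumption to tune, not something the API provides.

```java
/** Hypothetical helper: estimates a face rectangle from eye midpoint and eye distance. */
final class FaceBoxEstimate {

    /**
     * Returns {left, top, right, bottom}, clamped to the image. The 2x / 3x
     * proportions are an assumption for illustration, not reported by the detector.
     */
    static float[] fromEyes(float midX, float midY, float eyesDistance,
                            int imageWidth, int imageHeight) {
        float left = Math.max(0f, midX - eyesDistance);
        float right = Math.min(imageWidth, midX + eyesDistance);
        float top = Math.max(0f, midY - eyesDistance);
        float bottom = Math.min(imageHeight, midY + 2 * eyesDistance);
        return new float[] {left, top, right, bottom};
    }

    /** Fraction of the picture area covered by the estimated rectangle. */
    static double areaFraction(float[] box, int imageWidth, int imageHeight) {
        double area = (double) (box[2] - box[0]) * (box[3] - box[1]);
        return area / ((double) imageWidth * imageHeight);
    }
}
```

The resulting fraction could then feed a percentage-based warning rule like the one discussed earlier in this thread.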

ilgazer commented 5 years ago

Do we want to prevent people from submitting normal portraits? If we don't, how could we distinguish between a portrait and a selfie using the FaceDetector API?

nicolas-raoul commented 5 years ago

@ilgazer The goal is never to prevent people from uploading, but rather to show a warning. I can tell the difference between a selfie and a portrait. For instance, the size and proportions of the face, or the proximity of a shoulder or an arm extended towards the camera, are sure signs that a picture is a selfie rather than a portrait. I expect a computer might be able to tell the difference as quickly as me if we find the right library/API :-)

nicolas-raoul commented 3 years ago

Face detection libraries for Android, just in case FaceDetector does not fit for some reason: https://github.com/vcvycy/MTCNN4Android https://github.com/syaringan357/Android-MobileFaceNet-MTCNN-FaceAntiSpoofing

commons-app / apps-android-commons

Detect joke/unwanted pictures #74