germanattanasio / visual-recognition-nodejs

DEPRECATED: this repo is no longer actively maintained
Apache License 2.0

Strange behavior of Demo Classifier #104

Closed jflevi closed 8 years ago

jflevi commented 8 years ago

Hi,

Let me share with you a strange behavior of the Demo Classifier in your application.

Select Dog Breeds and all categories (if you don't select all, the results are different and it works fine: no match). Then classify the following image: https://i.ytimg.com/vi/8o44asJt8ZA/maxresdefault.jpg

The result classifies this image as a Dalmatian, which is an error.

Strange results with https://tse1.mm.bing.net/th?id=OIP.Ma5dddfee9cd0a3e59653cfb1a7547606o0&pid=15.1&P=0&w=300&h=81

Similar results with other images which are not dogs. Similar results with other categories: Moleskine and Satellite (but different with Insurance Claim)

kognate commented 8 years ago

The watson image used was from https://i.ytimg.com/vi/8o44asJt8ZA/maxresdefault.jpg

jflevi commented 8 years ago

Yes... Any image which is not a dog provides similar results. Thanks JF

nfriedly commented 8 years ago

Hi @jflevi, thanks for the feedback.

This is due to the relatively small training image set that the demo uses. In particular, the Non-Dogs are all 4-legged critters like cats and tigers and such, so when you include that in the training data, the service tries to match whatever random image you give it to either a particular dog breed or else cats and such.

If you want to recognize arbitrary images, you're going to need to either create a much larger training set (with the Non-Dogs part full of random things like company logos), or else use the default classifier that the "Try" page of the demo uses.

That said, we do appreciate the feedback and we’re continually working on improving the service, so we’ll take this into account for future updates.

jflevi commented 8 years ago

Nathan,

Thanks for your response, but I don't think adding more images will help in my use case, which is the following (adapted to dogs).

I have a set of images with unknown content. I want to identify which ones are Dogs and which are not.

The real use case I'm trying to address with this service is: a bank has a set of customer documents (70 million documents) stored, and they want to classify and identify ID documents (passport or ID card); other documents don't need to be classified. The default classifier is not able to recognize any ID cards.

Would you have any recommendation to address my requirements?

Thanks

Cordialement - Kind regards

Jean-Francois LEVI Client Technical Advisor - Société Générale Phone/Fax: +(33) 1 58 75 28 77 Mobile: +(33) 6 75 07 85 00 Email: levi.j@fr.ibm.com


nfriedly commented 8 years ago

That actually sounds like a very fitting use case: choose a random selection of images out of the 7 million and split them into two groups, ID cards and other. Per the documentation, you'll want at least 150-200 images in each group, and should see some benefit all the way up to 5000 images total. (You'll likely have to shrink down the images to fit within the 100 MB-per-ZIP limit, but 320px and larger is good.)
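The sizing guidance above can be sketched as a small sanity check. This is an illustrative helper, not part of the service or the repo; the group names and thresholds just mirror the numbers in this comment:

```javascript
// Sanity-check a labeled training split against the guidance above:
// at least 150-200 images per group, diminishing returns past ~5000 total.
const MIN_PER_CLASS = 150;
const MAX_USEFUL_TOTAL = 5000;

function checkTrainingSplit(groups) {
  // groups: e.g. { idcards: [...filenames], other: [...filenames] }
  const problems = [];
  let total = 0;
  for (const [name, files] of Object.entries(groups)) {
    total += files.length;
    if (files.length < MIN_PER_CLASS) {
      problems.push(`${name}: only ${files.length} images (want >= ${MIN_PER_CLASS})`);
    }
  }
  if (total > MAX_USEFUL_TOTAL) {
    problems.push(`total ${total} images; benefit tapers off past ${MAX_USEFUL_TOTAL}`);
  }
  return problems; // empty array means the split looks reasonable
}
```

An empty return value means both groups meet the minimum and the total is within the useful range.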

Beyond that, you can also require human verification for images with a score below a given threshold, say 0.75, and then perhaps add those images to the training set so that it further improves over time. (Note: each classifier instance is immutable, so you can't add new training images to an existing one... but you can replace it with a new one.)
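The threshold-plus-review idea above can be sketched like this. The result shape ({ image, class, score }) is illustrative, not the exact service response format:

```javascript
// Route classification results: auto-accept at or above a cutoff,
// send everything else to a human review queue.
function routeByScore(results, threshold = 0.75) {
  const accepted = [];
  const needsReview = [];
  for (const r of results) {
    (r.score >= threshold ? accepted : needsReview).push(r);
  }
  return { accepted, needsReview };
}
```

Anything in needsReview could later be labeled by a human and folded into the training set for a replacement classifier.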

jflevi commented 8 years ago

Nathan,

Thanks, this is exactly what I'm currently testing, and it works fine... except that images which are not ID cards (like logos) are sometimes classified as ID cards. So I tested with the Dogs example and found the same problem. For me there is a bug somewhere... try the following with Dogs:

Select 3 dog breeds only and try to classify the Watson logo or any other image... The result is no match, which is fine; this is the expected result.

If you select All, the Watson logo is wrongly classified.

That's my point and I don't understand why it behaves like this.

Thanks a lot for your feedback.

JF



kognate commented 8 years ago

The situation with the Watson logo being scored as a match can be corrected in a custom classifier by adding the image to the negative class.

So, let's say I have 200 images of documents I want to classify for each document type, like driver's license, passport, and tax ID card. In addition to these three classes, I include 100 images of logos and things that are not any of the three classes I want to find. When testing my classifier, if I find images that classify incorrectly, I can add them to the negative class to improve the results.
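As I understand the Visual Recognition (v3) custom-classifier API, training takes one ZIP of positive examples per class plus an optional ZIP of negative examples, keyed as {class}_positive_examples and negative_examples. A minimal sketch of mapping the class layout above onto those parameter names (verify the exact key naming against the current service docs):

```javascript
// Build training parameters for a custom classifier from a map of
// class name -> ZIP path, plus an optional negative-examples ZIP.
function buildTrainingParams(name, positiveZips, negativeZip) {
  // positiveZips: e.g. { drivers_license: 'dl.zip', passport: 'pp.zip', tax_id: 'tax.zip' }
  const params = { name };
  for (const [className, zipPath] of Object.entries(positiveZips)) {
    params[`${className}_positive_examples`] = zipPath;
  }
  if (negativeZip) {
    params.negative_examples = negativeZip; // logos and other non-matching images
  }
  return params;
}
```

Misclassified images found during testing would go into the negative-examples ZIP for the next (replacement) classifier.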

The reason the watson logo gets classified is that it must have some feature in it that can be found in the dalmatian class. If I had to guess, it's the dots in the image. I would also say that the 0.522 score is pretty low, and increasing the threshold will help weed out poorly classified images.

matt-ny commented 8 years ago

Jean-Francois wrote:

For me there is a bug somewhere... try the following with Dogs... Select 3 dogs breeds only and try to classify the Watson logo or any other image... The result is no match which is fine this is the expected result. If you Select all the Watson logo is being wrongly classified. That's my point and I don't understand why it behaves like this.

I understand that is counter-intuitive. In your example, adding more training data (all dogs instead of just 3 breeds) leads to a misclassification. One of the deep problems with some machine learning techniques (including the ones we use) is that we cannot explain "why" a mistake (or right answer) was given. You might find this paper interesting and surprising: https://arxiv.org/abs/1312.6199

We do know that counter-intuitive results like this are possible, especially when the test images (Watson logo) come from a different distribution than the training images (dogs). We are actively working on ways to identify whether a classifier is "appropriate" for a particular test set, given what it was trained on. Our dept has some initial results, but nothing deployed yet.

@kognate's advice above, about using large training sets (hundreds or thousands of examples per class) and augmenting your negative set with the logo images, is the best practice we can recommend at this time.

Matt Sr Software Engineer IBM Research - Visual Recognition