GantMan / nsfw_model

Keras model of NSFW detector

How many images in the training set for each class? #10

Open misterDDF opened 5 years ago

misterDDF commented 5 years ago

Hi, thanks for sharing the code and model; it has helped me a lot. Can you tell me how many images are in the training set for each of the 5 classes? I'm not familiar with Keras: does the code below mean that only 500 * batch_size images are trained on in each epoch, rather than every image in the nsfw_data_scraper training set? [image]

By the way, I tested the model on a test set (2000 images for each class). The accuracy for the neutral class is 0.1 lower than the confusion matrix in the results shows, while the other classes perform well.
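For anyone repeating this comparison, here is a minimal sketch of reading a per-class correct rate off a confusion matrix. The class order and all counts are made up for illustration (only the 2000-images-per-class test size comes from this comment); they are not the repository's published numbers.

```python
# Assumed class order, for illustration only.
CLASSES = ["drawings", "hentai", "neutral", "porn", "sexy"]


def per_class_recall(confusion):
    """Return {class: recall}; rows are true classes, columns are predictions."""
    recalls = {}
    for i, name in enumerate(CLASSES):
        row_total = sum(confusion[i])
        # Diagonal entry = images of this class that were predicted correctly.
        recalls[name] = confusion[i][i] / row_total if row_total else 0.0
    return recalls


# Made-up counts with 2000 test images per class (not real results).
example = [
    [1900, 40, 30, 10, 20],
    [50, 1880, 10, 50, 10],
    [60, 10, 1700, 30, 200],
    [10, 40, 20, 1850, 80],
    [20, 10, 150, 120, 1700],
]

recalls = per_class_recall(example)
for name, value in recalls.items():
    print(f"{name}: {value:.3f}")
```

Comparing these row-wise rates against a published matrix only makes sense if both matrices are normalized the same way (by true class, as here).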

GantMan commented 5 years ago

Hi!

You are correct. Training on the entire dataset would be most impressive, as I currently have 30,000+ images per class. Additionally, I've increased the batch size to 32, which means 16,000 images (500 batches of 32) are pulled in each epoch. Since I'm batching and using Stochastic Gradient Descent, I've found this to be a powerful method for continuous refinement of the model without overfitting.
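The arithmetic behind those numbers can be sketched as follows. The 500 steps come from the question, the batch size of 32 and the roughly 30,000 images per class from this reply; treating 30,000 as an exact per-class count and the helper names are assumptions made for illustration.

```python
def images_per_epoch(steps_per_epoch, batch_size):
    """Number of images drawn in one epoch of step-limited training."""
    return steps_per_epoch * batch_size


def epoch_coverage(steps_per_epoch, batch_size, dataset_size):
    """Fraction of the dataset that a single epoch can touch (capped at 1.0)."""
    return min(1.0, images_per_epoch(steps_per_epoch, batch_size) / dataset_size)


STEPS_PER_EPOCH = 500      # from the question
BATCH_SIZE = 32            # from the reply
IMAGES_PER_CLASS = 30_000  # "30,000+" per class, used here as a lower bound
NUM_CLASSES = 5

seen = images_per_epoch(STEPS_PER_EPOCH, BATCH_SIZE)
total = IMAGES_PER_CLASS * NUM_CLASSES
coverage = epoch_coverage(STEPS_PER_EPOCH, BATCH_SIZE, total)

print(f"{seen} images per epoch")              # 16000 images per epoch
print(f"{coverage:.1%} of {total} images")     # 10.7% of 150000 images
```

Under these assumptions, one epoch touches roughly a tenth of the data, so it takes about ten epochs to draw as many samples as the dataset holds, provided the generator shuffles between epochs.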

Additionally, I apply perturbation to the images, so that noise, rotation, and cropping are added randomly, making it practically impossible for the exact same image to be used twice.
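To illustrate the idea, here is a minimal standard-library sketch of random perturbation on a tiny grayscale image stored as a list of rows. It is not the training code from this repository: the function names are hypothetical, rotation is simplified to quarter turns, and a real pipeline would use an image library or the framework's own augmentation tools.

```python
import random


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def rotate_quarter_turns(image, turns):
    """Rotate clockwise by 90 degrees, `turns` times."""
    for _ in range(turns % 4):
        image = [list(row) for row in zip(*image[::-1])]
    return image


def add_noise(image, max_delta, rng):
    """Shift every pixel by a random amount, clamped to 0..255."""
    return [
        [min(255, max(0, p + rng.randint(-max_delta, max_delta))) for p in row]
        for row in image
    ]


def perturb(image, crop_size, rng):
    """Apply crop, rotation, and noise in sequence."""
    out = random_crop(image, crop_size, crop_size, rng)
    out = rotate_quarter_turns(out, rng.randint(0, 3))
    return add_noise(out, 10, rng)


rng = random.Random(0)
image = [[(r * 8 + c) % 256 for c in range(8)] for r in range(8)]
a = perturb(image, 6, rng)
b = perturb(image, 6, rng)  # a second, independently perturbed variant
```

Because the crop position, rotation, and per-pixel noise are all drawn independently, the number of possible variants of one source image grows very quickly, which is why an exact repeat is so unlikely.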

After some serious re-training/refining I'd love for you to re-test my latest model. I'm getting around 93% accuracy. This was trained longer on an even larger dataset.

Side note:

You say you're not familiar with Keras. If you use some other method, I'd love for you to contribute. I'm planning on writing a TensorFlow.js training version. It would be entertaining to see which ML framework performs best.

misterDDF commented 5 years ago

Thanks for your reply.

Yes, I'm trying to reimplement this model with PyTorch, but for now the model accuracy only reaches about 83%. I think I should retrain it more seriously.

GantMan commented 5 years ago

Here's a blog post I'm working on for how I trained the model: https://medium.com/@gantlaborde/howto-ai-nsfw-detection-229a9725829c

devinhee commented 5 years ago

Hi! I retrained this model with Keras, but for now the model accuracy only reaches 89%. I guess something might be wrong with my dataset: I cannot get enough data for the sexy and drawings classes. Where did you get the data for these two classes?

GantMan commented 5 years ago

@devinhee - what's your data categorization error rate? If you did a basic pull off of Reddit etc., you might have some significant misclassifications that are holding your model back.

devinhee commented 5 years ago

> @devinhee - what's your data categorization error rate? If you did a basic pull off of Reddit etc., you might have some significant misclassifications that are holding your model back.

The categorization error rate is 20% to 25%. I did some basic data cleaning (deleted bad images and removed duplicate images), but I did not check every single image in each category.
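When checking every image is not practical, one common approach is to hand-review a random sample per class and estimate the error rate from it. The sketch below assumes a hypothetical file layout and made-up review counts; the interval uses the normal approximation for a proportion, which is only rough for small samples.

```python
import math
import random


def sample_for_review(paths, sample_size, seed=0):
    """Pick a reproducible random subset of files to label by hand."""
    rng = random.Random(seed)
    return rng.sample(paths, min(sample_size, len(paths)))


def error_rate_interval(num_wrong, num_checked, z=1.96):
    """Estimated error rate with an approximate 95% interval (normal approx.)."""
    p = num_wrong / num_checked
    margin = z * math.sqrt(p * (1 - p) / num_checked)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical file names for one class; replace with the real directory listing.
paths = [f"sexy/img_{i:05d}.jpg" for i in range(30_000)]
to_review = sample_for_review(paths, 200)

# Made-up outcome of the manual review: 45 of the 200 sampled images were mislabeled.
rate, low, high = error_rate_interval(num_wrong=45, num_checked=200)
print(f"estimated error rate: {rate:.1%} (roughly {low:.1%} to {high:.1%})")
```

Running this per class shows which categories carry most of the label noise, so cleaning effort can go there first.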