TechnikEmpire closed this issue 4 years ago
Very cool! Did you make one?
@GantMan It's still going, on epoch 5/10 now and reporting ~96% train acc and ~92% validation still. I pulled your data set off mega for this FYI so thought I'd give back by sharing the above. :) Thanks for posting that.
Awesome! I'd love to host the resulting file so others can use it, if that's good for you! Link to your socials etc. of course!
Alright, let me see how this goes. It looks like validation accuracy isn't really moving anymore. Once I'm done with everything (I still have to attempt a second type of model) I'll come back with the results. I'm skeptical until I test it outside of TF.
lol after all that I can't get the damn model out of TF 2. I can't freeze it, only use it as a SavedModel, which is useless. There is a way to freeze in 2.x, but it looks like there's a regression where training parameters from hub get baked in as constants. Tomorrow's another day I guess to look at it.
Gonna close this out because TF2 is a trainwreck. There's no way to properly freeze hub-trained models, and none of the tools to get them out of the SavedModel format support TF 2.x properly. Sorry for the letdown. I'm going to try to reproduce the same thing (transfer learning plus fine-tuning) in 1.15. If I'm successful I'll open a PR.
@GantMan I've successfully re-written the repo to use TF 2.1 and TF-Hub so we get all models for TF 2 that are on tfhub.dev. Training is simple, you just give the URL of the model type you want and let it go.
Anyway, I want to modify my fork to keep the trained models in the repo with LFS, but it looks like I can't enable that on forks. What do you think? I guess the question is: what's more of a pain, manually uploading models to the release page, or getting LFS enabled?
IMO, I'd love for your code to be sent as a PR back to the repo. You could then host the models for a short amount of time in a dropbox link and I'll upload them to our S3 and host them for everyone for free.
I plan on learning fast.ai and seeing if I can beat TF 2.1 and TF-Hub - all the models and code are useful to the community!
Alright, I broke my fork with the LFS stuff, so I'll redo it and send a PR excluding the models, then send the models separately. I'm training Inception V3, MobileNet V2 (large), ResNet 50 V2, and NASNet-A, and exporting them to TFLite, frozen pb, SavedModel, Keras model, and TFJS formats.
AFAIK you can post the models as releases. Would you do that instead?
I'm unfamiliar with exactly what posting models as releases means. Do you mean just include datestamp in the URI so people know when it was pushed?
You just create a release like this:
And then you drop the zipped models here:
Then you let Github host your files :)
Anyway I really wanted to run training through on all models I configured but something has come up that has put me out of time. Sorry about that. I'll submit the PR as-is.
Also I don't think we're ever going to break out of the ~93% zone without modifying the source data set. There's too much ambiguity between "sexy" and "porn" so they're constantly confused. If you merge those categories you'll see that you can break that ceiling.
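If you go that route, here's a minimal sketch of folding one class folder into another, assuming the dataset is laid out as one directory per class. `merge_categories` and `MERGE_MAP` are hypothetical names for illustration, not part of this repo:

```python
from pathlib import Path
import shutil

# Assumption: class folders ("sexy", "porn", etc.) sit directly under data_root.
MERGE_MAP = {"sexy": "porn"}  # fold "sexy" into "porn"

def merge_categories(data_root: str, merge_map: dict) -> None:
    root = Path(data_root)
    for src_name, dst_name in merge_map.items():
        src, dst = root / src_name, root / dst_name
        if not src.is_dir():
            continue
        dst.mkdir(exist_ok=True)
        for item in src.iterdir():
            # Prefix with the old class name to avoid filename collisions.
            shutil.move(str(item), str(dst / f"{src_name}_{item.name}"))
        src.rmdir()  # source folder is empty now
```

After running this, retraining on the merged folders is what should let you test whether the sexy/porn ambiguity is really what's capping validation accuracy.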
Sounds good!
@GantMan Alright PR is open. Let me know.
Fantastic work! I love the single file with everything.
I would love to throw in quantized models in that zip and call it a day if that works for you!
I'll have to check the command-line tool that comes with tfjs when you install it with pip. It may have a quantization option for fp16. I'm out for a while today tho.
I believe it's a flag in the tfjs command: `--quantization_bytes 1`
I believe!
Also, I have ~360K images with object bounding boxes defined for everything you'd want to detect in terms of NSFW (not hentai tho). If you know, or figure out, how to use the TF2 and TF Hub API to fine-tune an SSD or SSDLite model, please let me know.
@GantMan Alright, the new PR is up, and I linked the updated model package, which now has a quantized model inside as well.
Just FYI, if you preprocess images and shove them through the `make_image_classifier.py` module of TF Hub, you can train much faster and reach much higher accuracy, quicker. The process is:

- Preprocess the images before training.
- Pass `--do_fine_tuning` to the script as well.
- The script builds its classifier head with `layers.Dense(..., activation='softmax', ...)`. You need to delete the activation from that dense layer and append a separate softmax layer immediately after it as the final layer, and explicitly set its data type to float32. The reason you're doing this is that you're also going to edit the script to configure TF to use mixed precision, which drastically boosts computation speed as long as you have a fairly recent NVIDIA card. Follow these instructions. The config must be changed before initializing any layers.
- Edit the `main()` definition of the script to call `set_memory_growth(YOUR_FIRST_GPU)` with `True`, as demonstrated here. cuDNN will die if you don't.
- To do that last step, you'll need to not install TF Hub from pip, but rather git clone TF Hub's code. Once cloned, modify the imports of the `make_image_classifier` script to pull in the local files in the same dir, then execute those modified scripts.

I'm only on epoch 3 of 10 (not 100 anymore) and I'm hitting 92.57% validation accuracy. I'm doing this on a pretty junky AMD machine that I popped an RTX 2060 into. Before doing the preprocessing, I was facing growing old waiting for this to get somewhere. I thought I'd share this because I've seen many other comments about people taking days to run training. It's only going to take ~2 hours to run through to epoch 10.
For greater clarity, what's happening here is that you're not just doing transfer learning; you're telling TF to specialize the lower layers of the model to the domain, which increases accuracy by "a few points" according to Google.
Hope this helps someone.
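The head edit plus GPU/mixed-precision config described above could be sketched roughly like this. This is an assumption-laden illustration, not the actual `make_image_classifier` patch: it assumes TF 2.4+, where the policy call is `tf.keras.mixed_precision.set_global_policy` (TF 2.1 used an `experimental` variant), and `NUM_CLASSES` and the 1280-dim feature input are illustrative stand-ins:

```python
import tensorflow as tf

# Enable memory growth on the first GPU (if present) BEFORE anything
# allocates memory; otherwise cuDNN can fail to initialize.
gpus = tf.config.experimental.list_physical_devices("GPU")
if gpus:
    tf.config.experimental.set_memory_growth(gpus[0], True)

# Mixed precision must be configured before any layers are built.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

NUM_CLASSES = 5  # e.g. drawings / hentai / neutral / porn / sexy

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1280,)),  # e.g. a hub feature vector (illustrative size)
    # Dense layer with the softmax activation removed...
    tf.keras.layers.Dense(NUM_CLASSES),
    # ...and a separate softmax appended as the final layer, forced to
    # float32 so the output probabilities stay stable under mixed precision.
    tf.keras.layers.Activation("softmax", dtype="float32"),
])
```

The float32 softmax matters because under `mixed_float16` the intermediate activations are half precision, which is fine for the hidden layers but can lose precision in the final probabilities.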