yashsandansing opened this issue 2 years ago
Thanks a lot! It's very helpful for me!
Hi, I was wondering if anyone has tried and succeeded in training/tuning the model? The code runs fine, but it seems like the model didn't change at all. I wrote an extra bit of code that evaluates a few test images after each epoch; one would assume the eval results would improve over the epochs, but there were zero changes, all the pixels had exactly the same values.
Was wondering if there's a mistake in my code, or is it the same for everyone?
@SamStark-AtWork I was the one who wrote this training code and even I couldn't get it to work later. The loss remains more or less constant throughout the training process. I read in one of the issues that if you set the model to model.eval() in the trainer function before training, you might get better results, but this script has given me some terrible results on multiple datasets. People have gotten it to work, but no working training scripts have been posted here. I think I saw a training script pending approval in the PR section once; you can maybe try that code?
I see, thanks for sharing, so it's not just me haha. I found the training script u mentioned and will try it soon; the author mentioned he's too busy to check it, so hopefully it will work!
Some updates: I've taken a look at the training script mentioned by @yashsandansing. To be honest it's not that different; the biggest difference is how the original trimap is produced (which did work a bit better), plus most of the importing functions and utilities. There are some slight adjustments here and there in the training code across that person's commits, but overall they don't produce much difference.
For the problem I described above where there were zero changes per epoch: I realized the LR scheduler was stepping on every batch (it should step once per epoch), which was my mistake. The problem is gone after that was fixed.
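For anyone hitting the same thing, the difference is just where `lr_scheduler.step()` sits; a minimal sketch (names like `dataloader`, `lr_scheduler` and `train_one_batch` are placeholders for whatever your setup uses):

```python
for epoch in range(epochs):
    for image, trimap, gt_matte in dataloader:
        train_one_batch(image, trimap, gt_matte)  # hypothetical; your per-batch training step
        # lr_scheduler.step()  # <-- wrong: stepping here decays the LR once per batch
    lr_scheduler.step()        # <-- right: step once per epoch
```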
The section below is more about the output quality, for anyone who would like to try training/tuning the model.
For context, I'm trying to produce a model that outputs the matte of an object in a given picture (not a human portrait). My dataset is self-produced and of very questionable quality (most likely one of the reasons I couldn't get great results), and it also only contains very similar objects differing in angle, which might explain the overfitting.
I tried both training the model from scratch and fine-tuning a pre-trained model. The results weren't good; it feels like there's a lot of overfitting (it might work better with a better dataset). However, it does work better than a generic VGG16 autoencoder, and at a much lower VRAM cost too (which is incredible). Fine-tuning the pre-trained model works better than expected, and both methods do seem to converge towards closer results than where they started.
I also played with some of the metaparams; the current optimizer and its params are pretty good already. The biggest change I noticed came from the batch size: I tried 4 and 32, and the bigger batch size does produce better results (less overfitting).
The accuracy jumped significantly between no training and the end of the first epoch; any change after that is small to almost none. Since my dataset contains a lot of similar pictures, I deduced that overfitting is happening.
So far I haven't seen anyone train or fine-tune the model with great results, so do be aware of that if you are interested in trying it yourself.
@SamStark-AtWork nice findings, do you mind providing your training code? I want to check if the dataset matters at all. Will post the results here.
best regards
Hey @mtrabelsi, here u go, hope u get some good results! It's been a while since I last worked on this, so it might be a bit messy; let me know if anything is not explained properly.
```python
# imports
import os

import torch
import torch.nn as nn
import torchvision

from src.models import modnet as MODNet
from src import trainer as MODTrainer

# import dataset and load as "dataloader" (check @yashsandansing's comment or the scripts at PR)
# note: `dataloader`, `testImages`, `lr` and `epochs` are assumed to be defined by you

# import model
modnet = MODNet.MODNet(backbone_pretrained=False)
modnet = nn.DataParallel(modnet).cuda()

# for evaluating progress
evalPath = "INSERT YOUR PATH HERE"
if not os.path.isdir(evalPath):
    os.makedirs(evalPath)
# pick 2 or more images here and store them (as `testImages`) for infer/eval later

# metaparams
optimizer = torch.optim.SGD(modnet.parameters(), lr=lr, momentum=0.9)  # can try momentum=0.45 too
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=int(0.25 * epochs), gamma=0.1)

# Training starts here
for epoch in range(0, epochs):
    for idx, (image, trimap, gt_matte) in enumerate(dataloader):
        semantic_loss, detail_loss, matte_loss = MODTrainer.supervised_training_iter(
            modnet, optimizer, image.cuda(), trimap.cuda(), gt_matte.cuda())

    # step the scheduler once per epoch, not per batch (see my earlier comment)
    lr_scheduler.step()

    # eval for progress check and save images (here's where u visualize changes over training time)
    with torch.no_grad():
        _, _, debugImages = modnet(testImages.cuda(), True)
        for idx, img in enumerate(debugImages):
            saveName = "eval_%g_%g.jpg" % (idx, epoch + 1)
            torchvision.utils.save_image(img, os.path.join(evalPath, saveName))

    print("Epoch done: " + str(epoch))
```
@yashsandansing Hi, I would like to ask: in the code you provided, the size of each image in the data processing section is not the same, so how do you do batch training?
hey @zzzcyyyw sorry for the extremely late reply. If I remember correctly, the code did give different sizes. I believe the resize code was data-specific and worked for my dataset. There were some other datasets for which it didn't work, so you'd need to tweak that section of the code.
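As a rough sketch, one way to make batching work is to resize the image, trimap and matte to one fixed size inside the dataset; the 512×512 target below is just an example, not necessarily what my original code used:

```python
import torchvision.transforms.functional as TF
from torchvision.transforms import InterpolationMode

TARGET_SIZE = (512, 512)  # placeholder; pick a fixed size that fits your GPU

def resize_sample(image, trimap, matte):
    # resize all three tensors to the same fixed size so they can be stacked into a batch
    image = TF.resize(image, TARGET_SIZE)
    # nearest-neighbour keeps the trimap values at exactly 0 / 0.5 / 1
    trimap = TF.resize(trimap, TARGET_SIZE, interpolation=InterpolationMode.NEAREST)
    matte = TF.resize(matte, TARGET_SIZE)
    return image, trimap, matte
```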
I had to review a lot of documentation and issues to implement the training code. So here is the code you'll be needing for training.
Initially, you'll need to download the pretrained model files from
https://drive.google.com/drive/folders/1umYmlCulvIFNaqPjwod1SayFmSRHziyR?usp=sharing
and move them to `MODNet/pretrained`. In case you need to fine-tune the model on your own dataset, download `modnet_photographic_portrait_matting.ckpt`; in case you need to use the backbone mobilenetv2 model, download that too.

For preparing the dataset, I prepared a pandas dataframe with 2 columns, ["image", "matte"]: "image" had the absolute path to each image's location and "matte" had the path to that image's matte.
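Roughly, building that dataframe is just a matter of pairing up file paths; a minimal sketch (the directory names and matching-by-filename are assumptions, adjust for your data):

```python
import os
import pandas as pd

def make_dataframe(image_dir, matte_dir):
    """Build a dataframe with absolute paths to each image and its matching matte."""
    rows = []
    for name in sorted(os.listdir(image_dir)):
        rows.append({
            "image": os.path.abspath(os.path.join(image_dir, name)),
            "matte": os.path.abspath(os.path.join(matte_dir, name)),
        })
    return pd.DataFrame(rows, columns=["image", "matte"])

df = make_dataframe("dataset/images", "dataset/mattes")
```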
After downloading, for preprocessing, the code is:
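The block below is a minimal sketch of that preprocessing rather than the exact original code: a dataframe-backed dataset with a `get_trimap` helper and a `__getitem__` that returns (image, trimap, matte). The class name, erosion kernel size, 512×512 resize and OpenCV-based I/O are placeholder choices:

```python
import cv2
import numpy as np
import torch
from torch.utils.data import Dataset

class MattingDataset(Dataset):
    """Reads (image, matte) paths from the dataframe and derives a trimap from each matte."""

    def __init__(self, dataframe, size=512):
        self.df = dataframe
        self.size = size  # placeholder fixed size so samples can be batched

    def __len__(self):
        return len(self.df)

    @staticmethod
    def get_trimap(matte, kernel_size=10):
        # foreground -> 1, background -> 0, unknown band around the edges -> 0.5
        fg = (matte >= 0.9).astype(np.float32)
        bg = (matte <= 0.1).astype(np.float32)
        kernel = np.ones((kernel_size, kernel_size), np.uint8)
        fg = cv2.erode(fg, kernel)
        bg = cv2.erode(bg, kernel)
        trimap = np.full_like(matte, 0.5, dtype=np.float32)
        trimap[fg >= 1] = 1.0
        trimap[bg >= 1] = 0.0
        return trimap  # values are only 0, 0.5 and 1

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image = cv2.cvtColor(cv2.imread(row["image"]), cv2.COLOR_BGR2RGB)
        matte = cv2.imread(row["matte"], cv2.IMREAD_GRAYSCALE)

        image = cv2.resize(image, (self.size, self.size))
        matte = cv2.resize(matte, (self.size, self.size)).astype(np.float32) / 255.0
        trimap = self.get_trimap(matte)

        # HWC uint8 -> CHW float tensors, image normalised to [-1, 1]
        image = torch.from_numpy(image).permute(2, 0, 1).float() / 127.5 - 1.0
        matte = torch.from_numpy(matte).unsqueeze(0)
        trimap = torch.from_numpy(trimap).unsqueeze(0)
        return image, trimap, matte
```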
You might need to change the above code in the `get_trimap` and `__getitem__` methods according to your dataset. You will also need to verify that your data is proper in the step after next.
Finally, create your dataset using the code below:
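Assuming the sketch above, that's roughly just:

```python
data = MattingDataset(df)

# quick sanity check: image, trimap and matte should have matching spatial shapes
image, trimap, matte = data[0]
print(image.shape, trimap.shape, matte.shape)
```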
After your dataset has been created, 1st verify it by printing the first row of `data` and checking that the shapes of image, matte and trimap are equal (only the channels can be different). IMPORTANT: Try printing the first of the trimaps. The only values in the numpy array should be 0, 0.5 and 1.

Use the DataLoader function to prepare your data for training:
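A plain PyTorch `DataLoader` is enough here (the batch size is a placeholder; larger batches were reported above to overfit less):

```python
from torch.utils.data import DataLoader

dataloader = DataLoader(data, batch_size=16, shuffle=True)
```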
After this, the code for training is available in the `trainer.py` file.

For using the backbone, change `modnet = torch.nn.DataParallel(MODNet()).cuda()` to `modnet = torch.nn.DataParallel(MODNet(backbone_pretrained=True)).cuda()` in case you have the mobilenetv2 model in your pretrained directory.

For fine-tuning the existing MODNet model, use this snippet before the optimizer line:
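A minimal version of that snippet just loads the downloaded checkpoint into the `DataParallel`-wrapped model so training starts from the pre-trained weights (the exact path depends on where you saved the file under `MODNet/pretrained`):

```python
# load the portrait-matting checkpoint before creating the optimizer
ckpt_path = './pretrained/modnet_photographic_portrait_matting.ckpt'
modnet.load_state_dict(torch.load(ckpt_path))
```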