ZhengPeng7 / BiRefNet

[CAAI AIR'24] Bilateral Reference for High-Resolution Dichotomous Image Segmentation
https://www.birefnet.top
MIT License

Finetuning with small dataset #24

Closed LeDuySon closed 1 month ago

LeDuySon commented 2 months ago

First of all, thank you for your great project. I want to ask for some recommendations about fine-tuning with a small dataset (around 400 images). My problem is main-car segmentation: segment only one car even if the image contains multiple cars, where the main car is the biggest one, in the middle of the image.

ZhengPeng7 commented 2 months ago
  1. Model: Since you have only a little data, I suggest using a smaller model, e.g., choosing swin_v1_tiny as the backbone in config.py (remember to put the backbone weights in the right place).
  2. Freezing layers: I'm not sure whether freezing some layers helps the training. But if you want, you can turn on the freeze_bb option in config.py to easily freeze the backbone layers.
  3. Loss: You can turn off the SSIM loss in config.py, since it mainly benefits segmentation in fine regions, which is unnecessary in your case. In my experience, IoU loss converges much faster but ends at lower accuracy; if you want to see results quickly, you can leave only it on.
  4. If you do not have extra data, you can split off 40 images for validation.
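Put together, the four suggestions above might look like the following sketch. The attribute names (`bb`, `freeze_bb`) and the loss-weight dict are assumptions about config.py and may differ from the actual repo.

```python
import random

# Hypothetical sketch of the fine-tuning switches discussed above; the
# attribute names (bb, freeze_bb) and the loss-weight dict are assumptions
# and may not match config.py exactly.
class Config:
    def __init__(self):
        self.bb = 'swin_v1_tiny'   # 1. smaller backbone for a ~400-image dataset
        self.freeze_bb = True      # 2. optionally freeze the backbone layers
        # 3. turning off the SSIM term amounts to zeroing its weight
        self.loss_weights = {'bce': 30.0, 'iou': 0.5, 'ssim': 0.0}

config = Config()

# 4. split off 40 of the ~400 images for validation
random.seed(0)
images = [f'img_{i:03d}.jpg' for i in range(400)]
random.shuffle(images)
val_set, train_set = images[:40], images[40:]
```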

If you have further questions, feel free to leave messages :)

LeDuySon commented 2 months ago

Thank you so much for your detailed answer.

About swin_v1_tiny, is there a massive-dataset training with this one? I found the massively trained model much better in the general case compared to the one trained on only one dataset.

About the loss function, I'll think about it again, because my input images can sometimes be like the one below, which is kind of complex, so I will need to try it myself. Do you have any recommendation about where to rent GPUs?

[image attached: example of a complex input]

ZhengPeng7 commented 2 months ago

Thanks for your feedback! However, the massive training takes a lot of time, even with swin-tiny, and I haven't started it. I can only say I might spare my own time and GPU time for it in the future.

If your cases are similar to the image above, I recommend using the default settings of losses in my project.

About renting GPUs, I personally recommend those on autodl, which is the cheapest platform I've used. But if you are not in China (you know there are firewalls to block people from things like Google), I recommend finding GPUs on vast.ai. BTW, if you want to use the default training setting (bs=2, bb=Swin-L), you need GPUs with more than 37 GB of memory. If you want to train with swin-tiny, you can use GPUs with 24 GB of memory. The batch size is better set larger than 1 (I've tested full training with bs=1).

LeDuySon commented 2 months ago

Thank you! Have you tried exporting this model to ONNX? I plan to deploy it to Triton Inference Server after the training, so if you have not, maybe I will try to do it and get back to you.

ZhengPeng7 commented 2 months ago

Sorry, I haven't done this kind of thing. But if you encounter some problems while doing the deployment, which you think I may know about, feel free to leave messages here. Good luck!

LeDuySon commented 2 months ago

Thank you!

LeDuySon commented 2 months ago

Hi @ZhengPeng7, I know this is not related to this discussion, but I can't load BiRefNet_DIS_ep500-swin_v1_tiny anymore. Do you know why? I have changed the backbone in config to swin_v1_t, but when loading the checkpoint, it shows mismatches between many layers.

ZhengPeng7 commented 2 months ago

There were some differences between the previous code and the descriptions in the paper in terms of model architecture. I made modifications so that they now match exactly, which is why the old checkpoint no longer loads. I'll try to train a swin-tiny version in the massive training setting and will reply to you once it's done.
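Until a retrained tiny checkpoint is available, a non-strict partial load can at least reuse the layers whose names and shapes still match. Below is a toy sketch of that technique; the two small models stand in for the old and new architectures, and loading a real checkpoint would replace the `old_state` line with `torch.load(...)`.

```python
import torch
import torch.nn as nn

# Toy sketch of a non-strict partial load: keep only checkpoint tensors
# whose names and shapes match the current model. The two models below are
# stand-ins for the old/new architectures.
new_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
old_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))  # mismatched head

old_state = old_model.state_dict()   # stands in for torch.load(ckpt_path, map_location='cpu')
new_state = new_model.state_dict()
filtered = {k: v for k, v in old_state.items()
            if k in new_state and v.shape == new_state[k].shape}
result = new_model.load_state_dict(filtered, strict=False)  # mismatched keys stay randomly initialized
```

This only makes sense when the architectures mostly overlap; layers dropped by the filter are left at their fresh initialization.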

LeDuySon commented 2 months ago

Thanks man!

ZhengPeng7 commented 1 month ago

Feel free to reopen it if you have any more questions.