Etienne-Meunier-Inria / EM-Flow-Segmentation

Implementation for the paper: EM-driven unsupervised learning for efficient motion segmentation
GNU Affero General Public License v3.0

An error occurred while saving the training model #3

Open fanxuxiang opened 1 year ago

fanxuxiang commented 1 year ago

Hi! Thanks for sharing your excellent work. I am very interested in it. @Etienne-Meunier

But when I run the training command (python3 model_train.py --path_save_model train_me --base_dir /home/fxx/data/DAVIS-data --data_file DataSplit_me/DAVISD16Split), some errors occur. It seems that a hyperparameter storage error is raised when saving the model. I have tried several fixes, such as those in https://github.com/pytorch/pytorch/issues/78720 and https://github.com/Lightning-AI/lightning/issues/9318, but could not solve it. I use the default DAVIS dataset and have not changed the body of the code. Has anyone encountered this problem, or does anyone know how to solve it?

Etienne-Meunier-Inria commented 1 year ago

Hi! Thank you for your message. It looks like an error related to PyTorch Lightning. Can you try running the code with the package versions given in the "Environment" section of the README?

pytorch_lightning==1.2.8
pandas==0.24.1
flowiz
wandb==0.10.26
ipdb==0.13.5
torch==1.8.1
torchvision==0.9.1
seaborn
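
A quick way to see whether the active environment matches the pins above is to compare them with the installed versions. The following is only a minimal sketch using the standard library; the helper name `check_pins` is hypothetical and not part of the repository, and the unpinned packages (flowiz, seaborn) are left out.

```python
from importlib import metadata

# Pinned versions copied from the list above.
PINNED = {
    "pytorch_lightning": "1.2.8",
    "pandas": "0.24.1",
    "wandb": "0.10.26",
    "ipdb": "0.13.5",
    "torch": "1.8.1",
    "torchvision": "0.9.1",
}


def check_pins(pinned):
    """Return {package: (wanted, found, matches)} for each pinned package.

    Hypothetical helper for illustration; `found` is None when the
    package is not installed.
    """
    report = {}
    for name, wanted in pinned.items():
        try:
            # Drop a local build suffix such as "+cu111" before comparing.
            found = metadata.version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            found = None
        report[name] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in check_pins(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: wanted {wanted}, found {found} [{status}]")
```

Any line reported as a mismatch points at a package whose version differs from the one listed in the README.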

fanxuxiang commented 1 year ago

Yes, thank you. The error occurs while saving the training logs. Skipping the saving of some parameters temporarily avoids it. Also, can the algorithm be accelerated with a GPU during inference? It takes several seconds for me to test on a single optical flow field of 1960 * 1020.

Etienne-Meunier-Inria commented 1 year ago

Happy you managed to deal with the error. At inference, segmentation is just the forward pass of the backbone (in our case a classical U-Net); you don't need to compute the loss or the motion models. You can therefore use GPU acceleration as you usually do with PyTorch models. To accelerate inference further, you can either reduce the input size or train a lighter backbone model.
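
As an illustration of the advice above, here is a minimal sketch of GPU inference with an optional input downscale. It assumes a 2-channel optical flow tensor of shape (batch, 2, H, W); `TinyBackbone` and `segment_flow` are hypothetical stand-ins, not the repository's U-Net or API, and the trained model would have to be loaded in their place.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for the trained backbone (not the repo's U-Net)."""

    def __init__(self, num_segments: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(8, num_segments, kernel_size=3, padding=1),
        )

    def forward(self, flow):
        return self.net(flow)


def segment_flow(model, flow, device, scale=1.0):
    """Run only the forward pass and return a per-pixel segment index map."""
    model = model.to(device).eval()
    flow = flow.to(device)
    if scale != 1.0:
        # Resizes the field only; whether flow magnitudes should also be
        # rescaled depends on how the model was trained (assumption: not needed).
        flow = F.interpolate(flow, scale_factor=scale, mode="bilinear",
                             align_corners=False)
    with torch.no_grad():  # no loss or gradients are needed at inference
        logits = model(flow)
    return logits.argmax(dim=1).cpu()


# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
flow = torch.randn(1, 2, 64, 96)  # random placeholder for a real flow field
mask = segment_flow(TinyBackbone(), flow, device, scale=0.5)
print(mask.shape)
```

With `scale=0.5` the network sees a quarter as many pixels, which is one way to trade mask resolution for speed; the mask can be resized back to the original resolution afterwards if needed.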