baowenbo / DAIN

Depth-Aware Video Frame Interpolation (CVPR 2019)
https://sites.google.com/view/wenbobao/dain
MIT License
8.19k stars, 840 forks

Segmentation fault (core dumped) when trying to train or overtrain a model #156

Open · Adodiego opened 1 year ago

Adodiego commented 1 year ago

Hello, I'm trying to use the train.py script to train a RIFE model on my own dataset. Everything runs fine until the last epoch, where it fails with "Segmentation fault (core dumped)". I'm running with --nproc_per_node=1 and --world_size=1; could that be the cause? No matter how many epochs I set, it always crashes at the last one. Also, when I launch it like this:

sudo -E /usr/bin/python3 -m torch.distributed.launch --nproc_per_node=1 train.py --epoch=1 --world_size=1

the error becomes just "Segmentation fault", without the "(core dumped)" part. Any idea why this is happening?
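A segmentation fault kills the interpreter without a Python traceback, so the report cannot say which line was running when the crash happened. One way to narrow that down is Python's standard-library faulthandler module. The sketch below assumes the lines go near the top of train.py. The `train` function is a hypothetical stand-in for that script's epoch loop, not its real code. Flushed progress prints then show whether the crash happens inside the last epoch or after the loop, for example during teardown.

```python
import faulthandler
import sys
from typing import List

# Print the Python traceback of every thread to stderr if the process is
# killed by a fatal signal such as SIGSEGV. This only helps locate the
# crash; it does not prevent it.
faulthandler.enable(file=sys.stderr, all_threads=True)


def train(num_epochs: int) -> List[int]:
    """Hypothetical stand-in for the epoch loop in train.py."""
    finished = []
    for epoch in range(num_epochs):
        # ... forward pass, backward pass and optimizer step would go here ...
        finished.append(epoch)
        print(f"epoch {epoch} finished", flush=True)
    # If this line prints before the segfault, the crash is happening after
    # training, e.g. while worker processes or the process group shut down.
    print("training loop exited", flush=True)
    return finished


if __name__ == "__main__":
    train(num_epochs=1)
```

To get the same effect without editing the script, set the environment variable PYTHONFAULTHANDLER=1. Processes started by torch.distributed.launch should normally inherit it from the environment.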

rriicckkee commented 8 months ago

Have you solved this problem?