RetroCirce / HTS-Audio-Transformer

The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"
https://arxiv.org/abs/2202.00874
MIT License

Training will get stuck and stop without reporting an error #37

Open YooWang opened 1 year ago

YooWang commented 1 year ago

I set deterministic to False, and training now starts successfully. But at about 68% of epoch=1, training gets stuck: it stops making progress and reports no error. How can I solve this?

RetroCirce commented 1 year ago

Did you try training and testing on a single GPU first? The deterministic setting does not cause the hang. I ran into this problem once, but it was fixed as soon as I updated PyTorch Lightning. One possible cause lies in the multi-GPU training stage, where the GPUs get stuck waiting on each other to sync.
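A minimal sketch of the single-GPU check suggested above, for anyone debugging the same hang. It is not code from this repo: the helper name `configure_single_gpu_debug` and the 600-second timeout are assumptions made for illustration. It uses only the `CUDA_VISIBLE_DEVICES` and `NCCL_DEBUG` environment variables and Python's standard `faulthandler` module, and it should be called before `torch` is imported so that CUDA only sees the chosen device.

```python
import faulthandler
import os


def configure_single_gpu_debug(gpu_index: str = "0", hang_timeout_s: int = 600) -> dict:
    """Restrict this process to one GPU and arm a watchdog for silent hangs.

    Hypothetical helper for illustration; call it before importing torch.
    """
    # Expose a single device to CUDA, so there is no cross-GPU sync to wait on.
    os.environ["CUDA_VISIBLE_DEVICES"] = gpu_index

    # Only matters once you return to multi-GPU runs: NCCL then logs its
    # setup and collectives, which can help locate a stuck sync.
    os.environ.setdefault("NCCL_DEBUG", "INFO")

    # Every `hang_timeout_s` seconds, dump the Python stack of all threads to
    # stderr. During a hang, the dump shows which call the process is stuck in.
    faulthandler.dump_traceback_later(hang_timeout_s, repeat=True)

    return {
        "CUDA_VISIBLE_DEVICES": os.environ["CUDA_VISIBLE_DEVICES"],
        "NCCL_DEBUG": os.environ["NCCL_DEBUG"],
    }
```

Calling `configure_single_gpu_debug("0")` at the very top of the training entry script would be one way to use it. If the single-GPU run gets past the point where it previously stalled, that points toward the multi-GPU sync rather than the data or the model.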

YooWang commented 1 year ago

Thank you for your reply. I will try testing with a single GPU. By the way, which versions of CUDA, PyTorch, and PyTorch Lightning did you end up using?

RetroCirce commented 1 year ago

I use pytorch_lightning==1.5.9 and CUDA 10.1. But I think newer versions of PyTorch Lightning also work; they just need a few tweaks.
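To compare a local environment against the versions mentioned above, a small sketch like the following may help. It relies only on the standard library's `importlib.metadata`; the function name `installed_versions` and the default package list are assumptions for illustration, and nothing here comes from the repo itself.

```python
from importlib import metadata


def installed_versions(packages=("torch", "pytorch-lightning")) -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


def matches_pin(installed, pinned: str) -> bool:
    """Return True only when a package is installed at exactly the pinned version."""
    return installed is not None and installed == pinned
```

For example, `matches_pin(installed_versions()["pytorch-lightning"], "1.5.9")` reports whether the environment uses the same PyTorch Lightning release as the one quoted in this thread.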