Linfeng-Tang / SeAFusion

The code of "Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network"
MIT License

about train #7

Open LiYan0306 opened 1 year ago

LiYan0306 commented 1 year ago

Why can't I train the SeAFusion network? I have tried reducing the batch size from 32 to 2, but it still does not work. test.py runs normally, but train.py always fails with this error: RuntimeError: CUDA out of memory. Tried to allocate 900.00 MiB (GPU 0; 11.00 GiB total capacity; 10.05 GiB already allocated; 0 bytes free; 10.06 GiB reserved in total by PyTorch). I have tried clearing the cache, checking my environment configuration, and shrinking the batch size, but the same error keeps appearing every time. What could be the cause?

Linfeng-Tang commented 1 year ago

Hi, you can add me on QQ: 2458707789 (include your school + name in the request), and I can use Sunflower remote desktop to help you try to resolve this.

baisong666 commented 1 year ago

Hi, has this problem been solved? I am running into the same issue.

Linfeng-Tang commented 1 year ago

I have not encountered this problem myself, but please send me your runtime environment and I will try to help you resolve it. Did you monitor GPU memory usage while training was running?

XinYuanZhan commented 1 year ago

Hi, this problem is solved. I changed pin_memory to False in the train script and it worked. Also, the train_fusion function in your main script does not pass args.batchsize through as the batch size, so changing the batch size setting further down had no effect.
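For anyone hitting the same wall, the failure mode described above is generic: a hardcoded value shadows the command-line flag, so lowering --batch_size never reaches the data loader. A minimal stdlib-only sketch of the bug and the fix (build_loader stands in for torch.utils.data.DataLoader; all function and argument names here are illustrative, not the actual SeAFusion code):

```python
import argparse

def build_loader(batch_size, pin_memory):
    # Stand-in for torch.utils.data.DataLoader(dataset, batch_size=..., pin_memory=...);
    # returns the settings so the bug is observable without a GPU.
    return {"batch_size": batch_size, "pin_memory": pin_memory}

def train_fusion_buggy(args):
    # Bug: the batch size is hardcoded, so args.batch_size is silently ignored.
    return build_loader(batch_size=32, pin_memory=True)

def train_fusion_fixed(args):
    # Fix: thread args.batch_size through, and disable pinned host memory.
    return build_loader(batch_size=args.batch_size, pin_memory=False)

parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=2)
args = parser.parse_args(["--batch_size", "2"])

print(train_fusion_buggy(args))  # batch_size stays 32 regardless of the flag
print(train_fusion_fixed(args))  # batch_size follows args.batch_size
```

The same shadowing bug is easy to spot by grepping the training script for a literal batch-size number instead of a reference to the parsed arguments.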

Linfeng-Tang commented 1 year ago

Okay, thanks for the reminder.

limyoonahh commented 9 months ago

Hi, this problem is solved. I changed pin_memory to False in the train script and it worked. Also, the train_fusion function in your main script does not pass args.batchsize through as the batch size, so changing the batch size setting further down had no effect.

Hi, is changing pin_memory to False in the three places in the train script really all that is needed? I still get the following error: CUDA out of memory. Tried to allocate 300.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 160.81 MiB is free. Process 135540 has 14.59 GiB memory in use. Of the allocated memory 14.44 GiB is allocated by PyTorch, and 24.97 MiB is reserved by PyTorch but unallocated.
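The numbers in that message already explain the failure: the allocator is asked for 300 MiB while only about 160 MiB of the 14.75 GiB card is free, so the only ways out are lowering per-step memory (a smaller batch size) or freeing memory held by other processes. A quick sanity check with the values copied from the error above:

```python
total_gib = 14.75          # GPU 0 total capacity
free_mib = 160.81          # memory still free
requested_mib = 300.00     # size of the failed allocation
in_use_gib = 14.59         # memory held by process 135540

# The request exceeds free memory, hence the OOM.
print(requested_mib > free_mib)           # True
# Nearly the whole card is occupied by that one process.
print(round(in_use_gib / total_gib, 2))   # 0.99
```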

Sun-drenched commented 9 months ago

Hi, this problem is solved. I changed pin_memory to False in the train script and it worked. Also, the train_fusion function in your main script does not pass args.batchsize through as the batch size, so changing the batch size setting further down had no effect.

Hi, is changing pin_memory to False in the three places in the train script really all that is needed? I still get the following error: CUDA out of memory. Tried to allocate 300.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 160.81 MiB is free. Process 135540 has 14.59 GiB memory in use. Of the allocated memory 14.44 GiB is allocated by PyTorch, and 24.97 MiB is reserved by PyTorch but unallocated.

Adjust batch_size in main; setting it down to 4 is usually enough to get it running.

FLY136 commented 7 months ago

I ran into the same problem. My solution was to change batch_size to 4, set pin_memory to False, and close other processes that were occupying large amounts of GPU memory. After that, training ran.