chenyuntc / simple-faster-rcnn-pytorch

A simplified implementation of Faster R-CNN that replicates the performance reported in the original paper

Out of memory: GPU memory keeps growing during training #252

Open songhat opened 1 year ago

songhat commented 1 year ago

What I have already checked: loss.item() is not the problem, and loading data through the dataloader alone does not make memory grow.

deepxzy commented 1 year ago

As someone pointed out later, on line 76 of train.py, having these two calls in the wrong order seems to cause a GPU memory leak. Change

    for ii, (img, bbox_, label_, scale) in tqdm(enumerate(dataloader)):

to

    for ii, (img, bbox_, label_, scale) in enumerate(tqdm(dataloader)):
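
For readers comparing the two loop forms, one difference can be checked without PyTorch: what the progress bar receives. An `enumerate` object has no length, so a wrapper around it cannot infer a total, while a wrapper around the sized iterable can. The sketch below uses a plain list as a stand-in for `dataloader` and a hypothetical helper, `known_length`. It only illustrates the length difference and does not show that the ordering causes the memory growth.

```python
# Stand-in for a DataLoader: any sized iterable behaves the same way here.
batches = [("img0", "bbox0"), ("img1", "bbox1"), ("img2", "bbox2")]


def known_length(iterable):
    """Return len(iterable) if it is defined, else None.

    Hypothetical helper: it mimics how a progress bar can only
    infer a total from an object that supports len().
    """
    try:
        return len(iterable)
    except TypeError:
        return None


# tqdm(enumerate(batches)) hands the wrapper an enumerate object,
# which does not define __len__.
total_when_wrapping_enumerate = known_length(enumerate(batches))

# enumerate(tqdm(batches)) hands the wrapper the sized iterable itself.
total_when_wrapping_batches = known_length(batches)

print(total_when_wrapping_enumerate)  # None
print(total_when_wrapping_batches)    # 3
```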

songhat commented 1 year ago

@deepxzy Hi! Thanks for your answer. I tried your suggestion, but it did not work.

fatejzz commented 1 year ago

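I have a similar problem: memory keeps increasing during training. After debugging, I found that it is the eval stage where memory usage keeps growing.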
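
One way to confirm which stage is responsible, as a minimal sketch: take a memory reading after each phase and sum the change per phase. The helper name `growth_per_phase` and the numbers are hypothetical. In a real run the readings might come from `torch.cuda.memory_allocated()` for GPU memory or from a process-level probe for host RAM.

```python
def growth_per_phase(readings):
    """Sum the memory change attributed to each phase.

    `readings` is a list of (phase, bytes_in_use) tuples in recording order.
    Each delta is attributed to the phase of the later reading.
    """
    totals = {}
    for (_, before), (phase, after) in zip(readings, readings[1:]):
        totals[phase] = totals.get(phase, 0) + (after - before)
    return totals


# Hypothetical readings, invented for illustration only.
readings = [
    ("start", 1000),
    ("train", 1200),
    ("eval", 1500),
    ("train", 1500),
    ("eval", 1800),
]

growth = growth_per_phase(readings)
print(growth)  # {'train': 200, 'eval': 600}
```

With readings shaped like these, most of the growth would be attributed to the eval phase, which is the pattern described above.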

hungphandinh92it commented 6 months ago

I train in the NVIDIA PyTorch Docker container and also hit this problem. Try disabling pin_memory; that resolved it for me. In train.py:

    test_dataloader = data_.DataLoader(testset,
                                       batch_size=1,
                                       num_workers=opt.test_num_workers,
                                       shuffle=False,
                                       pin_memory=False)