VDIGPKU / M2Det

M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network
MIT License

error with train.py #39

Open chituma110 opened 5 years ago

chituma110 commented 5 years ago

command: CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train.py -c=configs/m2det512_vgg.py --ngpu 8 -t True

```
Traceback (most recent call last):
  File "train.py", line 88, in <module>
    loss_l, loss_c = criterion(out, priors, targets)
  File "/home/xxx/anaconda2/envs/M2Det/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data2/xxx/Object_Detection/M2Det/layers/modules/multibox_loss.py", line 106, in forward
    conf_p = conf_data[(pos_idx+neg_idx).gt(0)].view(-1,self.num_classes)
RuntimeError: CUDA out of memory. Tried to allocate 3.80 GiB (GPU 0; 11.92 GiB total capacity; 8.33 GiB already allocated; 2.69 GiB free; 502.63 MiB cached)
```
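The numbers in the error message already show the shortfall: the new tensor needs more memory than is free on GPU 0. A quick sanity check in plain Python, with the values copied from the traceback:

```python
# Values (GiB) copied from the CUDA OOM message above.
total_capacity = 11.92
already_allocated = 8.33
free = 2.69
requested = 3.80

# The requested block must fit in the memory that is still free;
# 3.80 GiB does not fit in 2.69 GiB, so the allocator raises
# "CUDA out of memory".
fits = requested <= free
print(fits)  # False
```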

dshahrokhian commented 5 years ago

Try reducing the batch size in the config file; that solved it for me.
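For reference, this is a one-line change in the config. A hypothetical fragment of `configs/m2det512_vgg.py` (the actual key name in the repo's config may differ, e.g. `batch_size` vs `per_batch_size`; check your file):

```python
# configs/m2det512_vgg.py -- hypothetical fragment, key names may differ
train_cfg = dict(
    per_batch_size=4,  # was 16; halve this until training fits in GPU memory
    # ... other fields unchanged ...
)
```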

chituma110 commented 5 years ago

I reduced the batch size from 16 to 8, but got the same error.

MenGuangwen-CN-0411 commented 5 years ago

@chituma110 Maybe a high `num_workers` adds extra memory overhead on some machines; set `num_workers=0` and try again. Let me know whether it works.
With the default 320x320 VGG config I got OOM, and it was still OOM even with batch size 2. After setting `num_workers=0` it ran fine. I have a single GTX 1080 on Windows 10 with PyTorch 1.0.
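One plausible reason `num_workers` matters: each DataLoader worker process prefetches batches into its own buffer, so the amount of memory held in prefetched batches grows roughly linearly with the worker count. A back-of-envelope sketch (the per-batch size and prefetch factor here are illustrative assumptions, not measurements):

```python
def prefetch_buffer_gib(batch_gib, num_workers, prefetch_factor=2):
    """Rough upper bound on memory held in prefetched batches.

    With num_workers=0, loading happens in the main process and only
    the current batch is held; with workers, each one keeps up to
    prefetch_factor batches queued.
    """
    if num_workers == 0:
        return batch_gib
    return batch_gib * num_workers * prefetch_factor

# Illustrative: a 0.5 GiB batch with 8 workers vs. no workers.
print(prefetch_buffer_gib(0.5, 8))  # 8.0
print(prefetch_buffer_gib(0.5, 0))  # 0.5
```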

MenGuangwen-CN-0411 commented 5 years ago

@dshahrokhian Sir, I want to know whether you reproduced the results reported on COCO 2014 or the VOC dataset in the paper "M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network".

DayChan commented 5 years ago

> @dshahrokhian Sir, I want to know whether you reproduced the results reported on COCO 2014 or the VOC dataset in the paper "M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network".

Did you get the results for vgg16+m2det320 reported in the paper? I just can't reproduce them.

TekiLi commented 5 years ago

> I reduced the batch size from 16 to 8, but got the same error.

You may be running PyTorch 0.3; upgrade to 0.4 or 1.0.
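A quick way to guard against this is to check the installed version before training starts. A small sketch, assuming the 0.4/1.0 minimum suggested above (run `python -c "import torch; print(torch.__version__)"` to see what your environment actually has):

```python
def version_tuple(v):
    """Parse a version string like '0.3.1' or '1.0.0' into a tuple of ints,
    stopping at the first non-numeric component (e.g. '1.0.0.dev')."""
    parts = []
    for p in v.split('.'):
        if not p.isdigit():
            break
        parts.append(int(p))
    return tuple(parts)

def supported(v, minimum=(0, 4)):
    """True if the major.minor version meets the assumed minimum."""
    return version_tuple(v)[:2] >= minimum

print(supported('0.3.1'))  # False
print(supported('1.0.0'))  # True
```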

primary-studyer commented 5 years ago

> I reduced the batch size from 16 to 8, but got the same error.

(Translated from Chinese:) Just set the batch size even smaller and it will work, but training becomes very slow and `epoch_size` gets very large.
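The `epoch_size` growth follows directly from how iterations per epoch are usually computed: dataset size divided by the effective batch. A small sketch with an illustrative dataset size (not an exact COCO image count):

```python
def epoch_size(num_images, batch_size, ngpu=1):
    """Iterations per epoch: images divided by the effective batch size."""
    return num_images // (batch_size * ngpu)

# Halving the batch size doubles the iterations per epoch, so each
# epoch takes roughly twice as long at the same throughput per step.
print(epoch_size(120000, 16))  # 7500
print(epoch_size(120000, 4))   # 30000
```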