YuwenXiong / py-R-FCN

R-FCN with joint training and python support

Why was training interrupted unexpectedly after some iterations (60000)? #102

Open shadowuyl opened 6 years ago

shadowuyl commented 6 years ago

I trained on my data with ResNet-50, but training was interrupted unexpectedly (when I train on my data with a small number of iterations, such as 500, it works fine). I don't know what the reason is. Could anyone give me some advice? Thank you very much. This is the output:

```
speed: 0.765s / iter
I0124 10:34:33.386742 66910 solver.cpp:228] Iteration 63600, loss = 0.0277704
I0124 10:34:33.386772 66910 solver.cpp:244]     Train net output #0: accuarcy = 1
I0124 10:34:33.386780 66910 solver.cpp:244]     Train net output #1: loss_bbox = 0.0094369 (* 1 = 0.0094369 loss)
I0124 10:34:33.386785 66910 solver.cpp:244]     Train net output #2: loss_cls = 0.00284368 (* 1 = 0.00284368 loss)
I0124 10:34:33.386790 66910 solver.cpp:244]     Train net output #3: rpn_cls_loss = 0.000252465 (* 1 = 0.000252465 loss)
I0124 10:34:33.386792 66910 solver.cpp:244]     Train net output #4: rpn_loss_bbox = 0.000642761 (* 1 = 0.000642761 loss)
I0124 10:34:33.386797 66910 sgd_solver.cpp:106] Iteration 63600, lr = 0.001
./experiments/scripts/rfcn_end2end_ohem.sh: line 58: 66910 Killed ./tools/train_net.py --gpu ${GPU_ID} --solver models/${PT_DIR}/${NET}/rfcn_end2end/solver_ohem.prototxt --weights data/imagenet_models/${NET}-model.caffemodel --imdb ${TRAIN_IMDB} --iters ${ITERS} --cfg experiments/cfgs/rfcn_end2end_ohem.yml ${EXTRA_ARGS}
```

foralliance commented 6 years ago

@shadowuyl It should be a lack of memory: a bare `Killed` from the shell, with no Python traceback or CUDA error, usually means the Linux out-of-memory (OOM) killer terminated the training process.
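If that is the cause, a quick check on the training host should confirm it. Below is a minimal sketch, assuming a Linux machine (the exact kernel-log wording varies by kernel version, and the PID 66910 is taken from the log above):

```bash
# Search the kernel log for evidence that the OOM killer terminated
# the training process (PID 66910 in the log above).
dmesg -T | grep -i -E "out of memory|killed process"

# Watch host memory once a minute while training runs; steadily growing
# usage would explain why a short 500-iteration run succeeds but a long
# run dies around iteration 60000.
watch -n 60 free -m
```

If `free` shows memory climbing monotonically across iterations, the process is accumulating state over time rather than simply needing a fixed amount the machine lacks, which matches short runs succeeding and long runs dying.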