Open HamzahNizami opened 5 years ago
After re-cloning the repository and following the steps again, I attempted to train the model on 50 images from the training dataset. When I run the training command on its own (i.e. not through train.sh), I get the following:
cudaSuccess (29 vs. 0) driver shutting down
Check failure stack trace:
Aborted (core dumped)
However, when using the following command: ./train.sh --50
I get the following error:
double free or corruption (!prev)
./train.sh: line 26: 9337 Aborted (core dumped) python testAttentionMask.py 0 attentionMask-8-128 --init_weights attentionMask-8-128iter$STEP.caffemodel --dataset train2014
validation done
start evaluation
I then tried running the following test command by itself: python testAttentionMask.py 0 attentionMask-8-128 --init_weights attentionmask-8-128final.caffemodel --dataset val2014 --end 50
and got the following error:
loading annotations into memory...
Done (t=3.57s)
creating index...
index created!
double free or corruption (!prev)
Aborted (core dumped)
I currently have no idea what is happening, but from what I found online this error is usually related to memory allocation. Would that mean the problem is in the code itself?
I hope you can get back to me regarding this situation, thanks in advance.
Hi @HamzahNizami, I was out of office for a few weeks. I tried to reproduce your error, but without success.
Can you try to localize the error within the testAttentionMask.py script? Is it a problem of pycocotools or some caffe layer?
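One possible way to narrow this down is to run each suspect step in a fresh interpreter with Python's faulthandler enabled, so that a native abort prints the Python traceback that was active when it happened. The sketch below is only an illustration: the helper names `run_isolated` and `describe` are made up for this example, it assumes Python 3.7 or newer on Linux, and the two candidate snippets assume that `pycocotools` and `caffe` are importable in your environment.

```python
import signal
import subprocess
import sys


def run_isolated(snippet):
    """Run `snippet` in a fresh interpreter with faulthandler enabled.

    Returns (returncode, stderr). On POSIX a negative return code means
    the child was killed by that signal number, e.g. -SIGABRT after a
    'double free or corruption' abort.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


def describe(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exited with code %d" % returncode


if __name__ == "__main__":
    # Hypothetical candidates: extend these with the actual calls made in
    # testAttentionMask.py (loading annotations, building the net, ...).
    candidates = {
        "pycocotools import": "from pycocotools.coco import COCO",
        "caffe import": "import caffe",
    }
    for name, snippet in candidates.items():
        code, err = run_isolated(snippet)
        print(name, "->", describe(code))
        if code != 0:
            print(err)
```

If one candidate is reported as killed by SIGABRT while the others exit normally, that step is the place to look first. A crash that only appears when the steps are combined would need a snippet that combines them as well.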
I finally got the model training, but I am now getting the following two errors and I'm unsure why.
Snapshotting to binary proto file params/attentionMask-8-128_iter_50.caffemodel
I0304 23:27:38.113571 4767 sgd_solver.cpp:273] Snapshotting solver state to binary proto file params/attentionMask-8-128_iter_50.solverstate
F0304 23:27:40.473143 4767 syncedmem.hpp:31] Check failed: error == cudaSuccess (29 vs. 0) driver shutting down
Check failure stack trace:
./train.sh: line 26: 4767 Aborted (core dumped) python trainAttentionMask.py 0 attentionMask-8-128 --init_weights resnet-50-model.caffemodel --step $SIZE_EPOCH
double free or corruption (!prev)
./train.sh: line 26: 5032 Aborted (core dumped) python testAttentionMask.py 0 attentionMask-8-128 --init_weights attentionMask-8-128iter$STEP.caffemodel --dataset train2014