Hello,
First, I would like to thank you for open-sourcing the code. I have been trying to re-train the model (Torch code) on the COCO and MPII datasets. Training runs fine for 15 iterations, after which I get a file-not-found error. From what I understand, the code is looking for a file whose name contains an encoding that my system is not aware of. My traceback is attached below.
Training error
root@8bbb2c22127d:~/AlphaPose/train/src# th main.lua -expID coco1 -nGPU 2 -trainBatch 4 -validBatch 16 -nEpochs 40
Saving everything to: /root/AlphaPose/train/exp/coco/coco1
Input is a vector of length: 3
Output is a table
Entry 1 is a tensor with dimensions: 33 x 80 x 64
Entry 2 is a tensor with dimensions: 33 x 80 x 64
Entry 3 is a tensor with dimensions: 33 x 80 x 64
Entry 4 is a tensor with dimensions: 33 x 80 x 64
Entry 5 is a tensor with dimensions: 33 x 80 x 64
Entry 6 is a tensor with dimensions: 33 x 80 x 64
Entry 7 is a tensor with dimensions: 33 x 80 x 64
Entry 8 is a tensor with dimensions: 33 x 80 x 64
==> Creating model from file: models/hg-prm.lua
==> Converting module to nn.DataParallelTable
warning: could not load nccl, falling back to default communication
==> Converting model to CUDA
==> Starting epoch: 1/40
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/threads/threads.lua:183: [thread 3 callback] /root/torch/install/share/lua/5.1/image/init.lua:367: /root/AlphaPose/train/data/coco/images/COCO_train2014_000000044788.jpgueue:addjA: No such file or directory
stack traceback:
[C]: in function 'error'
/root/torch/install/share/lua/5.1/image/init.lua:367: in function 'loadImage'
/root/AlphaPose/train/src/util/pose.lua:57: in function 'generateSample'
/root/AlphaPose/train/src/util/pose.lua:286: in function 'loadData'
/root/AlphaPose/train/src/util/dataloader.lua:77: in function </root/AlphaPose/train/src/util/dataloader.lua:76>
[C]: in function 'xpcall'
/root/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
/root/torch/install/share/lua/5.1/threads/queue.lua:65: in function </root/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
/root/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:15: in main chunk
stack traceback:
[C]: in function 'error'
/root/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
/root/AlphaPose/train/src/util/dataloader.lua:90: in function '(for generator)'
/root/AlphaPose/train/src/train.lua:38: in function 'step'
/root/AlphaPose/train/src/train.lua:164: in function 'train'
main.lua:19: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50
As you can see, the code is looking for COCO_train2014_000000044788.jpgueue:addjA, which is obviously not a valid file name. My knowledge of Lua is limited and I am unable to debug this error. Any help would be much appreciated.
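The stray suffix after `.jpg` looks to me like leftover bytes read past the end of the file name, which could happen if names are stored in a fixed-width buffer and decoded without a terminator. That is a guess on my part, not something I have confirmed in the AlphaPose code. Below is a minimal Python sketch of a check I could run on the image names; `decode_name`, `clean_image_name` and `find_suspect_names` are hypothetical helpers, and the raw names are assumed to be available as byte strings read from the annotation file.

```python
import re

# Matches a file name up to and including the first known image extension.
_IMAGE_NAME = re.compile(r"^(.*?\.(?:jpg|jpeg|png))", re.IGNORECASE)


def decode_name(raw: bytes) -> str:
    """Decode a raw name buffer, keeping only the bytes before the first NUL."""
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")


def clean_image_name(raw: bytes) -> str:
    """Return the name truncated right after its image extension."""
    text = decode_name(raw)
    match = _IMAGE_NAME.match(text)
    # Fall back to the decoded text when no known extension is found.
    return match.group(1) if match else text


def find_suspect_names(names):
    """Return decoded names that carry extra characters after the extension."""
    return [decode_name(raw) for raw in names
            if decode_name(raw) != clean_image_name(raw)]


if __name__ == "__main__":
    # The second entry is the name from the traceback above.
    sample = [b"a.jpg\x00\x00", b"COCO_train2014_000000044788.jpgueue:addjA"]
    for name in find_suspect_names(sample):
        print("suspect:", name)
```

If the stored names all come back clean, the extra characters are presumably being introduced when the loader converts the buffer to a string, rather than being present in the data itself.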
Thanks and Cheers,