Closed neixlo closed 4 years ago
I think I found out why this happens.
When I train for more epochs, training without a preloaded model works fine. It seems that if the model has not been trained well enough (in this case, long enough), the further processing in the refinement module fails. This could happen because at this early training stage the model outputs nothing or random values, which seems to result in an empty matrix, which in turn produces the error in the Eigen library.
Solution that worked for me: train for longer than 1 epoch.
50 epochs worked fine for me; I didn't try fewer.
Hi @chensong1995, I'd like to train from scratch and, for testing purposes, run something like this:
$ python src/train_core.py --batch_size 1 --n_epochs 2 --object_name cat --load_dir None
But this outputs:
-> print('Could not restore session properly, check the load_dir')
(Pdb)
If I add the parameter --load_dir None, load_dir is set to the string 'None', which is not None as far as this line is concerned: if args.load_dir is not None:
However, if I modify it to something like:
if args.load_dir != 'None':
it seems to work. At least it's training the ResNet networks. After the nets are trained, there is another error in trainer.generate_data(). I think it happens in this line, where the pr_para, pi_para = self.search_para(...) function gets called. It outputs the following in the end:
If I run it with --load_dir set to a saved model, it runs through the training and trainer.generate_data().
$ python src/train_core.py --batch_size 1 --n_epochs 501 --object_name cat --load_dir /home/nixi/Projects/HybridPose_custom/data/saved_weights/occlusion_linemod/cat/checkpoints/0.02/499
That will output:
saved
So for me that means the regressor can access the Eigen library and the $LD_LIBRARY path is set up correctly. Am I missing something? Any idea what's going on?
Thanks and keep up the good work!
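Regarding the --load_dir None behaviour described above: argparse passes command-line values through as strings, so the literal text None never becomes Python's None. One possible way to handle this, as a minimal sketch only, is a small type converter. The names none_or_str and build_parser are hypothetical and not taken from the repository, and the actual argument definition in train_core.py may look different.

```python
import argparse


def none_or_str(value):
    """Hypothetical helper: turn the literal text 'None' into Python None."""
    if value.strip().lower() == 'none':
        return None
    return value


def build_parser():
    """Hypothetical parser containing only the argument discussed here."""
    parser = argparse.ArgumentParser()
    # type= is applied to the raw command-line string before it is stored,
    # so '--load_dir None' ends up as None instead of the string 'None'.
    parser.add_argument('--load_dir', type=none_or_str, default=None)
    return parser


if __name__ == '__main__':
    args = build_parser().parse_args()
    if args.load_dir is not None:
        print('would restore from', args.load_dir)
    else:
        print('training from scratch')
```

With a converter like this, the existing check if args.load_dir is not None: could stay unchanged. Simply omitting --load_dir should have the same effect, assuming the argument's default is None.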