Open mobguang opened 4 years ago
Hi @nttstar, could you please help with this?
Thanks a lot!
Sorry I haven't used triplet loss for a long time.
Hi @nttstar, thanks for your reply. Could you please advise on how to fine-tune the models? Which train.py file should I use, and which parameters are necessary for fine-tuning?
Thanks!
I just trained a model on my customized dataset with the following parameters (using recognition/train.py):
CUDA_VISIBLE_DEVICES='0' python train.py --dataset emore --loss cosface --network r50 --lr 0.1 --lr-steps 80000,120000,1000000 --pretrained /home/dev/facial_workspace/training/pretrained-models/model-r50-am-lfw/model --models-root /home/dev/facial_workspace/training/all_vip_countries/models --per-batch-size 128
Then I tried to fine-tune that model with the following parameters (again using recognition/train.py):
CUDA_VISIBLE_DEVICES='0' python train.py --dataset emore --loss triplet --network mnas05 --lr 0.01 --lr-steps 80000,130000,1000000 --pretrained /home/dev/facial_workspace/training/all_vip_countries/models/r50-models/model --models-root /home/dev/facial_workspace/training/all_vip_countries/models
The arguments reported for the fine-tuning run:
Called with argument: Namespace(batch_size=60, ckpt=3, ctx_num=1, dataset='emore', dataset_path='/home/dev/facial_workspace/training/all_vip_countries/output', frequent=20, image_channel=3, image_shape=[112, 112, 3], kvstore='device', loss='triplet', lr=0.01, lr_steps='80000,130000,1000000', models_root='/home/dev/facial_workspace/training/all_vip_countries/models', mom=0.9, network='mnas05', num_classes=172460, per_batch_size=60, pretrained='/home/dev/facial_workspace/training/all_vip_countries/models/r50-models/model', pretrained_epoch=0, rescale_threshold=0, val_targets=['lfw', 'cfp_ff', 'agedb_30'], verbose=2000, wd=0.0005)
But I encountered this error:
  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/io/io.py", line 399, in prefetch_func
    self.next_batch[i] = self.iters[i].next()
  File "/data/junjiexun/insightface/recognition/triplet_image_iter.py", line 481, in next
    self.reset()
  File "/data/junjiexun/insightface/recognition/triplet_image_iter.py", line 390, in reset
    self.select_triplets()
  File "/data/junjiexun/insightface/recognition/triplet_image_iter.py", line 253, in select_triplets
    self.mx_model.forward(db, is_train=False)
  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/module/module.py", line 593, in forward
    assert self.binded and self.params_initialized
AssertionError
Traceback (most recent call last):
  File "train.py", line 436, in <module>
    main()
  File "train.py", line 433, in main
    train_net(args)
  File "train.py", line 407, in train_net
    epoch_end_callback = epoch_cb )
  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/module/base_module.py", line 502, in fit
    allow_missing=allow_missing, force_init=force_init)
  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/module/module.py", line 309, in init_params
    _impl(desc, arr, arg_params)
  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/module/module.py", line 297, in _impl
    cache_arr.copyto(arr)
  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 2629, in copyto
    return _internal._copyto(self, out=other)
  File "<string>", line 27, in _copyto
  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py", line 107, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/base.py", line 255, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [20:05:00] src/operator/contrib/./../elemwise_op_common.h:135: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node at 0-th output: expected [512], got [256]
Stack trace:
[bt] (0) /home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x6b8b5b) [0x7f14b3157b5b]
[bt] (1) /home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x878f39) [0x7f14b3317f39]
[bt] (2) /home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x8797f2) [0x7f14b33187f2]
[bt] (3) /home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x9cacfc) [0x7f14b3469cfc]
[bt] (4) /home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::imperative::SetShapeType(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, mxnet::DispatchMode)+0x1d27) [0x7f14b6334437]
[bt] (5) /home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::Imperative::Invoke(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray*> > const&)+0x1db) [0x7f14b633c9bb]
[bt] (6) /home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x3759cef) [0x7f14b61f8cef]
[bt] (7) /home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/libmxnet.so(MXImperativeInvokeEx+0x62) [0x7f14b61f92b2]
[bt] (8) /home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f14fca81ec0]
Could any expert help with this issue?
Thanks in advance!
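A note on the final MXNetError: `expected [512], got [256]` is raised while `init_params` copies the `--pretrained` weights into the newly built network. One plausible reading (an interpretation of the log, not something confirmed in this thread) is that the checkpoint trained with `--network r50` and the network built with `--network mnas05` disagree on the shape of at least one parameter. The sketch below shows how such mismatches could be listed before training starts. The parameter names and shapes are made up for illustration, and `find_shape_mismatches` is a hypothetical helper, not part of recognition/train.py.

```python
# Hypothetical parameter shapes, for illustration only. These names and sizes
# are assumptions; they were not read from the checkpoints in this thread.
checkpoint_shapes = {
    "conv0_weight": (64, 3, 3, 3),
    "fc1_weight": (512, 25088),
    "fc1_beta": (512,),
}
network_shapes = {
    "conv0_weight": (64, 3, 3, 3),
    "fc1_weight": (256, 1024),
    "fc1_beta": (256,),
}


def find_shape_mismatches(expected, loaded):
    """Return {name: (expected_shape, loaded_shape)} for every parameter
    name present in both mappings whose shapes differ."""
    mismatches = {}
    for name in sorted(expected.keys() & loaded.keys()):
        want = tuple(expected[name])
        got = tuple(loaded[name])
        if want != got:
            mismatches[name] = (want, got)
    return mismatches


mismatches = find_shape_mismatches(network_shapes, checkpoint_shapes)
for name, (want, got) in mismatches.items():
    print(f"{name}: network expects {want}, checkpoint has {got}")
```

With MXNet, the checkpoint side of such a table could be built from the `arg_params` returned by `mx.model.load_checkpoint(prefix, epoch)`, since each value there has a `.shape`. If the reading above is right, fine-tuning with the same `--network` as the checkpoint (r50 here) should avoid the mismatch. Switching to a different backbone would mean not loading the incompatible weights.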