deepinsight / insightface

State-of-the-art 2D and 3D Face Analysis Project
https://insightface.ai

Fine tuning error #1174

Open mobguang opened 4 years ago

mobguang commented 4 years ago

I just trained on my custom dataset with the following parameters (using recognition/train.py):

CUDA_VISIBLE_DEVICES='0' python train.py --dataset emore --loss cosface --network r50 --lr 0.1 --lr-steps 80000,120000,1000000 --pretrained /home/dev/facial_workspace/training/pretrained-models/model-r50-am-lfw/model --models-root /home/dev/facial_workspace/training/all_vip_countries/models --per-batch-size 128

Then I tried to fine-tune my model with the following parameters (again using recognition/train.py):

CUDA_VISIBLE_DEVICES='0' python train.py --dataset emore --loss triplet --network mnas05 --lr 0.01 --lr-steps 80000,130000,1000000 --pretrained /home/dev/facial_workspace/training/all_vip_countries/models/r50-models/model --models-root /home/dev/facial_workspace/training/all_vip_countries/models

The arguments logged by the fine-tuning run:

Called with argument: Namespace(batch_size=60, ckpt=3, ctx_num=1, dataset='emore', dataset_path='/home/dev/facial_workspace/training/all_vip_countries/output', frequent=20, image_channel=3, image_shape=[112, 112, 3], kvstore='device', loss='triplet', lr=0.01, lr_steps='80000,130000,1000000', models_root='/home/dev/facial_workspace/training/all_vip_countries/models', mom=0.9, network='mnas05', num_classes=172460, per_batch_size=60, pretrained='/home/dev/facial_workspace/training/all_vip_countries/models/r50-models/model', pretrained_epoch=0, rescale_threshold=0, val_targets=['lfw', 'cfp_ff', 'agedb_30'], verbose=2000, wd=0.0005

But I encountered this error:

  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/io/io.py", line 399, in prefetch_func
    self.next_batch[i] = self.iters[i].next()
  File "/data/junjiexun/insightface/recognition/triplet_image_iter.py", line 481, in next
    self.reset()
  File "/data/junjiexun/insightface/recognition/triplet_image_iter.py", line 390, in reset
    self.select_triplets()
  File "/data/junjiexun/insightface/recognition/triplet_image_iter.py", line 253, in select_triplets
    self.mx_model.forward(db, is_train=False)
  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/module/module.py", line 593, in forward
    assert self.binded and self.params_initialized
AssertionError

Traceback (most recent call last):
  File "train.py", line 436, in <module>
    main()
  File "train.py", line 433, in main
    train_net(args)
  File "train.py", line 407, in train_net
    epoch_end_callback = epoch_cb )
  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/module/base_module.py", line 502, in fit
    allow_missing=allow_missing, force_init=force_init)
  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/module/module.py", line 309, in init_params
    _impl(desc, arr, arg_params)
  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/module/module.py", line 297, in _impl
    cache_arr.copyto(arr)
  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 2629, in copyto
    return _internal._copyto(self, out=other)
  File "<string>", line 27, in _copyto
  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py", line 107, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/base.py", line 255, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [20:05:00] src/operator/contrib/./../elemwise_op_common.h:135: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node at 0-th output: expected [512], got [256]
Stack trace:
  [bt] (0) /home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x6b8b5b) [0x7f14b3157b5b]
  [bt] (1) /home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x878f39) [0x7f14b3317f39]
  [bt] (2) /home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x8797f2) [0x7f14b33187f2]
  [bt] (3) /home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x9cacfc) [0x7f14b3469cfc]
  [bt] (4) /home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::imperative::SetShapeType(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, mxnet::DispatchMode)+0x1d27) [0x7f14b6334437]
  [bt] (5) /home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::Imperative::Invoke(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray*> > const&)+0x1db) [0x7f14b633c9bb]
  [bt] (6) /home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x3759cef) [0x7f14b61f8cef]
  [bt] (7) /home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/site-packages/mxnet/libmxnet.so(MXImperativeInvokeEx+0x62) [0x7f14b61f92b2]
  [bt] (8) /home/dev/anaconda3/envs/intelligent_platform/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f14fca81ec0]
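
The MXNetError is raised inside init_params while cache_arr.copyto(arr) copies a pretrained parameter into the newly built network, and it reports two different shapes ([512] and [256]). The fine-tuning command passes --network mnas05 while --pretrained points at a directory named r50-models, so one plausible cause (an assumption, not confirmed in this thread) is that the checkpoint and the network definition do not share the same parameter shapes. Below is a minimal sketch of how such mismatches could be listed before training. The function name find_shape_mismatches and the parameter names in the sample dictionaries are hypothetical and only illustrate the idea. The code uses plain dictionaries so that it runs without MXNet.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    pretrained: Dict[str, Shape],
    network: Dict[str, Shape],
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, network_shape, pretrained_shape) for every parameter
    present in both dictionaries whose shapes differ."""
    mismatches = []
    for name in sorted(pretrained.keys() & network.keys()):
        net_shape = tuple(network[name])
        pre_shape = tuple(pretrained[name])
        if net_shape != pre_shape:
            mismatches.append((name, net_shape, pre_shape))
    return mismatches


# Hypothetical shapes, for illustration only. With MXNet, the pretrained
# shapes could be read from the arg_params dictionary returned by
# mx.model.load_checkpoint(prefix, epoch), e.g. {k: v.shape for k, v in ...}.
network_shapes = {"fc1_gamma": (512,), "conv0_weight": (64, 3, 3, 3)}
pretrained_shapes = {"fc1_gamma": (256,), "conv0_weight": (64, 3, 3, 3)}

for name, net_shape, pre_shape in find_shape_mismatches(pretrained_shapes, network_shapes):
    print(f"{name}: network {net_shape}, pretrained {pre_shape}")
```

If the list is non-empty, the checkpoint cannot be copied into the network as-is. Either the same architecture has to be used for both runs, or the mismatching parameters have to be left out of the loaded weights.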

Could any expert help with this issue?

Thanks in advance!

mobguang commented 4 years ago

Hi @nttstar, could you please help with this?

Thanks a lot!

nttstar commented 4 years ago

Sorry, I haven't used triplet loss for a long time.

mobguang commented 4 years ago

Hi @nttstar, thanks for your reply. Could you please advise on how to fine-tune the models? Which train.py file should be used, and which parameters are required for fine-tuning?

Thanks!