C-SJK opened this issue 3 years ago
I got the same error here. Running on an NVIDIA container (nvcr.io/nvidia/mxnet:20.12-py3), CUDA 11.1, RTX3090, MXNet 1.8.0 rc:
Command:
CUDA_VISIBLE_DEVICES='0' python -u train.py --network r100 --loss arcface --dataset emore
Config info / error messages:
Called with argument: Namespace(batch_size=128, ckpt=3, ctx_num=1, dataset='emore', frequent=20, image_channel=3, kvstore='device', loss='arcface', lr=0.1, lr_steps='100000,160000,220000', models_root='./models', mom=0.9, network='r100', per_batch_size=128, pretrained='', pretrained_epoch=1, rescale_threshold=0, verbose=2000, wd=0.0005)
{'bn_mom': 0.9, 'workspace': 256, 'emb_size': 512, 'ckpt_embedding': True, 'net_se': 0, 'net_act': 'prelu', 'net_unit': 3, 'net_input': 1, 'net_blocks': [1, 4, 6, 2], 'net_output': 'E', 'net_multiplier': 1.0, 'val_targets': ['lfw', 'cfp_fp', 'agedb_30'], 'ce_loss': True, 'fc7_lr_mult': 1.0, 'fc7_wd_mult': 1.0, 'fc7_no_bias': False, 'max_steps': 0, 'data_rand_mirror': True, 'data_cutoff': False, 'data_color': 0, 'data_images_filter': 0, 'count_flops': True, 'memonger': False, 'loss_name': 'margin_softmax', 'loss_s': 64.0, 'loss_m1': 1.0, 'loss_m2': 0.5, 'loss_m3': 0.0, 'net_name': 'fresnet', 'num_layers': 100, 'dataset': 'emore', 'dataset_path': '../datasets/faces_emore', 'num_classes': 85742, 'image_shape': [112, 112, 3], 'loss': 'arcface', 'network': 'r100', 'num_workers': 1, 'batch_size': 128, 'per_batch_size': 128}
0 1 E 3 prelu False
Network FLOPs: 24.2G
INFO:root:loading recordio ../datasets/faces_emore/train.rec...
header0 label [5822654. 5908396.]
id2range 85742
5822653
rand_mirror True
[14:34:50] ../src/storage/storage.cc:199: Using Pooled (Naive) StorageManager for CPU
loading bin 0
loading bin 1000
loading bin 2000
loading bin 3000
loading bin 4000
loading bin 5000
loading bin 6000
loading bin 7000
loading bin 8000
loading bin 9000
loading bin 10000
loading bin 11000
(12000, 3, 112, 112)
ver lfw
loading bin 0
loading bin 1000
loading bin 2000
loading bin 3000
loading bin 4000
loading bin 5000
loading bin 6000
loading bin 7000
loading bin 8000
loading bin 9000
loading bin 10000
loading bin 11000
loading bin 12000
loading bin 13000
(14000, 3, 112, 112)
ver cfp_fp
loading bin 0
loading bin 1000
loading bin 2000
loading bin 3000
loading bin 4000
loading bin 5000
loading bin 6000
loading bin 7000
loading bin 8000
loading bin 9000
loading bin 10000
loading bin 11000
(12000, 3, 112, 112)
ver agedb_30
lr_steps [100000, 160000, 220000]
call reset()
[14:35:20] ../src/storage/storage.cc:199: Using Pooled (Naive) StorageManager for GPU
/opt/mxnet/python/mxnet/module/base_module.py:504: UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (1.0 vs. 0.0078125). Is this intended?
self.init_optimizer(kvstore=kvstore, optimizer=optimizer,
Traceback (most recent call last):
File "train.py", line 483, in
EDIT: Got the same error with another version of the NVIDIA Docker image (20.08), with MXNet 1.6.0 and CUDA 11.0.
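For what it's worth, the rescale_grad UserWarning above is separate from the crash: Module.fit just notices that the optimizer train.py builds by hand does not use the 1.0/batch_size/num_workers value it would normally set (1/128 = 0.0078125 here). A minimal sketch of an optimizer created with that expected scaling, assuming the lr/momentum/wd values from the Namespace above:

import mxnet as mx

batch_size = 128   # per_batch_size * ctx_num from the Namespace above
num_workers = 1    # single machine, 'device' kvstore

# SGD with gradients rescaled by 1/batch_size/num_workers, the value the
# Module warning compares against; train.py creates its optimizer manually,
# so the two values can legitimately differ.
opt = mx.optimizer.SGD(learning_rate=0.1, momentum=0.9, wd=0.0005,
                       rescale_grad=1.0 / batch_size / num_workers)  # 0.0078125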
The same error here:
/opt/mxnet/python/mxnet/module/base_module.py:505: UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (0.25 vs. 0.001953125). Is this intended?
optimizer_params=optimizer_params)
[12:45:41] ../src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:120: Running performance tests to find the best convolution algorithm, this can take a while... (set the environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
Traceback (most recent call last):
File "train.py", line 484, in
Has anyone been able to resolve this?
When I was training, I encountered the following error. The MXNet version is 1.6.0. This is my config info:
Called with argument: Namespace(batch_size=128, ckpt=3, ctx_num=1, dataset='emore', frequent=20, image_channel=3, kvstore='device', loss='softmax', lr=0.1, lr_steps='100000,160000,220000', models_root='./models', mom=0.9, network='m1', per_batch_size=128, pretrained='', pretrained_epoch=1, rescale_threshold=0, verbose=2000, wd=0.0005) {'bn_mom': 0.9, 'workspace': 256, 'emb_size': 256, 'ckpt_embedding': True, 'net_se': 0, 'net_act': 'prelu', 'net_unit': 3, 'net_input': 1, 'net_blocks': [1, 4, 6, 2], 'net_output': 'GDC', 'net_multiplier': 1.0, 'val_targets': ['lfw'], 'ce_loss': True, 'fc7_lr_mult': 1.0, 'fc7_wd_mult': 1.0, 'fc7_no_bias': False, 'max_steps': 0, 'data_rand_mirror': True, 'data_cutoff': False, 'data_color': 0, 'data_images_filter': 0, 'count_flops': True, 'memonger': False, 'loss_name': 'softmax', 'net_name': 'fmobilenet', 'dataset': 'emore', 'dataset_path': '../datasets/faces_emore', 'num_classes': 85742, 'image_shape': [112, 112, 3], 'loss': 'softmax', 'network': 'm1', 'num_workers': 1, 'batch_size': 128, 'per_batch_size': 128}
The error info:
Traceback (most recent call last):
File "train.py", line 483, in <module>
main()
File "train.py", line 479, in main
train_net(args)
File "train.py", line 473, in train_net
epoch_end_callback=epoch_cb)
File "/opt/mxnet/python/mxnet/module/base_module.py", line 536, in fit
self.update_metric(eval_metric, data_batch.label, False, data_batch.pad)
File "/opt/mxnet/python/mxnet/module/module.py", line 777, in update_metric
self._exec_group.update_metric(eval_metric, labels, pre_sliced, label_pads)
File "/opt/mxnet/python/mxnet/module/executor_group.py", line 661, in update_metric
out.shape[0], islice_batch_size)
AssertionError: output length 1 not a multiple of slice batch_size 128
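For context, that assertion is raised by update_metric in executor_group.py, which expects every output of the symbol to have a leading batch dimension it can slice per sample. With ce_loss enabled (it is True in both configs above), the training symbol most likely exposes an extra scalar-shaped loss output of length 1, which cannot be split into 128-sample slices. A rough sketch of the check under that assumption (running it reproduces the same message):

import numpy as np

batch_size = 128                     # slice batch size on the single GPU
outputs = [np.zeros((128, 85742)),   # fc7 logits: one row per sample, slices fine
           np.zeros((1,))]           # scalar-style loss output (e.g. the ce_loss value)

for out in outputs:
    # Rough equivalent of the check near executor_group.py line 661: every
    # output length must be a multiple of the slice batch size, so the
    # length-1 loss output trips the AssertionError quoted above.
    assert out.shape[0] % batch_size == 0, \
        'output length %d not a multiple of slice batch_size %d' % (out.shape[0], batch_size)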