ValueError: matrix contains invalid numeric entries

yellowwwo commented 1 year ago

Hi! I have a few questions and look forward to your answers. mmdetection has been updated to 2.25.0, but AdaMixer needs mmdetection 2.12.0, so I did the following:

1. git clone adamixer-main
2. cd adamixer-main
3. pip install -r requirements/build.txt and pip install -v -e .

This successfully installed mmdetection 2.12.0 (mmcv-full=1.3.3). I only modified the dataset path. Training with "adamixer_r50_1x_coco.py" runs and produces results, but training with "adamixer_r50_300_query_crop_mstrain_480-800_3x_coco.py" and other configs fails with the error (ValueError: matrix contains invalid numeric entries). Every image in the dataset contains at least one object. Thank you for taking the time to read this question!

D:\conda3\envs\ada\lib\site-packages\mmcv\runner\hooks\optimizer.py:31: FutureWarning: Non-finite norm encountered in torch.nn.utils.clip_grad_norm_; continuing anyway. Note that the default behavior will change in a future release to error out if a non-finite total norm is encountered. At that point, setting error_if_nonfinite=false will be required to retain the old behavior.
  return clip_grad.clip_grad_norm_(params, **self.grad_clip)
Traceback (most recent call last):
  File "D:/code/AdaMixer-main/tools/train.py", line 203, in <module>
    main()
  File "D:/code/AdaMixer-main/tools/train.py", line 199, in main
    meta=meta)
  File "d:\code\adamixer-main\mmdet\apis\train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "D:\conda3\envs\ada\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "D:\conda3\envs\ada\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "D:\conda3\envs\ada\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "D:\conda3\envs\ada\lib\site-packages\mmcv\parallel\data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "d:\code\adamixer-main\mmdet\models\detectors\base.py", line 233, in train_step
    losses = self(**data)
  File "D:\conda3\envs\ada\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\conda3\envs\ada\lib\site-packages\mmcv\runner\fp16_utils.py", line 95, in new_func
    return old_func(*args, **kwargs)
  File "d:\code\adamixer-main\mmdet\models\detectors\base.py", line 167, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "d:\code\adamixer-main\mmdet\models\detectors\sparse_rcnn.py", line 63, in forward_train
    imgs_whwh=imgs_whwh)
  File "d:\code\adamixer-main\mmdet\models\roi_heads\adamixer_decoder.py", line 125, in forward_train
    gt_labels[i], img_metas[i])
  File "d:\code\adamixer-main\mmdet\core\bbox\assigners\hungarian_assigner.py", line 132, in assign
    matched_row_inds, matched_col_inds = linear_sum_assignment(cost)
  File "D:\conda3\envs\ada\lib\site-packages\scipy\optimize\_lsap.py", line 100, in linear_sum_assignment
    return _lsap_module.calculate_assignment(cost_matrix)
ValueError: matrix contains invalid numeric entries
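
For anyone debugging the same traceback: the error is raised by SciPy's linear_sum_assignment when the cost matrix handed to it contains NaN (or other invalid) entries, and the preceding "Non-finite norm" warning hints that the values may already have diverged before the assigner ran. The snippet below is only a minimal sketch of how the matrix could be inspected before the call. The helper names describe_invalid_entries and checked_assignment are hypothetical and not part of mmdetection or AdaMixer, and the reading of rows as predictions is an assumption about how the assigner lays out its cost matrix.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def describe_invalid_entries(cost):
    """Summarise NaN/inf entries of a 2-D cost matrix (hypothetical helper)."""
    cost = np.asarray(cost, dtype=float)
    return {
        "shape": cost.shape,
        "num_nan": int(np.isnan(cost).sum()),
        "num_inf": int(np.isinf(cost).sum()),
        # Assumption: rows index predictions (queries), columns index GT boxes.
        "bad_rows": np.unique(np.nonzero(~np.isfinite(cost))[0]).tolist(),
    }


def checked_assignment(cost):
    """Run linear_sum_assignment, but fail early with a more detailed message."""
    report = describe_invalid_entries(cost)
    if report["num_nan"] or report["num_inf"]:
        raise ValueError(f"non-finite cost matrix: {report}")
    return linear_sum_assignment(cost)


# A single NaN is enough to make SciPy reject the matrix.
bad_cost = np.array([[0.2, np.nan], [0.7, 0.1]])
try:
    linear_sum_assignment(bad_cost)
except ValueError as exc:
    print("scipy raised:", exc)
print(describe_invalid_entries(bad_cost))
```

If a check like this reports NaNs in every row, the predictions themselves are probably non-finite (for example after a diverged update), which would point away from the assigner and toward the loss or learning-rate settings.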

sebgao commented 1 year ago

My guess is that the strong data augmentation in adamixer_r50_300_query_crop_mstrain_480-800_3x_coco.py may be buggy when an image contains only a few objects. Just a guess...
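
One way to probe this guess would be to look at the ground-truth boxes that come out of the augmentation pipeline and flag any that are non-finite or have collapsed to (almost) zero size. The following is a rough sketch only: find_degenerate_boxes is a hypothetical helper, the (x1, y1, x2, y2) layout is assumed, and the 1-pixel threshold is arbitrary.

```python
import numpy as np


def find_degenerate_boxes(boxes, min_size=1.0):
    """Return indices of boxes that are non-finite or smaller than min_size.

    boxes: array of shape (N, 4), assumed to be (x1, y1, x2, y2) in pixels.
    min_size: illustrative threshold, not a value taken from any config.
    """
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    widths = boxes[:, 2] - boxes[:, 0]
    heights = boxes[:, 3] - boxes[:, 1]
    non_finite = ~np.isfinite(boxes).all(axis=1)
    too_small = (widths < min_size) | (heights < min_size)
    return np.nonzero(non_finite | too_small)[0]


# Example: the second box has zero width, the third is half a pixel wide.
example_boxes = np.array([
    [10.0, 10.0, 50.0, 60.0],
    [30.0, 30.0, 30.0, 45.0],
    [5.0, 5.0, 5.5, 40.0],
])
print(find_degenerate_boxes(example_boxes))
```

Running a check like this on the augmented samples of a few batches would show whether cropping or resizing leaves degenerate boxes behind, or whether the problem lies elsewhere.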

yellowwwo commented 1 year ago

> My guess is that the strong data augmentation in adamixer_r50_300_query_crop_mstrain_480-800_3x_coco.py may be buggy when an image contains only a few objects. Just a guess...

Thank you for your reply! I tried the same dataset (VisDrone) in another project based on mmdetection 2.25.0, which uses the same data augmentation strategies, and the error (ValueError: matrix contains invalid numeric entries) did not occur there.

sebgao commented 1 year ago

Maybe you can try copying the newer data augmentation code from mmdetection 2.25.0 to patch this codebase?

yellowwwo commented 1 year ago

> Maybe you can try copying the newer data augmentation code from mmdetection 2.25.0 to patch this codebase?

Thanks for your help! I tried that, but it didn't work. The error may have a different cause.

MCG-NJU / AdaMixer

ValueError: matrix contains invalid numeric entries #20