And the dataset itself is normal.
Hello and thank you for your interest. This error typically occurs when there's a gradient explosion, so the model starts producing results that can't be correctly validated. It's usually not a dataset issue, but it could be an environment issue. Can you share your environment details (i.e. Python version, torch, torchvision, mmcv, mmdetection, ninja versions specifically)?
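One way to gather those details in a single step is a small script like the one below. This is only a sketch: the package names are assumed to be the distribution names used by pip, and `collect_versions` is a hypothetical helper, not something shipped with the repository.

```python
import platform

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Assumption: on Python 3.7 the importlib_metadata backport is installed.
    from importlib_metadata import PackageNotFoundError, version

# Assumed pip distribution names for the packages asked about above.
PACKAGES = ["torch", "torchvision", "mmcv-full", "mmdet", "ninja"]


def collect_versions(packages=PACKAGES):
    """Return a dict mapping each package name to its installed version."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"{name}: {ver}")
```

Pasting the printed output into the thread avoids ambiguity about which versions are actually active in the environment.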
Python 3.7, torch 1.7.1, torchvision 0.8.2, mmcv-full 1.5.0, mmdetection v2.24.1, ninja 1.10.2.3
I tried adjusting the warmup, but that doesn't help. The computer supports CUDA 11.2.
I mean, I'm using CUDA 11.0; this computer supports up to 11.2.
Have you tried using the recommended environment? Using the same versions matters for reproducibility, especially since we haven't verified that the kernel operates as expected in torch versions below 1.8.
Running the base version on an A6000, I do get validation results, though occasionally there are none. Running the mini and tiny versions gives no validation results.
Moreover, the mini and tiny versions behave a little strangely: there are validation results for the first three epochs, and none for the epochs after that.
Again, the no results warning in mmcv just generally points to training collapsing. Based on the versions you shared I wouldn't be too surprised if those were the root of the issue, because different torch/mmcv versions tend to work differently, and we trained all of our models with torch 1.11.
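To see where training collapses, it can help to scan the training log for the first non-finite loss. The sketch below assumes the JSON-lines log that mmcv's text logger writes next to the plain-text log, with one JSON object per line and `mode`, `epoch`, `iter` and `loss` keys; if your log differs, adjust the key names. `first_bad_loss` is a hypothetical helper name.

```python
import json
import math


def first_bad_loss(lines, key="loss"):
    """Return (epoch, iter, value) for the first non-finite training loss.

    Returns None if every training record has a finite loss.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)  # Python's json accepts NaN and Infinity
        # Skip validation records and header lines that carry no loss.
        if record.get("mode") != "train" or key not in record:
            continue
        value = float(record[key])
        if not math.isfinite(value):
            return record.get("epoch"), record.get("iter"), value
    return None


# Usage (the path is a placeholder, not a real file in the repository):
# with open("work_dirs/<your_run>/<timestamp>.log.json") as f:
#     print(first_bad_loss(f))
```

If this reports a NaN or infinite loss shortly before the validation results disappear, that is consistent with training having collapsed rather than with a dataset problem.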
OK, I'll try another computer and report back with the results. Thank you for your answer!
You don't have to try another machine. You can simply set up a virtual environment and use the requirements.txt file provided to install the recommended versions of torch, mmcv and the like.
I know what you mean. I've tried the recommended environment again: CUDA 11.3, torch 1.11, mmcv-full 1.4.8, mmdetection v2.24.1, ninja 1.10.2.3. The first three epochs still produce results, but the later ones do not. This problem has been bothering me for two or three days. How can I solve it?
Please note that this is still not the recommended environment; you're still on the wrong mmdet version. This is the correct setup:
torch==1.11.0+cu113
torchvision==0.12.0+cu113
mmcv-full==1.4.8
mmdet==2.19.0
ninja==1.10.2.3
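A quick way to confirm that the active environment matches these pins is to compare them against what is installed. The following is a minimal sketch, assuming the pins are in `name==version` form as listed above; `parse_pins`, `installed_version` and `find_mismatches` are hypothetical helper names, and the comparison is an exact string match (so the `+cu113` suffix must match too).

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Assumption: on Python 3.7 the importlib_metadata backport is installed.
    from importlib_metadata import PackageNotFoundError, version

# The pinned versions listed above, copied verbatim.
PINS = """\
torch==1.11.0+cu113
torchvision==0.12.0+cu113
mmcv-full==1.4.8
mmdet==2.19.0
ninja==1.10.2.3
"""


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (pinned, found)} for every package that differs."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, found) in find_mismatches(parse_pins(PINS)).items():
        print(f"{name}: expected {pinned}, found {found or 'not installed'}")
```

An empty result means every listed package matches its pin exactly.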
It would also be more helpful if you could provide a log and the command you're trying to run if it occurs with these settings.
I'm not sure why you're trying to build mmdet from scratch. All you need to do after setting up and activating the Python environment is to run pip3 install -r requirements.txt, and then run with the scripts provided. You don't have to build mmdet from scratch, and that is not recommended.
Closing this due to inactivity. If you still have questions feel free to open it back up.
After integrating Na into mmdetection, training runs, but it keeps reporting the error "The testing results of the whole dataset is empty". Following the solution suggested by mmdetection, I modified the learning rate, but there are still no validation results.
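For reference, the usual mmdetection 2.x remedies for a diverging loss (longer warmup, gradient clipping) are expressed as config overrides like the fragment below. This is a sketch only: the numeric values are illustrative assumptions, not the settings used by this repository, and the fragment is meant to be added to a config that inherits its base schedule.

```python
# mmdetection 2.x config fragment; values are illustrative assumptions.

# Clip the gradient norm so a single bad batch cannot blow up the weights.
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))

# Lengthen the warmup; this dict is merged into the inherited lr_config.
lr_config = dict(warmup_iters=1000)
```

These overrides only address divergence during training; they do not help if the underlying cause is a version mismatch in the environment.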