SHI-Labs / Neighborhood-Attention-Transformer

Neighborhood Attention Transformer, arxiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arxiv 2022
MIT License

The testing results of the whole dataset is empty #31

Closed: blowhen closed this issue 2 years ago

blowhen commented 2 years ago

After integrating NA into mmdetection, training runs, but it keeps reporting the error "The testing results of the whole dataset is empty". Following the solution suggested for mmdetection, I modified the learning rate, but there are still no validation set results.

blowhen commented 2 years ago

The dataset itself is fine.

alihassanijr commented 2 years ago

Hello and thank you for your interest. This error typically occurs when there's a gradient explosion, so the model starts producing results that can't be correctly validated. It's usually not a dataset issue, but it could be an environment issue. Can you share your environment details (i.e. Python version, torch, torchvision, mmcv, mmdetection, ninja versions specifically)?
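One way to test the "gradient explosion" hypothesis before blaming the data is to look for the point where the training loss stops being finite or jumps sharply. A minimal sketch, assuming you have already copied the per-iteration loss values from your training log into a Python list; the function name `first_divergence` and the 10x spike threshold are illustrative choices, not part of mmdetection or this repository:

```python
import math
from typing import Optional, Sequence


def first_divergence(losses: Sequence[float], spike_factor: float = 10.0) -> Optional[int]:
    """Return the index of the first loss that is NaN/inf, or that exceeds
    `spike_factor` times the lowest finite loss seen so far.
    Returns None if the sequence never diverges by either criterion."""
    best = math.inf  # lowest finite loss observed so far
    for i, loss in enumerate(losses):
        if not math.isfinite(loss):
            return i  # NaN or inf: training has already collapsed
        if 0 < best < math.inf and loss > spike_factor * best:
            return i  # sudden blow-up relative to the best loss so far
        best = min(best, loss)
    return None


if __name__ == "__main__":
    print(first_divergence([2.1, 1.8, 1.5, 1.4, float("nan"), 2.0]))
```

If the index this reports lines up with the epoch where validation results disappear, that supports the collapse explanation over a dataset problem.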

blowhen commented 2 years ago

Python 3.7, torch 1.7.1, torchvision 0.8.2, mmcv-full 1.5.0, mmdetection v2.24.1, ninja 1.10.2.3

Adjusting the warmup didn't help, and the machine supports CUDA 11.2.

blowhen commented 2 years ago

To clarify, I'm using CUDA 11.0; this machine supports up to 11.2.

alihassanijr commented 2 years ago

Have you tried using the recommended environment? Using exactly the same versions matters for reproducibility, especially since we haven't verified that the kernel behaves as expected with torch versions below 1.8.

blowhen commented 2 years ago

Running the base variant on an A6000, I do get validation set results, although occasionally there are none. With the mini and tiny variants, however, there are no validation set results at all.

blowhen commented 2 years ago

Moreover, the mini and tiny variants behave a little strangely: there are validation set results for the first three epochs, but none for the epochs after that.

alihassanijr commented 2 years ago

Again, the no-results warning in mmcv generally points to training collapsing. Given the versions you shared, I wouldn't be surprised if they were the root of the issue: different torch/mmcv versions tend to behave differently, and we trained all of our models with torch 1.11.

blowhen commented 2 years ago

OK, I'll try another machine and report back with the results. Thank you for your answer!

alihassanijr commented 2 years ago

You don't have to try another machine; you can simply set up a virtual environment and use the provided requirements.txt file to install the recommended versions of torch, mmcv, and the like.

blowhen commented 2 years ago

I see what you mean. I tried the recommended environment again (CUDA 11.3, torch 1.11, mmcv-full 1.4.8, mmdetection v2.24.1, ninja 1.10.2.3), but although the first three epochs produce results, the later ones still don't. This problem has been bothering me for two or three days. How can I solve it?

alihassanijr commented 2 years ago

Please note that this is still not the recommended environment; you're still on the wrong mmdet version. This is the correct setup:

torch==1.11.0+cu113
torchvision==0.12.0+cu113
mmcv-full==1.4.8
mmdet==2.19.0
ninja==1.10.2.3

If the problem still occurs with these settings, it would also help if you could share a log and the command you're running.
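Since the mismatch in this thread was a single package (mmdet 2.24.1 instead of 2.19.0), it can be worth checking the active environment against the pinned list programmatically. A minimal sketch using the standard-library `importlib.metadata` (Python 3.8+); the helper names are hypothetical, and the comparison is an exact string match, so a local tag such as `+cu113` has to match too:

```python
from importlib import metadata
from typing import Callable, Dict, List, Optional, Tuple

# Versions recommended in this thread (distribution name -> exact version).
RECOMMENDED: Dict[str, str] = {
    "torch": "1.11.0+cu113",
    "torchvision": "0.12.0+cu113",
    "mmcv-full": "1.4.8",
    "mmdet": "2.19.0",
    "ninja": "1.10.2.3",
}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> List[Tuple[str, str, Optional[str]]]:
    """List (name, expected, found) for every package that does not match exactly."""
    mismatches = []
    for dist, want in expected.items():
        have = lookup(dist)
        if have != want:
            mismatches.append((dist, want, have))
    return mismatches


if __name__ == "__main__":
    for dist, want, have in version_mismatches(RECOMMENDED):
        print(f"{dist}: expected {want}, found {have or 'not installed'}")
```

The `lookup` parameter exists only so the comparison can be exercised without the real packages installed.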

alihassanijr commented 2 years ago

I'm not sure why you're trying to build mmdet from scratch. After setting up and activating the Python environment, all you need to do is run pip3 install -r requirements.txt and then use the provided scripts. You don't have to build mmdet from scratch, and doing so is not recommended.
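The recommended workflow (create an isolated environment, then install the pinned packages from the repository's requirements.txt instead of building mmdet yourself) can be sketched with the standard-library `venv` module. This is an illustration, not a script shipped with the repository; the directory name `nat-env` and the helper names are hypothetical, and requirements.txt is assumed to be in the current directory:

```python
from __future__ import annotations

import os
import subprocess
import venv
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Path to the interpreter inside a venv (Scripts\\ on Windows, bin/ elsewhere)."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def pip_install_command(env_dir: Path, requirements: Path) -> list[str]:
    """Equivalent of `pip3 install -r requirements.txt`, run with the venv's own pip."""
    return [str(env_python(env_dir)), "-m", "pip", "install", "-r", str(requirements)]


def create_env_and_install(env_dir: Path, requirements: Path) -> None:
    venv.create(env_dir, with_pip=True)  # fresh environment with pip bootstrapped
    subprocess.run(pip_install_command(env_dir, requirements), check=True)


if __name__ == "__main__":
    create_env_and_install(Path("nat-env"), Path("requirements.txt"))
```

Invoking pip as `python -m pip` through the environment's own interpreter avoids accidentally installing into whichever `pip3` happens to be first on the PATH.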

alihassanijr commented 2 years ago

Closing this due to inactivity. If you still have questions feel free to open it back up.