Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0
4.59k stars 510 forks

Why is mAP very small or even 0 in the first round of training?? #1252

Closed qqqtwh closed 1 year ago

qqqtwh commented 1 year ago

💡 Your Question

When I train on my dataset, the mAP value is very small or even 0 in the first epoch, and it is still 0 in the following epoch!

epoch:==== 0
[2023-07-07 03:33:09] INFO - base_sg_logger.py - Checkpoint saved in checkpoints/debug_dataset/yolo_nas_s_640_640/ckpt_best.pth
[2023-07-07 03:33:09] INFO - sg_trainer.py - Best checkpoint overriden: validation mAP@0.50: 2.8691483748843893e-06
epoch:==== 1
[2023-07-07 03:36:49] INFO - base_sg_logger.py - Checkpoint saved in checkpoints/debug_dataset/yolo_nas_s_640_640/ckpt_best.pth
[2023-07-07 03:36:49] INFO - sg_trainer.py - Best checkpoint overriden: validation mAP@0.50: 4.2316451072110794e-06

Versions

docker image: pytorch/pytorch:1.12.1-cuda11.3-cudnn8-devel

My package details are as follows: absl-py 1.4.0 alabaster 0.7.13 antlr4-python3-runtime 4.9.3 attrs 23.1.0 Babel 2.12.1 backcall 0.2.0 beautifulsoup4 4.11.1 boto3 1.28.0 botocore 1.31.0 brotlipy 0.7.0 build 0.10.0 cachetools 5.3.1 certifi 2022.6.15 cffi 1.15.0 chardet 4.0.0 charset-normalizer 2.0.4 click 8.1.4 colorama 0.4.4 coloredlogs 15.0.1 conda 4.13.0 conda-build 3.21.9 conda-content-trust 0+unknown conda-package-handling 1.8.1 coverage 5.3.1 cryptography 37.0.1 cycler 0.11.0 decorator 5.1.1 Deprecated 1.2.14 docutils 0.17.1 einops 0.3.2 filelock 3.6.0 flatbuffers 23.5.26 fonttools 4.38.0 future 0.18.3 glob2 0.7 google-auth 2.21.0 google-auth-oauthlib 0.4.6 grpcio 1.56.0 humanfriendly 10.0 hydra-core 1.3.2 idna 3.3 imagesize 1.4.1 importlib-metadata 6.7.0 importlib-resources 5.12.0 ipython 7.31.1 jedi 0.18.1 Jinja2 2.10.1 jmespath 1.0.1 json-tricks 3.16.1 jsonschema 4.17.3 kiwisolver 1.4.4 libarchive-c 2.9 Markdown 3.4.3 markdown-it-py 2.2.0 MarkupSafe 2.1.3 matplotlib 3.5.3 matplotlib-inline 0.1.2 mdurl 0.1.2 mkl-fft 1.3.1 mkl-random 1.2.2 mkl-service 2.4.0 mpmath 1.3.0 numpy 1.21.6 oauthlib 3.2.2 omegaconf 2.3.0 onnx 1.13.0 onnx-simplifier 0.4.33 onnxruntime 1.13.1 opencv-python 4.8.0.74 opencv-python-headless 4.8.0.74 packaging 23.1 parso 0.8.3 pexpect 4.8.0 pickleshare 0.7.5 Pillow 9.0.1 pip 23.1.2 pip-tools 6.14.0 pkginfo 1.8.2 pkgutil_resolve_name 1.3.10 prompt-toolkit 3.0.20 protobuf 3.20.3 psutil 5.8.0 ptyprocess 0.7.0 pyasn1 0.5.0 pyasn1-modules 0.3.0 pycocotools 2.0.6 pycosat 0.6.3 pycparser 2.21 pyDeprecate 0.3.2 Pygments 2.15.1 pyOpenSSL 22.0.0 pyparsing 2.4.5 pyproject_hooks 1.0.0 pyrsistent 0.19.3 PySocks 1.7.1 python-dateutil 2.8.2 pytz 2022.1 PyYAML 6.0 rapidfuzz 3.1.1 requests 2.27.1 requests-oauthlib 1.3.1 rich 13.4.2 rsa 4.9 ruamel-yaml-conda 0.15.100 s3transfer 0.6.1 scipy 1.7.3 setuptools 61.2.0 six 1.16.0 snowballstemmer 2.2.0 soupsieve 2.3.1 Sphinx 4.0.3 sphinx-rtd-theme 1.2.2 sphinxcontrib-applehelp 1.0.2 sphinxcontrib-devhelp 1.0.2 
sphinxcontrib-htmlhelp 2.0.0 sphinxcontrib-jquery 4.1 sphinxcontrib-jsmath 1.0.1 sphinxcontrib-qthelp 1.0.3 sphinxcontrib-serializinghtml 1.1.5 stringcase 1.2.0 super-gradients 3.1.2 sympy 1.10.1 tensorboard 2.11.2 tensorboard-data-server 0.6.1 tensorboard-plugin-wit 1.8.1 termcolor 1.1.0 tomli 2.0.1 torch 1.12.1 torchmetrics 0.8.0 torchtext 0.13.1 torchvision 0.13.1 tqdm 4.63.0 traitlets 5.1.1 treelib 1.6.1 typing_extensions 4.3.0 urllib3 1.26.8 wcwidth 0.2.5 Werkzeug 2.2.3 wheel 0.40.0 wrapt 1.15.0 zipp 3.15.0

BloodAxe commented 1 year ago

It could be anything: a bug in your code, a hard dataset, sub-optimal hyperparameters, etc. When a space rocket takes off, it isn't moving fast in the first few seconds either, right? I don't know why you consider this an issue. If the mAP stays around zero throughout the whole training, that is indeed a problem.
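One way to tell a slow start from a run that never learns is to pull the logged mAP values out of the training log and look at the trend. The snippet below is only a minimal sketch: it assumes the log contains `validation mAP@0.50: <value>` fragments like the ones quoted in the question, and the `threshold` and `min_epochs` defaults are hypothetical cut-offs, not anything defined by SuperGradients. Note that the "Best checkpoint" lines are only written when the metric improves, so they show the best value so far rather than every epoch.

```python
import re

# Matches the "validation mAP@0.50: <float>" fragment from the quoted log lines.
MAP_PATTERN = re.compile(
    r"validation mAP@0\.50:\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)"
)


def extract_map_values(log_text):
    """Return every logged validation mAP@0.50 value, in order of appearance."""
    return [float(value) for value in MAP_PATTERN.findall(log_text)]


def is_stuck_near_zero(values, threshold=1e-3, min_epochs=10):
    """Hypothetical heuristic: enough values were logged and all are below threshold."""
    return len(values) >= min_epochs and all(v < threshold for v in values)


log = """
[2023-07-07 03:33:09] INFO - sg_trainer.py - Best checkpoint overriden: validation mAP@0.50: 2.8691483748843893e-06
[2023-07-07 03:36:49] INFO - sg_trainer.py - Best checkpoint overriden: validation mAP@0.50: 4.2316451072110794e-06
"""

values = extract_map_values(log)
print(values)
# Two logged values are too few for the heuristic to call the run stuck.
print(is_stuck_near_zero(values))
```

With only two values logged, the heuristic deliberately reports nothing; it is meant to be re-run once more epochs have completed.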

But it is impossible to guess with the limited information you've provided. Are there any specific details about the dataset and training recipe that you can share?

PS: You can try running DataGradients on your dataset to get valuable insights about the distribution of boxes and potential issues. Please note that this tool is in early beta.
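As a quick complement to a full dataset analysis, a few lines of plain Python can catch common annotation mistakes that tend to keep mAP near zero, such as pixel coordinates where normalized ones are expected, or class ids outside the declared range. The sketch below does not use the DataGradients API. It assumes, purely for illustration, YOLO-style label lines of the form `class_id cx cy w h` with coordinates normalized to [0, 1]; the dataset format in this issue is not stated, and `check_yolo_label_lines` is a hypothetical helper name.

```python
def check_yolo_label_lines(lines, num_classes):
    """Return (line_number, message) pairs for suspicious annotation lines.

    Assumes each non-blank line is "class_id cx cy w h" with normalized values.
    """
    problems = []
    for line_number, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 5:
            problems.append((line_number, "expected 5 fields, got %d" % len(parts)))
            continue
        try:
            class_id = int(parts[0])
            cx, cy, w, h = (float(p) for p in parts[1:])
        except ValueError:
            problems.append((line_number, "could not parse fields as numbers"))
            continue
        if not 0 <= class_id < num_classes:
            problems.append((line_number, "class id %d out of range" % class_id))
        if not all(0.0 <= v <= 1.0 for v in (cx, cy, w, h)):
            problems.append((line_number, "coordinates not normalized to [0, 1]"))
        elif w <= 0.0 or h <= 0.0:
            problems.append((line_number, "box has zero width or height"))
    return problems


# Made-up sample lines, one per kind of problem the checker looks for.
sample = [
    "0 0.5 0.5 0.2 0.3",   # valid
    "3 0.5 0.5 0.2 0.3",   # class id outside a 2-class dataset
    "1 320 240 100 80",    # pixel coordinates instead of normalized ones
    "0 0.5 0.5 0.0 0.3",   # zero-width box
    "",                    # blank line, ignored
    "0 0.5 0.5",           # truncated line
]
problems = check_yolo_label_lines(sample, num_classes=2)
for line_number, message in problems:
    print(line_number, message)
```

If the labels are stored in another format (for example COCO JSON), the same checks apply, but the parsing would need to change accordingly.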