NVIDIA-AI-IOT / nvidia-tao

Nvidia TAO Mask R-CNN training problem: training completes, but evaluation metrics are all 0 and inference does not produce correct masks #3

Open htlbayytq opened 1 year ago

htlbayytq commented 1 year ago

I spent a long time figuring out how to run Nvidia TAO Mask R-CNN training (nvidia-tao/maskrcnn.ipynb at main · NVIDIA-AI-IOT/nvidia-tao · GitHub).

The training finally completed and “[INFO] Training finished successfully” was displayed. However, the evaluation metrics are all 0, and inference does not produce correct masks.

• Hardware: Running TAO Toolkit on Google Colab
• Network Type: Mask_rcnn
• Training spec file: maskrcnn_train_resnet50.txt
• How to reproduce the issue: train_log.txt, Generate_tfrecords_log.txt, enviroment_setting_log.txt

Please help! Many thanks in advance!

imenselmi commented 1 year ago

@htlbayytq, I am trying to train Mask R-CNN in a Jupyter Notebook on Ubuntu 20.04 with Python 3.6.9, but it fails with an error. Could you please share the correct steps to train it? I would greatly appreciate your help. This is the output I get:

For multi-GPU, change --gpus based on your machine.
2023-05-22 16:06:09,339 [INFO] root: Registry: ['nvcr.io']
2023-05-22 16:06:09,383 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5
Error response from daemon: No such container: 26c2e104d359450fa9fd3ee59027272b87bd0dc231014da0162cf129888a5e4f
2023-05-22 16:06:12,219 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Which Docker container version are you using?

ramanathan831 commented 1 year ago

@htlbayytq, which dataset are you using?

ramanathan831 commented 1 year ago

@imenselmi, check whether you have the GOOGLE_COLAB environment variable set for running in a Colab environment.
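
One way to check this from a notebook cell is sketched below. It only reports whether the variable is present in the current Python process; the helper name `colab_flag_status` is hypothetical, and the value the notebook expects is not stated in this thread, so the sketch does not assume one.

```python
import os


def colab_flag_status(env=None, name="GOOGLE_COLAB"):
    """Describe whether the variable `name` is set in the mapping `env`.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to try out without touching the real environment.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if value is None:
        return f"{name} is not set"
    if value == "":
        return f"{name} is set but empty"
    return f"{name} is set to {value!r}"


# Report the status for the current process (for example, the notebook kernel).
print(colab_flag_status())
```

Note that a variable exported in a separate shell is not necessarily visible to an already running notebook kernel, so it is worth running the check in the same kernel that launches the training command.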