Open · htlbayytq opened this issue 1 year ago
@htlbayytq, I am trying to train Mask R-CNN in a Jupyter Notebook on Ubuntu 20.04 with Python 3.6.9, but training fails with an error. Could you please share the correct steps to train it? I would greatly appreciate your help. This is the output I get:
For multi-GPU, change --gpus based on your machine.
2023-05-22 16:06:09,339 [INFO] root: Registry: ['nvcr.io']
2023-05-22 16:06:09,383 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5
Error response from daemon: No such container: 26c2e104d359450fa9fd3ee59027272b87bd0dc231014da0162cf129888a5e4f
2023-05-22 16:06:12,219 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
Which Docker container version are you using?
@htlbayytq, what dataset are you using?
@imenselmi, check whether the GOOGLE_COLAB environment variable is set; it is meant for running in a Colab environment.
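If it helps, here is a minimal Python sketch for checking that from a notebook cell. It only reads the process environment. The helper name `colab_flag_is_set` is made up for illustration, and treating an empty value as "not set" is an assumption, not something the TAO notebook specifies.

```python
import os


def colab_flag_is_set(environ=os.environ):
    """Return True if GOOGLE_COLAB is present and non-empty in the given mapping.

    Assumption: an empty string counts as "not set".
    """
    return bool(environ.get("GOOGLE_COLAB"))


# Report the state of the variable in the current kernel's environment.
print("GOOGLE_COLAB set:", colab_flag_is_set())
```

Run it in the same kernel that launches the training cell, since a variable exported in another shell will not be visible there.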
I spent a long time figuring out how to run NVIDIA TAO Mask R-CNN training (nvidia-tao/maskrcnn.ipynb at main · NVIDIA-AI-IOT/nvidia-tao · GitHub).
The training finally completes and “[INFO] Training finished successfully” is displayed, but the evaluation metrics are all 0 and inference does not produce correct masks.
• Hardware: Running TAO Toolkit on Google Colab
• Network type: Mask_rcnn
• Training spec file: maskrcnn_train_resnet50.txt
• How to reproduce the issue: train_log.txt, Generate_tfrecords_log.txt, enviroment_setting_log.txt
Please help! Many thanks in advance!