IDEA-Research / DINO

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"
Apache License 2.0

How to check the progress of distributed run "bash scripts/DINO_train_submitit.sh /path/to/my/COCODIR" #189

Open shenw000 opened 1 year ago

shenw000 commented 1 year ago

I am using PyTorch 1.11 on Ubuntu 20.04. The setup works fine with the command "bash scripts/DINO_train.sh /path/to/my/COCODIR". I then submitted a distributed run with "bash scripts/DINO_train_submitit.sh /path/to/my/COCODIR". The terminal shows "Submitted job_id: 11007" and returns to the system prompt. Nothing else appears in the terminal after that. Does that mean the distributed run is still running, or did something go wrong? I checked the "experiments" folder and nothing has been generated there either. Is there a way to tell whether my training job has terminated or is still progressing? And if it is still running, how can I see how far it has progressed, e.g. the number of epochs completed?

alpacaduby commented 1 year ago

How did you solve this problem? I ran into the same issue. Thanks.
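
One possible way to check on such a job, offered as a rough sketch and not as the repository's documented procedure: submitit normally writes per-task log files into the job folder configured by the launch script, so looking at those files can show whether training has started and which epoch it last reported. The Python helper below assumes the logs are named `<job_id>_<task>_log.out` and `<job_id>_<task>_log.err`, and that the training loop prints lines containing `Epoch: [k]`. Both are assumptions to verify against your own run, and the function names are hypothetical. If the cluster uses Slurm, `squeue -j 11007` should also report whether the job is still pending or running.

```python
import re
from pathlib import Path

# Assumption: the training log contains progress lines such as "Epoch: [3] ...".
EPOCH_RE = re.compile(r"Epoch: \[(\d+)\]")


def find_job_logs(log_dir, job_id):
    """Recursively collect the stdout/stderr logs that belong to one job id.

    Assumption: logs are named "<job_id>_<task>_log.out" / "<job_id>_<task>_log.err".
    """
    root = Path(log_dir)
    matches = root.rglob(f"{job_id}_*_log.*")
    return sorted(p for p in matches if p.suffix in {".out", ".err"})


def tail(path, n=20):
    """Return the last n lines of a log file as a list of strings."""
    lines = Path(path).read_text(errors="replace").splitlines()
    return lines[-n:]


def last_epoch(path):
    """Return the highest epoch index mentioned in the log, or None if there is none."""
    text = Path(path).read_text(errors="replace")
    epochs = [int(m.group(1)) for m in EPOCH_RE.finditer(text)]
    return max(epochs) if epochs else None
```

For example, `find_job_logs("path/to/job_dir", 11007)` lists the log files for the job id printed at submission, where `path/to/job_dir` stands in for whatever job directory the launch script passes to submitit. `tail(log)` then shows the most recent output, and `last_epoch(log)` gives a rough progress indicator. An empty result from `find_job_logs` suggests the job has not started writing logs yet, or that the logs live in a different folder.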