No such file or directory

NVIDIA / semantic-segmentation

Nvidia Semantic Segmentation monorepo

BSD 3-Clause "New" or "Revised" License

1.77k stars, 387 forks

Open YuNaruto opened 3 years ago

YuNaruto commented 3 years ago

When I run this: python -m runx.runx scripts/dump_folder.yml -i, it fails with "No such file or directory", even though the path exists. Why does this happen?

YuNaruto commented 3 years ago

@ajtao The path logs/dump_folder/hidden-puffin_2021.04.27_17.17/code does exist.

atharvas commented 3 years ago

Notice that we're cd-ing into the logs/ directory, but the command executed, as defined in scripts/dump_folder.yml, is:

python -m torch.distributed.launch --nproc_per_node=1 train.py

However, train.py lives at the project directory level, not under logs/. Changing the relative path to an absolute path should fix this issue. That is:

CMD: "python -m torch.distributed.launch --nproc_per_node=1 {absolute path to train.py}"

In my case this was:

CMD: "python -m torch.distributed.launch --nproc_per_node=1 /home/ubuntu/semantic-segmentation/train.py"

ajtao commented 3 years ago

There's something wrong with both of your setups. What runx is attempting to do is:

Copy the codebase into a new 'run' directory under LOGROOT (defined in .runx). What @NarutoZhao's output showed is that this copy is failing. You should try to diagnose that problem. Try defining LOGROOT with an absolute path.
@atharvas Your comment is wrong. train.py is invoked after runx cd's into the run directory, where the code has just been copied, so you should not need to refer to the absolute path of train.py. In fact, the purpose of copying the code to the run directory is both (a) to archive the state of the code and (b) to allow you to keep modifying your project repo after launching jobs.
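
To make the copy-then-cd flow concrete, here is a minimal Python sketch of the behavior described above. It is not runx's actual implementation: the function name launch_from_copy, its parameters, and the <logroot>/<run_name>/code layout are assumptions made for illustration. It shows why a relative train.py resolves correctly once the copy has succeeded, and why resolving LOGROOT to an absolute path first avoids surprises after changing directories.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def launch_from_copy(project_dir, logroot, run_name, cmd):
    """Hypothetical helper: copy the project into <logroot>/<run_name>/code,
    then run cmd with that copy as the working directory."""
    # Resolve LOGROOT once, up front, so that a later change of working
    # directory cannot alter which directory it refers to.
    logroot = os.path.abspath(logroot)
    code_dir = os.path.join(logroot, run_name, "code")

    # Archive the current state of the code inside the run directory.
    shutil.copytree(project_dir, code_dir)

    # If the copy did not produce train.py, fail here with a clear message
    # instead of a later "No such file or directory" from the launcher.
    if not os.path.isfile(os.path.join(code_dir, "train.py")):
        raise FileNotFoundError(f"copy failed: no train.py in {code_dir}")

    # cmd refers to train.py by relative path; cwd makes it resolve
    # against the copied code, not the original project directory.
    result = subprocess.run(cmd, cwd=code_dir, capture_output=True, text=True)
    return code_dir, result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in project containing a trivial train.py.
        project = os.path.join(tmp, "project")
        os.makedirs(project)
        with open(os.path.join(project, "train.py"), "w") as f:
            f.write("import os; print(os.getcwd())\n")

        code_dir, result = launch_from_copy(
            project, os.path.join(tmp, "logs"), "demo-run",
            [sys.executable, "train.py"],
        )
        # train.py reports the run directory as its working directory.
        print(result.stdout.strip())
```

If the copy step is what fails, as the output in this thread suggests, the explicit check surfaces that directly, which is the part worth diagnosing before touching the path in CMD.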

msseibel commented 1 year ago