facebookresearch / dora

Dora is an experiment management framework. It expresses grid searches as pure python files as part of your repo. It identifies experiments with a unique hash signature. Scale up to hundreds of experiments without losing your sanity.
MIT License
262 stars 24 forks source link

Why only the log file of rank > 0 is created? #51

Closed mmarczuk closed 11 months ago

mmarczuk commented 11 months ago

❓ Questions

I'm trying to debug a problem with nccl, but I can only see the worker_1.log output file.

I have two gpus and I'm trying to see the logs, but I can only see the log of one, is this check really necessary?

TIA

adefossez commented 11 months ago

worker_1.log is actually the log file of the second worker. When running locally (eg with dora run), the output of the first worker is directly shown in the shell. It is not stored in any file.

mmarczuk commented 11 months ago

Got it, makes sense, thanks