Closed bernstei closed 3 months ago
I see now that the logger depends on rank, which isn't known until after the distributed env is created, and I'm not sure what the simplest way of dealing with that is. Maybe just set the logging level very early, and set up the full logger later. But I still think the DistributedEnvironment exception message should be an error, no?
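For what it's worth, here is a minimal sketch of that two-step idea using only the standard logging module. The function names, the logger name "train", and the policy of quieting non-zero ranks are all assumptions made for illustration, not the project's actual code.

```python
import logging


def setup_early_logging(level=logging.INFO):
    """Phase 1: before the rank is known, install a basic root handler."""
    # force=True (Python 3.8+) replaces any handlers already on the root logger.
    logging.basicConfig(level=level, format="%(levelname)s %(message)s", force=True)


def setup_rank_logging(rank, level=logging.INFO):
    """Phase 2: once the rank is known, swap in a rank-aware configuration."""
    # Hypothetical policy: rank 0 logs at the requested level,
    # every other rank only reports warnings and above.
    effective = level if rank == 0 else logging.WARNING
    logging.basicConfig(
        level=effective,
        format=f"[rank {rank}] %(levelname)s %(message)s",
        force=True,
    )
    return logging.getLogger("train")
```

With this split, anything that fails while the distributed env is being created is still reported through the phase 1 handler, and the rank-specific format only takes over afterwards.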
I think the logger needs to be set up before the calls that create DistributedEnvironment in cli/run_train.py, otherwise there's no output. Also, since it's an error, it should probably go to logger.error rather than .info, and the script should return a non-zero status when it terminates.
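Roughly what I have in mind, as a sketch only. DistributedSetupError and create_env are hypothetical stand-ins, since I haven't checked which exception the DistributedEnvironment constructor actually raises or how cli/run_train.py calls it.

```python
import logging
import sys

logger = logging.getLogger("train")


class DistributedSetupError(RuntimeError):
    """Hypothetical stand-in for whatever DistributedEnvironment raises."""


def run(create_env):
    """Return a process exit status: 0 on success, 1 if env creation fails."""
    try:
        create_env()
    except DistributedSetupError as exc:
        # Report the failure at ERROR level rather than INFO.
        logger.error("Failed to create distributed environment: %s", exc)
        return 1
    return 0


def main(create_env):
    # Configure logging before anything that can fail, so the error is visible.
    logging.basicConfig(level=logging.INFO)
    sys.exit(run(create_env))
```

Returning the status from run and calling sys.exit only in main keeps the failure path easy to test without terminating the interpreter.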