Open nish03 opened 2 months ago
juwels-booster compute nodes don't have internet access; runs will fail in online mode. There's probably a way to tunnel out of the compute nodes (according to Stefan Kesselheim) but I never got around to testing and automating it. @grassesi, maybe Michael or you or Stefan's group can look into this at some point.
I don't know. As far as I know, there is no way to open outgoing SSH connections directly from the HPC system (neither compute nor login nodes) due to security concerns.
Can we close the issue and add a reference to this discussion in the Wiki so if people wonder they can see this thread?
According to Stefan Kesselheim, there is a way to tunnel out of the compute nodes to have wandb run in online mode. It's not portable but if most of development is happening at JSC then it might still be worth deploying.
@grassesi: maybe you can talk to Stefan Kesselheim?
Renamed the issue and leaving it open.
Is there a particular reason why wandb is currently in offline mode during training? Is it because AtmoRep runs into syncing issues with the wandb server during training?
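For anyone landing on this thread: a minimal sketch of the offline workflow discussed above, assuming the run directory is later visible from a machine that does have internet access. `WANDB_MODE` and the `wandb sync` CLI command are part of wandb; the helper names `configure_wandb_mode` and `offline_sync_commands` are hypothetical and not part of the AtmoRep code base.

```python
from __future__ import annotations

import os
from pathlib import Path


def configure_wandb_mode(has_internet: bool) -> str:
    """Set WANDB_MODE before wandb.init() so that runs on nodes
    without internet access log to local disk instead of failing."""
    mode = "online" if has_internet else "offline"
    os.environ["WANDB_MODE"] = mode
    return mode


def offline_sync_commands(wandb_dir: str) -> list[list[str]]:
    """Build one `wandb sync` command per offline run directory.

    Assumes wandb's default layout, where offline runs are stored
    in sub-directories named `offline-run-*` inside `wandb_dir`.
    """
    runs = sorted(Path(wandb_dir).glob("offline-run-*"))
    return [["wandb", "sync", str(run)] for run in runs if run.is_dir()]


if __name__ == "__main__":
    # On a compute node (no internet): force offline logging.
    configure_wandb_mode(has_internet=False)

    # Later, on a machine with internet access: print the sync commands.
    # They could also be executed with subprocess.run(cmd, check=True).
    for cmd in offline_sync_commands("wandb"):
        print(" ".join(cmd))
```

This only covers the offline-then-sync route; it does not implement the tunnelling approach mentioned by Stefan Kesselheim, which remains untested in this thread.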