facebookresearch / vissl

VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
https://vissl.ai
MIT License

WandB support #268

Open pclucas14 opened 3 years ago

pclucas14 commented 3 years ago

🚀 Feature

Now that we can use Weights and Biases on the Facebook cluster, it would be really neat if there was support for it within VISSL.

Motivation & Examples

WandB is like TensorBoard on steroids, and provides a more user-friendly interface.


It should be usable exactly like the current TensorBoard logger within VISSL.

Note

If this is something the VISSL team would consider adding, I can try and submit a PR :)

prigoyal commented 3 years ago

Thank you @pclucas14, this is a nice feature to add to VISSL. Please go ahead and submit a PR :) Also, you can grab this issue (update the Assignees) if you are moving forward :)

surajpaib commented 2 years ago

Hi @pclucas14 @prigoyal ,

It would be great to have W&B support, as I'm very keen on using it to track my experiments. I see that this issue has been stale for a while. Are there any updates on the status of this integration?

Thanks!

pclucas14 commented 2 years ago

Hi @surajpaib,

No, I don't think anyone is actively working on this :/ The issue back then was how to properly log with DDP, iirc. You can probably check how pytorch-lightning does it and implement something similar.

Best of luck!

surajpaib commented 2 years ago

Thanks for letting me know! I was able to set up wandb to mimic what the tensorboard hook does. @prigoyal Would you still be interested in a PR for this?

I've handled DDP as recommended in the wandb docs, by logging solely from the primary rank. Source: https://docs.wandb.ai/guides/track/advanced/distributed-training

You can find the implementation on my fork: https://github.com/surajpaib/vissl/commit/022fade7ce7db063e4b170834e8fe5f59f832729
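
For readers following along, here is a minimal sketch of the "log solely from the primary rank" pattern described above. It is not the code from the fork: `PrimaryRankLogger`, `init_run`, and the choice of rank 0 as primary are hypothetical names and assumptions made for illustration, and the tracking backend is injected so the sketch runs without wandb installed.

```python
from typing import Any, Callable, Dict, Optional


class PrimaryRankLogger:
    """Hypothetical helper: forwards metrics to a tracking run on rank 0 only."""

    def __init__(self, rank: int, init_run: Callable[[], Any]) -> None:
        self._rank = rank
        # init_run is any zero-argument callable returning an object with a
        # .log(metrics, step=...) method (assumption for this sketch).
        self._init_run = init_run
        self._run: Optional[Any] = None

    def start(self) -> None:
        # Only the primary process creates a run; other ranks stay silent.
        if self._rank == 0:
            self._run = self._init_run()

    def log(self, metrics: Dict[str, float], step: int) -> None:
        # No-op on non-primary ranks, where no run was ever created.
        if self._run is not None:
            self._run.log(metrics, step=step)
```

With wandb, `init_run` could be something like `lambda: wandb.init(project="my-project")` (the project name is a placeholder), since the returned run object exposes `log(data, step=...)`. Keeping the run on one process avoids every worker creating its own run.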

surajpaib commented 2 years ago

#271

Okay, so I just found this PR. On going through it, I see that the only difference that makes it work is that I call wandb.init() after DDP is initialized (https://github.com/surajpaib/vissl/blob/022fade7ce7db063e4b170834e8fe5f59f832729/vissl/hooks/wandb_hook.py#L55).

If I follow correctly, the consensus on the PR was to add a new is_primary definition separate from the ClassyVision one. Calling wandb.init() only after DDP is initialized should circumvent the need for that.
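
To illustrate why the ordering might matter, here is a small sketch. One plausible explanation, which is an assumption and not confirmed anywhere in this thread, is that a rank helper falling back to 0 before the process group exists would make every process look primary. The functions `resolve_rank` and `should_start_run` are hypothetical names for this example. They refuse to guess the rank until the caller confirms that DDP is up.

```python
from typing import Callable, Optional


def resolve_rank(
    is_initialized: Callable[[], bool],
    get_rank: Callable[[], int],
) -> Optional[int]:
    """Return the process rank, or None while the process group is not up."""
    return get_rank() if is_initialized() else None


def should_start_run(rank: Optional[int]) -> bool:
    """Decide whether this process should create the tracking run."""
    if rank is None:
        # Fail loudly instead of silently treating every process as rank 0.
        raise RuntimeError("rank unknown: initialize DDP before starting the run")
    return rank == 0
```

With PyTorch, the two callables could be `torch.distributed.is_initialized` and `torch.distributed.get_rank`. The run would then be created only when `should_start_run(...)` returns True, which can only happen once the process group exists.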