databio / bedms

tool for standardization of genomics/epigenomics metadata
BSD 2-Clause "Simplified" License

Providing training infrastructure for users #24

Open saanikat opened 3 weeks ago

saanikat commented 3 weeks ago

For a user to be able to train their own datasets:

  1. Training scripts (user friendly) for all models.
  2. Additions to attr_standardizer to be able to fetch the users' models from HuggingFace.
  3. Documentation
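
As a rough illustration of item 2, fetching a user's trained artifacts from HuggingFace could look something like the sketch below. It relies on `hf_hub_download` from the `huggingface_hub` package; the function name `fetch_user_model` and the file names in `MODEL_FILES` are hypothetical placeholders, not the actual attr_standardizer interface or the real artifact layout.

```python
from pathlib import Path

# Hypothetical artifact names; the real file layout is not specified in this thread.
MODEL_FILES = ("model.pth", "vectorizer.pkl", "label_encoder.pkl")


def fetch_user_model(repo_id, filenames=MODEL_FILES, downloader=None):
    """Download each artifact from a HuggingFace repo and return its local path.

    `downloader` exists only so the function can be exercised offline;
    by default it is huggingface_hub.hf_hub_download.
    """
    if downloader is None:
        # Imported lazily so the module loads even without huggingface_hub installed.
        from huggingface_hub import hf_hub_download
        downloader = hf_hub_download
    # hf_hub_download caches the file locally and returns the cached path.
    return {
        name: Path(downloader(repo_id=repo_id, filename=name))
        for name in filenames
    }
```

A user would then pass their own repository id, for example `fetch_user_model("some-user/some-model")` (a made-up id), and load the returned paths with whatever loader the model format requires.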

nleroy917 commented 3 weeks ago

Maybe relevant: https://github.com/databio/geniml_dev/issues/166

ClaudeHu commented 3 weeks ago

The trainer object for text2bed is genimlv.text2bed.vec2vec. It may be messy for a while, since it is being updated to support alternative training methods. I also plan to introduce lightning modules to text2bed, following the suggestion from @nleroy917.

reference code: music text representation, musiclm

nleroy917 commented 3 weeks ago

Also, wandb has been absolutely amazing for experiment tracking. I even started a databio-ml team: https://wandb.ai/databio-ml

databio was taken...

lightning is great because it gets rid of all headaches around GPU/slurm/DDP.

wandb is great because it gets rid of all headaches around tracking model progress.

I also have a lot of fun stuff I've learned about training models with slurm and being able to kill things prematurely and gracefully.
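
One common pattern for the graceful-kill part, sketched here under assumptions rather than taken from any databio code: slurm normally sends SIGTERM before it force-kills a job, so a handler can set a flag that the training loop checks between steps, leaving time to write a checkpoint. The names `GracefulStopper`, `train`, and `save_checkpoint` are hypothetical.

```python
import signal


class GracefulStopper:
    """Remembers that a termination signal arrived so the loop can exit cleanly."""

    def __init__(self):
        self.stop_requested = False

    def install(self, signals=(signal.SIGTERM, signal.SIGINT)):
        # Must be called from the main thread, as signal.signal requires.
        for sig in signals:
            signal.signal(sig, self.handle)

    def handle(self, signum, frame):
        # Only set a flag here; the heavy work happens in the training loop.
        self.stop_requested = True


def train(num_steps, stopper, save_checkpoint):
    """Hypothetical loop: run steps until finished or a stop is requested."""
    completed = 0
    for step in range(num_steps):
        if stopper.stop_requested:
            break
        # ... one optimisation step would go here ...
        completed = step + 1
    # Always checkpoint on the way out, whether finished or interrupted.
    save_checkpoint(completed)
    return completed
```

In a real job script one would call `stopper.install()` once at startup, before entering the loop.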

saanikat commented 4 days ago

Likely solved with the new PR #25. Detailed documentation will be added to the bedbase docs.