etgld opened this issue 2 years ago
There might be a way to do this that's still worth trying, but I need to get a proof of concept working first. Long story short, the most powerful method, and the one I recall showing monotonically increasing performance, was Population-Based Bandits (sketch below).
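For reference, a minimal sketch of wiring up Ray Tune's PB2 scheduler (note it needs the extra GPy and scikit-learn dependencies); the metric name and hyperparameter bounds here are placeholders, not settled choices for cnlpt:

```python
from ray.tune.schedulers.pb2 import PB2

# Population-Based Bandits scheduler; bounds and metric are illustrative.
pb2_scheduler = PB2(
    time_attr="training_iteration",
    metric="eval_loss",            # assumed objective reported by each trial
    mode="min",
    perturbation_interval=1,       # re-fit and perturb every training iteration
    hyperparam_bounds={
        "learning_rate": [1e-6, 1e-4],
        "weight_decay": [0.0, 0.3],
    },
)
```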
The main issue with supporting Ray Tune, I think, is UI: there are enough options for a Ray Tune run that including them all in cnlp_args would create a lot of clutter. Something better might be basic instructions/recipes plus a sample script.
Granted, it has been a while, but now that I have more facility with setting up these kinds of things on E2, I might be able to try it again.
I think there are also some command-line parameters we inherit from HF that are supposed to do tuning automatically -- we should look into whether we can use those as-is.
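The HF API entry point I know of is `Trainer.hyperparameter_search`. A minimal sketch with the Ray backend, assuming the datasets are built elsewhere and with a placeholder checkpoint name:

```python
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

def model_init():
    # Fresh model per trial so every run starts from the same initialization;
    # the checkpoint name is just a placeholder.
    return AutoModelForSequenceClassification.from_pretrained("bert-base-cased")

trainer = Trainer(
    model_init=model_init,          # HPO requires model_init rather than model
    args=TrainingArguments(output_dir="hpo_out", evaluation_strategy="epoch"),
    train_dataset=train_dataset,    # assumed to be built elsewhere
    eval_dataset=eval_dataset,
)

best_run = trainer.hyperparameter_search(
    backend="ray",
    direction="minimize",
    n_trials=10,
)
print(best_run.hyperparameters)
```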
For reference, there haven't been many up-to-date examples of using Ray Tune with HuggingFace, but this one looks helpful: https://docs.ray.io/en/latest/train/examples/transformers/huggingface_text_classification.html

Although sparse, there are also some current search space guidelines from HuggingFace for the different backends (a sketch of the Ray format is below): https://huggingface.co/docs/transformers/en/hpo_train https://huggingface.co/docs/setfit/en/how_to/hyperparameter_optimization

I also remember a nice paper from last year about automatic gradient descent without hyperparameters, with the caveat that it doesn't presently work for transformers; I looked up the first author to see if there were any updates, but nothing yet: https://arxiv.org/abs/2304.05187
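Per the HF hpo_train guide, the Ray-backend search space is a function returning a dict of Ray Tune sampling primitives; the ranges here are illustrative:

```python
from ray import tune

def ray_hp_space(trial):
    # Search space in Ray Tune's format; ranges are placeholders.
    return {
        "learning_rate": tune.loguniform(1e-6, 1e-4),
        "per_device_train_batch_size": tune.choice([16, 32, 64]),
    }

# Passed to the search via:
# trainer.hyperparameter_search(hp_space=ray_hp_space, backend="ray", ...)
```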
There are some things we can definitely inherit from cnlp_args, although a lot of these backends have a lot of options, and in some cases a config JSON might be the easiest way (a hypothetical shape for one is sketched below).
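A hypothetical shape for such a config; none of these keys exist in cnlpt yet, this is just one way the file could look:

```python
import json

# Hypothetical tuning config -- keys and values are illustrative only.
tune_config = json.loads("""
{
    "backend": "ray",
    "n_trials": 10,
    "scheduler": "pb2",
    "metric": "eval_loss",
    "mode": "min",
    "hyperparam_bounds": {
        "learning_rate": [1e-6, 1e-4],
        "weight_decay": [0.0, 0.3]
    }
}
""")
print(tune_config["scheduler"])
```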
Docs: https://docs.ray.io/en/master/tune/index.html
Core example: https://github.com/ray-project/ray/blob/master/python/ray/tune/examples/pbt_transformers/pbt_transformers.py