Open gkumbhat opened 1 year ago
The goal here is to predict whether or not `.train()` is going to complete successfully, given its inputs.
I see two ways to accomplish this, so if I am off base feel free to advise:
The "Here's what worked in the past" strategy: We create some kind of publicly accessible data store that contains historic evidence (logs, etc.) of successful tuning exercises (all of the above inputs) and what software packaging and compute configuration was used to perform the tuning. Maybe this gets reported into a governance module or on a model card long term?
The "Let's give it a quick try" strategy:
We add a command line switch, `--bootstrap_only`, to the fine-tuning and PEFT tuning kickoff scripts that, if present, exits after a successful `.bootstrap()`, or errors out with a message explaining why bootstrapping failed, including lack of resources (GPU, RAM, etc.). This doesn't guarantee `.train()` will execute, however.

I don't see a way to predict whether `.train()` will complete successfully without actually calling it and waiting for it to fail.
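For the sake of discussion, a minimal sketch of what the switch could look like — the argument name matches the proposal above, but `main`, the `module` parameter, and the error handling are all hypothetical placeholders, not the actual kickoff script:

```python
import argparse
import sys


def parse_args(argv):
    # Hypothetical subset of the tuning script's CLI; --bootstrap_only is the
    # switch proposed above.
    parser = argparse.ArgumentParser(description="Tuning kickoff (sketch)")
    parser.add_argument(
        "--bootstrap_only",
        action="store_true",
        help="Exit after a successful .bootstrap(); do not call .train()",
    )
    return parser.parse_args(argv)


def main(argv, module):
    """Run bootstrap, then optionally train. `module` stands in for whatever
    object exposes .bootstrap() and .train() in the real script."""
    args = parse_args(argv)
    try:
        module.bootstrap()  # load model/tokenizer, allocate resources
    except Exception as err:
        # Surface the reason bootstrapping failed (e.g. lack of GPU/RAM)
        print(f"Bootstrap failed: {err}", file=sys.stderr)
        return 1
    if args.bootstrap_only:
        print("Bootstrap succeeded; exiting before .train()")
        return 0
    module.train()
    return 0
```

As noted, a clean exit here only tells you bootstrapping worked; `.train()` can still fail later.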
I do have several bash shell scripts that handle the overhead of setting up a training session (fetching the model, public or "bring your own"), setting up the output, etc. I don't know how much help that would be, but I could look into parameterizing them further.
@olson-ibm let's start with 2, i.e. the "Let's give it a quick try" approach. I was thinking we can expose this via a `--compatibility-test` option in the current script and do the following:

1. bootstrap
2. train
With train, we can set 1 epoch, so that it doesn't keep going, and not save the model at the end. Later on I imagine we can also add a dry-run functionality to the `.train` function itself, which just tries to estimate if it will work or not without actually executing training 🤔
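A rough sketch of that two-step flow — `compatibility_test` and its `num_epochs` keyword are assumptions for illustration, not the real script's API:

```python
def compatibility_test(module, train_kwargs=None):
    """Hypothetical --compatibility-test flow: bootstrap, then a single-epoch
    train run whose output model is discarded.

    `module` stands in for any object exposing .bootstrap() and .train();
    the num_epochs keyword is an assumed parameter name.
    """
    kwargs = dict(train_kwargs or {})
    kwargs["num_epochs"] = 1  # keep the trial run short, as suggested above
    try:
        module.bootstrap()
        module.train(**kwargs)
    except Exception as err:
        # Any failure (OOM, missing GPU, bad config) marks the combo incompatible
        return False, str(err)
    return True, "compatible"
```

The later dry-run idea would replace the single-epoch `.train()` call with an estimate that never executes training.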
Can't wait to see what your thinking is here :)
PR on the rest of the above will be out shortly...
Gaurav says to leverage the 'estimate' module for compatibility testing without doing actual training. @gkumbhat said he will create a new issue for this part.
Moving this back to ToDo for now since Joe is working on a more pressing task in the internal repo
Description
As we are exploring support for more models, more tuning techniques, larger models, and multi-GPU vs single-GPU with various context sizes, we often need to perform tests to figure out whether a model is compatible and under which configuration (single-GPU vs multi-GPU).

This story is to create a script that automates some of the above exploration and provides output that is easier to document (or automatically creates a markdown file).
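One way the "easier to document" output could look — a small helper that renders results as a markdown table. The function name and the `(model, config, status)` tuple shape are assumptions for this sketch:

```python
def results_to_markdown(results):
    """Render compatibility-test results as a markdown table.

    `results` is a list of (model, configuration, status) tuples — a
    hypothetical shape for whatever the exploration script collects.
    """
    lines = [
        "| Model | Configuration | Result |",
        "|---|---|---|",
    ]
    for model, config, status in results:
        lines.append(f"| {model} | {config} | {status} |")
    return "\n".join(lines)
```

The script could append one row per model/configuration pair it tries, giving a table that drops straight into a doc or model card.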
Discussion
Provide detailed discussion here
Acceptance Criteria