Open gkumbhat opened 1 year ago
The goal here is to predict whether or not `.train()` is going to complete successfully, given its inputs.
I see two ways to accomplish this, so if I am off base feel free to advise:
The "Here's what worked in the past" strategy: We create some kind of publicly accessible data store that contains historic evidence (logs, etc.) of successful tuning exercises (all of the above inputs) and what software packaging and compute configuration was used to perform the tuning. Maybe this gets reported into a governance module or on a model card long term?
The "Let's give it a quick try" strategy:
We add a command line switch, `--bootstrap_only`, to the fine-tuning and PEFT tuning kickoff scripts that, if present, exits after a successful `.bootstrap()`, or errors out with a message explaining why bootstrapping failed, including lack of resources (GPU, RAM, etc.). This doesn't guarantee `.train()` will execute, however.

I don't see a way to predict whether `.train()` will complete successfully without actually calling it and waiting for it to fail.
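For the sake of discussion, a minimal sketch of what the switch could look like — the argument name matches the proposal above, but `main`, the `module` parameter, and the error handling are all hypothetical placeholders, not the actual kickoff script:

```python
import argparse
import sys


def parse_args(argv):
    # Hypothetical subset of the tuning script's CLI; --bootstrap_only is the
    # switch proposed above.
    parser = argparse.ArgumentParser(description="Tuning kickoff (sketch)")
    parser.add_argument(
        "--bootstrap_only",
        action="store_true",
        help="Exit after a successful .bootstrap(); do not call .train()",
    )
    return parser.parse_args(argv)


def main(argv, module):
    """Run bootstrap, then optionally train. `module` stands in for whatever
    object exposes .bootstrap() and .train() in the real script."""
    args = parse_args(argv)
    try:
        module.bootstrap()  # load model/tokenizer, allocate resources
    except Exception as err:
        # Surface the reason bootstrapping failed (e.g. lack of GPU/RAM)
        print(f"Bootstrap failed: {err}", file=sys.stderr)
        return 1
    if args.bootstrap_only:
        print("Bootstrap succeeded; exiting before .train()")
        return 0
    module.train()
    return 0
```

As noted, a clean exit here only tells you bootstrapping worked; `.train()` can still fail later.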
I do have several bash shell scripts that handle the overhead of setting up a training session (fetching the model, public or "bring your own"), setting up the output, etc. I don't know how much help that would be, but I could look into parameterizing them further.
@olson-ibm let's start with 2, i.e. the "Let's give it a quick try" approach. I was thinking we can expose this via a `--compatibility-test` option in the current script and do the following:

1. bootstrap
2. train
With train, we can set 1 epoch, so that it doesn't keep going, and not save the model at the end. Later on I imagine we can also add a dry-run functionality to the `.train` function itself, which just tries to estimate if it will work or not without actually executing training 🤔
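A rough sketch of that two-step flow — `compatibility_test` and its `num_epochs` keyword are assumptions for illustration, not the real script's API:

```python
def compatibility_test(module, train_kwargs=None):
    """Hypothetical --compatibility-test flow: bootstrap, then a single-epoch
    train run whose output model is discarded.

    `module` stands in for any object exposing .bootstrap() and .train();
    the num_epochs keyword is an assumed parameter name.
    """
    kwargs = dict(train_kwargs or {})
    kwargs["num_epochs"] = 1  # keep the trial run short, as suggested above
    try:
        module.bootstrap()
        module.train(**kwargs)
    except Exception as err:
        # Any failure (OOM, missing GPU, bad config) marks the combo incompatible
        return False, str(err)
    return True, "compatible"
```

The later dry-run idea would replace the single-epoch `.train()` call with an estimate that never executes training.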
Can't wait to see what your thinking is here :)
PR on the rest of the above will be out shortly...
Gaurav says to leverage the 'estimate' module for compatibility testing without doing actual training. @gkumbhat said he will create a new issue for this part.
Moving this back to ToDo for now since Joe is working on a more pressing task in the internal repo
Description
As we are exploring support for more models, more tuning techniques, larger models, and multi-GPU vs single-GPU with various context sizes, we often need to perform tests to figure out whether a model is compatible and under which configuration (single-GPU vs multi-GPU).

This story is to create a script that automates some of the above exploration and provides output that is easier to document (or automatically creates a markdown file).
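One way the "easier to document" output could look — a small helper that renders results as a markdown table. The function name and the `(model, config, status)` tuple shape are assumptions for this sketch:

```python
def results_to_markdown(results):
    """Render compatibility-test results as a markdown table.

    `results` is a list of (model, configuration, status) tuples — a
    hypothetical shape for whatever the exploration script collects.
    """
    lines = [
        "| Model | Configuration | Result |",
        "|---|---|---|",
    ]
    for model, config, status in results:
        lines.append(f"| {model} | {config} | {status} |")
    return "\n".join(lines)
```

The script could append one row per model/configuration pair it tries, giving a table that drops straight into a doc or model card.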
Discussion
Provide detailed discussion here
Acceptance Criteria