rasbt opened this issue 6 months ago
You should accompany any decision with a PoC of how to implement it. I say this because (to the best of my knowledge) a call like `litgpt finetune <repo_id> --method "lora"` will be difficult to make work with jsonargparse. If you dissect that call, it means that you have a function that is called from the `finetune` subcommand of the `litgpt` CLI:
```python
def dispatch_finetune(
    repo_id,  # required
    method="lora",
):
    if method == "lora":
        from litgpt.finetune.lora import main

        main(repo_id)
    elif ...
```
where based on the arguments you call a different function. Jsonargparse needs to understand this to pull out the arguments from the inner function (`main` above) to expose them in the help message and configs. When I tried this in the past, I couldn't make it work.
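For contrast, here is a minimal sketch of the pattern jsonargparse handles well (the `finetune_lora`/`finetune_full` functions are hypothetical stand-ins, not LitGPT code): each function passed to `CLI` becomes a subcommand, and its typed parameters are exposed as flags and config entries automatically, which is exactly what the hand-written dispatcher above hides.

```python
# Minimal sketch, not LitGPT code: jsonargparse introspects function
# signatures, so each typed parameter shows up in --help and in configs.
from jsonargparse import CLI


def finetune_lora(repo_id: str, lora_r: int = 8, lora_alpha: int = 16) -> None:
    """Fine-tune with LoRA (hypothetical stand-in)."""
    print(f"LoRA finetuning {repo_id} with r={lora_r}, alpha={lora_alpha}")


def finetune_full(repo_id: str, lr: float = 3e-4) -> None:
    """Full fine-tuning (hypothetical stand-in)."""
    print(f"Full finetuning {repo_id} with lr={lr}")


if __name__ == "__main__":
    # Each function becomes a subcommand, e.g.:
    #   python cli.py finetune_lora my/repo --lora_r 16
    CLI([finetune_lora, finetune_full])
```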
So it might need to be replaced with an alternative tool like click, which is more flexible for creating complex, arbitrary CLIs. The tradeoff is that you might lose support for automatic types from typing signatures, take on extra code complexity, repeat parsing configs in multiple places, end up with a different config system, and so on. It will depend strongly on the tool chosen for the job.
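For comparison, a hedged sketch of what the same dispatcher might look like with click (not a worked-out proposal; the `litgpt.finetune.full`/`adapter`/`adapter_v2` module paths are assumptions mirroring the LoRA one above): every option has to be declared by hand, so the inner functions' signatures are no longer reflected automatically in `--help` or configs.

```python
# Hypothetical click-based dispatcher; options are declared manually instead
# of being derived from the inner functions' signatures.
import click


@click.command()
@click.argument("repo_id")
@click.option("--method", default="lora", type=click.Choice(["lora", "full", "adapter", "adapter_v2"]))
def finetune(repo_id: str, method: str) -> None:
    if method == "lora":
        from litgpt.finetune.lora import main
    elif method == "full":
        from litgpt.finetune.full import main  # assumed module path
    elif method == "adapter":
        from litgpt.finetune.adapter import main  # assumed module path
    else:
        from litgpt.finetune.adapter_v2 import main  # assumed module path
    main(repo_id)


if __name__ == "__main__":
    finetune()
```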
See also my previous comment on this topic: https://github.com/Lightning-AI/litgpt/issues/996#issuecomment-1989618188
The limitation you mentioned would be for selectively showing the LoRA args, correct?
An alternative would be to show all finetune arguments (full, adapter, LoRA). I think users will know that the LoRA parameters only have an effect if they select `--method lora`. This would of course not be as neat as the current version, but it would at least work in the meantime. (And we can maybe revisit other parsers some day, or wait for a jsonargparse version that supports this.)
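A rough sketch of that fallback (the argument names are made up for illustration): a single entrypoint exposes the union of the full/adapter/LoRA options, and the LoRA-specific ones simply have no effect unless `--method lora` is chosen.

```python
# Illustrative only: one signature exposing all finetuning arguments, so a
# signature-introspecting parser like jsonargparse can show them in --help.
def finetune(
    repo_id: str,
    method: str = "lora",  # "full" | "adapter" | "lora"
    lr: float = 3e-4,      # used by every method
    lora_r: int = 8,       # only takes effect when method == "lora"
    lora_alpha: int = 16,  # only takes effect when method == "lora"
) -> None:
    ...
```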
Switching to click could be an option longer term, but I think this would be a bigger lift.
On 2) Could we keep it pretraining from scratch by default? If not, then there would have to be a very loud warning IMO, and a way to opt out of auto-loading a checkpoint. What would that look like?
To add to Carlos' comment, if a CLI rewrite is considered, we would have to be super sure it can support all our use cases and requirements. There might also be an option to work more closely with the jsonargparse author if we're blocked by missing features.
> On 2) Could we keep it pretraining from scratch by default? If not, then there would have to be a very loud warning IMO, and a way to opt out of auto-loading a checkpoint. What would that look like?
Personally, I have a slight preference to keep it pretraining from scratch, because it's also what most users would expect in my opinion.
> The limitation you mentioned would be for selectively showing the LoRA args, correct?
Yes. But also for the `--data` argument, the `--generate` subcommand, etc. These are technical details that are currently covered by jsonargparse automagically, and an alternative might require you to do something very different. This is why I insist on creating a PoC that mimics the script/args/config structure, to see how difficult it becomes.
You could also completely change the args and config structure if you are doing breaking changes anyway.
To summarize from our meeting this morning, an easier path forward might be to use

```
litgpt finetune_full
litgpt finetune_lora
litgpt finetune_adapter
litgpt finetune_adapter_v2
```

where we also keep `litgpt finetune` as an alias for `litgpt finetune_lora`.
To keep things simple for newcomers, we would only show `litgpt finetune` in the main README and then introduce the other ones (`litgpt finetune_full`, `litgpt finetune_lora`, `litgpt finetune_adapter`, `litgpt finetune_adapter_v2`) in the finetuning docs (plus the `litgpt --help` description).
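A minimal sketch of how that flat layout could be wired up. It assumes jsonargparse's dict-of-callables form for naming subcommands, and the `litgpt.finetune.*` module paths other than the LoRA one (which appears earlier in this thread) are also assumptions.

```python
# Sketch only; the non-LoRA module paths and the dict-based CLI call are assumptions.
from jsonargparse import CLI

from litgpt.finetune.lora import main as finetune_lora
from litgpt.finetune.full import main as finetune_full              # assumed path
from litgpt.finetune.adapter import main as finetune_adapter        # assumed path
from litgpt.finetune.adapter_v2 import main as finetune_adapter_v2  # assumed path

if __name__ == "__main__":
    CLI(
        {
            "finetune": finetune_lora,  # plain `litgpt finetune` aliases LoRA
            "finetune_full": finetune_full,
            "finetune_lora": finetune_lora,
            "finetune_adapter": finetune_adapter,
            "finetune_adapter_v2": finetune_adapter_v2,
        }
    )
```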
Following up on a longer internal discussion we had (cc @carmocca @lantiga @awaelchli), we want to support the following user-friendly API in LitGPT:
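To make the target shape concrete, the calls would look roughly like this, with the Hugging Face repo ID passed directly to each command (the particular command list here is illustrative, drawn from the commands discussed elsewhere in this thread):

```
litgpt download meta-llama/Meta-Llama-3-8B-Instruct
litgpt finetune meta-llama/Meta-Llama-3-8B-Instruct
litgpt pretrain meta-llama/Meta-Llama-3-8B-Instruct
```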
So, in other words, we would make the `<repo_id>` a positional argument.

1) Root dir

We probably should introduce a `--checkpoint_root_dir` defaulting to `"checkpoints"`, so that downloads go to `checkpoints/<repo_id>` and the other commands load from `checkpoints/<repo_id>`.
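A tiny sketch of the intended path resolution (the helper name and signature are hypothetical, not LitGPT code):

```python
# Hypothetical helper: resolve where a model lives given the proposed
# --checkpoint_root_dir default of "checkpoints".
from pathlib import Path


def resolve_checkpoint_dir(repo_id: str, checkpoint_root_dir: str = "checkpoints") -> Path:
    # e.g. "meta-llama/Meta-Llama-3-8B-Instruct"
    #   -> checkpoints/meta-llama/Meta-Llama-3-8B-Instruct
    return Path(checkpoint_root_dir) / repo_id
```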
2) Pretrain behavior

In pretrain, if someone runs `litgpt pretrain <repo_id>`, the question is whether the `<repo_id>` takes the place of `--model_name` or `--initial_checkpoint_dir`. I don't have a strong opinion here, but pretraining from scratch would probably be the first thing someone thinks of when reading `pretrain` (as opposed to continued pretraining). In this case we would use `meta-llama/Meta-Llama-3-8B-Instruct` as `--model_name` (or extract `Meta-Llama-3-8B-Instruct` as the model name). The counter-argument is that using this positional argument to replace `initial_checkpoint_dir` would be more consistent with `litgpt download`, `litgpt finetune`, etc.

3) Removing subcommands

This means that we probably cannot support subcommands as in `litgpt finetune <repo_id>` and `litgpt finetune lora <repo_id>`. (Please correct me if I'm wrong.)
So, we would change it to:
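Presumably the form carmocca quotes in the first reply above; treat the exact flag spelling here as an assumption rather than a copy of the original example:

```
litgpt finetune <repo_id> --method lora
```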
4) Deprecating `--repo_id` and `--checkpoint_dir`

Maybe a full deprecation would not be possible due to the subcommands issue above, but we should probably at least keep `--repo_id` and `--checkpoint_dir` for now, and if they are not empty strings, quit with a message explaining the API change (instead of an error stack trace).
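A small sketch of how that check could look (the function name and wiring are illustrative, not the actual LitGPT code):

```python
# Hypothetical deprecation check: exit with a friendly message instead of
# letting the old flags surface as an error stack trace.
import sys


def check_deprecated_flags(repo_id: str = "", checkpoint_dir: str = "") -> None:
    if repo_id or checkpoint_dir:
        sys.exit(
            "--repo_id and --checkpoint_dir have been replaced by a positional "
            "<repo_id> argument, e.g. `litgpt finetune meta-llama/Meta-Llama-3-8B-Instruct`."
        )
```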