rasbt opened this issue 6 months ago
You should accompany any decision with a PoC of how to implement it. I say this because (to the best of my knowledge) a call like `litgpt finetune <repo_id> --method "lora"` will be difficult to make work with jsonargparse. If you dissect that call, it means that you have a function that is called from the `finetune` subcommand of the `litgpt` CLI:
```python
def dispatch_finetune(
    repo_id,  # required
    method="lora",
):
    if method == "lora":
        from litgpt.finetune.lora import main

        main(repo_id)
    elif ...
```
where based on the arguments you call a different function. Jsonargparse needs to understand this to pull out the arguments from the inner function (`main` above) to expose them in the help message and configs. When I tried this in the past, I couldn't make it work.
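For contrast, here is a minimal sketch of the pattern jsonargparse handles well (the `finetune_lora`/`finetune_full` functions are hypothetical stand-ins, not LitGPT code): each function passed to `CLI` becomes a subcommand, and its typed parameters are exposed as flags and config entries automatically, which is exactly what the hand-written dispatcher above hides.

```python
# Minimal sketch, not LitGPT code: jsonargparse introspects function
# signatures, so each typed parameter shows up in --help and in configs.
from jsonargparse import CLI


def finetune_lora(repo_id: str, lora_r: int = 8, lora_alpha: int = 16) -> None:
    """Fine-tune with LoRA (hypothetical stand-in)."""
    print(f"LoRA finetuning {repo_id} with r={lora_r}, alpha={lora_alpha}")


def finetune_full(repo_id: str, lr: float = 3e-4) -> None:
    """Full fine-tuning (hypothetical stand-in)."""
    print(f"Full finetuning {repo_id} with lr={lr}")


if __name__ == "__main__":
    # Each function becomes a subcommand, e.g.:
    #   python cli.py finetune_lora my/repo --lora_r 16
    CLI([finetune_lora, finetune_full])
```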
So it might need to be replaced with an alternative tool like click, which is more flexible for creating complex, arbitrary CLIs. The tradeoff is that you might lose support for automatic types from typing signatures, take on extra code complexity, repeat parsing configs in multiple places, end up with a different config system, and so on. It will depend strongly on the tool chosen for the job.
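For comparison, a hedged sketch of what the same dispatcher might look like with click (not a worked-out proposal; the `litgpt.finetune.full`/`adapter`/`adapter_v2` module paths are assumptions mirroring the LoRA one above): every option has to be declared by hand, so the inner functions' signatures are no longer reflected automatically in `--help` or configs.

```python
# Hypothetical click-based dispatcher; options are declared manually instead
# of being derived from the inner functions' signatures.
import click


@click.command()
@click.argument("repo_id")
@click.option("--method", default="lora", type=click.Choice(["lora", "full", "adapter", "adapter_v2"]))
def finetune(repo_id: str, method: str) -> None:
    if method == "lora":
        from litgpt.finetune.lora import main
    elif method == "full":
        from litgpt.finetune.full import main  # assumed module path
    elif method == "adapter":
        from litgpt.finetune.adapter import main  # assumed module path
    else:
        from litgpt.finetune.adapter_v2 import main  # assumed module path
    main(repo_id)


if __name__ == "__main__":
    finetune()
```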
See also my previous comment on this topic: https://github.com/Lightning-AI/litgpt/issues/996#issuecomment-1989618188
The limitation you mentioned would be for selectively showing the LoRA args, correct?
An alternative would be to show all finetune arguments (full, adapter, LoRA). I think users will know that the LoRA parameters only have an effect if they select `--method lora`. This would of course not be as neat as the current version, but it would at least work in the meantime. (And we can maybe revisit other parsers some day, or wait for a jsonargparse version that supports this.)
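A rough sketch of that fallback (the argument names are made up for illustration): a single entrypoint exposes the union of the full/adapter/LoRA options, and the LoRA-specific ones simply have no effect unless `--method lora` is chosen.

```python
# Illustrative only: one signature exposing all finetuning arguments, so a
# signature-introspecting parser like jsonargparse can show them in --help.
def finetune(
    repo_id: str,
    method: str = "lora",  # "full" | "adapter" | "lora"
    lr: float = 3e-4,      # used by every method
    lora_r: int = 8,       # only takes effect when method == "lora"
    lora_alpha: int = 16,  # only takes effect when method == "lora"
) -> None:
    ...
```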
Switching to click could be an option longer term, but I think this would be a bigger lift.
On 2) Could we keep it pretraining from scratch by default? If not, then there would have to be a very loud warning IMO, and a way to opt out of auto-loading a checkpoint. What would that look like?
To add to Carlos' comment, if a CLI rewrite is considered, we would have to be super sure it can support all our use cases and requirements. There might also be an option to work more closely with the jsonargparse author if we're blocked by missing features.
> On 2) Could we keep it pretraining from scratch by default? If not, then there would have to be a very loud warning IMO, and a way to opt out of auto-loading a checkpoint. What would that look like?
Personally, I have a slight preference to keep it pretraining from scratch, because it's also what most users would expect in my opinion.
> The limitation you mentioned would be for selectively showing the LoRA args, correct?
Yes. But also for the `--data` argument, the `--generate` subcommand, etc. These are technical details that are currently covered by jsonargparse automagically, and an alternative might require you to do something very different. This is why I insist on creating a PoC that mimics the script/args/config structure, to see how difficult it becomes.
You could also completely change the args and config structure if you are doing breaking changes anyway.
To summarize from our meeting this morning, an easier path forward might be to use

```
litgpt finetune_full
litgpt finetune_lora
litgpt finetune_adapter
litgpt finetune_adapter_v2
```

where we also keep `litgpt finetune` as an alias for `litgpt finetune_lora`.
To keep things simple for newcomers, we would only show `litgpt finetune` in the main README and then introduce the other ones (`litgpt finetune_full`, `litgpt finetune_lora`, `litgpt finetune_adapter`, `litgpt finetune_adapter_v2`) in the finetuning docs (plus the `litgpt --help` description).
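A minimal sketch of how that flat layout could be wired up. It assumes jsonargparse's dict-of-callables form for naming subcommands, and the `litgpt.finetune.*` module paths other than the LoRA one (which appears earlier in this thread) are also assumptions.

```python
# Sketch only; the non-LoRA module paths and the dict-based CLI call are assumptions.
from jsonargparse import CLI

from litgpt.finetune.lora import main as finetune_lora
from litgpt.finetune.full import main as finetune_full              # assumed path
from litgpt.finetune.adapter import main as finetune_adapter        # assumed path
from litgpt.finetune.adapter_v2 import main as finetune_adapter_v2  # assumed path

if __name__ == "__main__":
    CLI(
        {
            "finetune": finetune_lora,  # plain `litgpt finetune` aliases LoRA
            "finetune_full": finetune_full,
            "finetune_lora": finetune_lora,
            "finetune_adapter": finetune_adapter,
            "finetune_adapter_v2": finetune_adapter_v2,
        }
    )
```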
Following up on a longer internal discussion we had (cc @carmocca @lantiga @awaelchli), we want to support the following user-friendly API in LitGPT:
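To make the target shape concrete, the calls would look roughly like this, with the Hugging Face repo ID passed directly to each command (the particular command list here is illustrative, drawn from the commands discussed elsewhere in this thread):

```
litgpt download meta-llama/Meta-Llama-3-8B-Instruct
litgpt finetune meta-llama/Meta-Llama-3-8B-Instruct
litgpt pretrain meta-llama/Meta-Llama-3-8B-Instruct
```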
So, in other words, we would make the `<repo_id>` a positional argument.

1) Root dir

We probably should introduce a `--checkpoint_root_dir` defaulting to `"checkpoints"`, so that downloads go to `checkpoints/<repo_id>` and the other commands load from `checkpoints/<repo_id>`.
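A tiny sketch of the intended path resolution (the helper name and signature are hypothetical, not LitGPT code):

```python
# Hypothetical helper: resolve where a model lives given the proposed
# --checkpoint_root_dir default of "checkpoints".
from pathlib import Path


def resolve_checkpoint_dir(repo_id: str, checkpoint_root_dir: str = "checkpoints") -> Path:
    # e.g. "meta-llama/Meta-Llama-3-8B-Instruct"
    #   -> checkpoints/meta-llama/Meta-Llama-3-8B-Instruct
    return Path(checkpoint_root_dir) / repo_id
```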
2) Pretrain behavior

In pretrain, if someone runs `litgpt pretrain <repo_id>`, the question is whether the `<repo_id>` takes the place of `--model_name` or `--initial_checkpoint_dir`. I don't have a strong opinion here, but pretraining from scratch would probably be the first thing someone thinks of when reading `pretrain` (as opposed to continued pretraining). In this case we would use `meta-llama/Meta-Llama-3-8B-Instruct` as `--model_name` (or extract `Meta-Llama-3-8B-Instruct` as the model name). The counter-argument is that using this positional argument to replace `initial_checkpoint_dir` would be more consistent with `litgpt download`, `litgpt finetune`, etc.

3) Removing subcommands

This means that we probably cannot support subcommands as in `litgpt finetune <repo_id>` and `litgpt finetune lora <repo_id>`. (Please correct me if I'm wrong.)
So, we would change it to:
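Presumably the form carmocca quotes in the first reply above; treat the exact flag spelling here as an assumption rather than a copy of the original example:

```
litgpt finetune <repo_id> --method lora
```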
4) Deprecating `--repo_id` and `--checkpoint_dir`

Maybe a full deprecation would not be possible due to the subcommands issue above, but we should probably at least keep `--repo_id` and `--checkpoint_dir` for now, and if they are not empty strings, quit with a message explaining the API change (instead of an error stack trace).
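A small sketch of how that check could look (the function name and wiring are illustrative, not the actual LitGPT code):

```python
# Hypothetical deprecation check: exit with a friendly message instead of
# letting the old flags surface as an error stack trace.
import sys


def check_deprecated_flags(repo_id: str = "", checkpoint_dir: str = "") -> None:
    if repo_id or checkpoint_dir:
        sys.exit(
            "--repo_id and --checkpoint_dir have been replaced by a positional "
            "<repo_id> argument, e.g. `litgpt finetune meta-llama/Meta-Llama-3-8B-Instruct`."
        )
```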