Closed Andrei-Aksionov closed 2 months ago
Sure. We can close the PR and come back to it if there be a necessity.
There's no need to make every bias configurable at the moment.
As I understand, there might be 5 biases:
Right now the code supports biases main
and lm_head
(bias
and lm_head_bias
in the Config class).
So there will be 3 out of 5 in total. In my opinion, it's worth it to add such an abstraction already. But it's only my opinion.
For now I'm going to close this, but if @lantiga agrees to this change then we'll reopen and land it
The remaining issue is with the way config is provided. Now it's just a bool value: https://github.com/Lightning-AI/litgpt/blob/6d9816a965b3fdca0092b6e877cf2e0b922824a1/config_hub/pretrain/tinystories.yaml#L21 But apparently something in the CLI tools wants to see a dict. So if one provide a dict:
bias_map:
main: true
attention: true
projection: true
mlp: true
lm_head: false
it works fine, but kinda defeats the purpose. The solution might be to parse the input and handle it (plus handle the legacy name too). But when I am debugging, I don't understand what calls what. It's already past midnight and my head turned into a pumpking. I'll continue tomorrow.
Thanks for trying. I was banging my head against that as well ... I don't understand jsonargparse well enough for that. Maybe @awaelchli or @carmocca might know more.
So the reason for the fail was pretty simple: BiasMap expects a dict as an argument (since it's a dataclass), while the yaml file contained a boolean value.
A simple change from
bias: false
to
bias_map:
main: false
did the trick.
The only test that keeps failing is where the config is provided via URL from the main branch, that contains an old bias: false
value.
Should I add a compatibility code? If yes, then how to do it?
Ideally to have **kwargs
field that catches all the legacy arguments and process them in the __post__init__
, but it looks like it's not possible.
It's possible to do it in __init__
method, but that kinda defeats the purpose of the dataclass, isn't it?
@awaelchli @carmocca
Nice @Andrei-Aksionov. I think that setting is currently only used in the pretraining YAMLs, and I'd personally be fine with updating these even though it might break backward compatibility. We just rolled them out last week, so there's probably no userbase around it yet, and changing it now (vs later) is probably good timing.
The question is though if "main" is a good term. Will users know what it means and know how they can change the bias setting? I am actually in favor of a more verbose approach and having the options listed explicitly, e.g.,
attn_qkv_bias: true
attn_proj_bias: true
mlp_bias: true
lm_head_bias: false
or
bias_map:
attn_qkv: true
attn_proj: true
mlp: true
lm_head: false
What do you all think?
Yeah, I struggled with properly naming it.
Maybe all_modules
instead of main
.
In this case we will have:
bias_map:
all_modules: false
Will users know what it means and know how they can change the bias setting? I am actually in favor of a more verbose approach and having the options listed explicitly, e.g.
The whole purpose of BiasMap
is that we have a "main switch" and that allows us to not specify a bias value for each module.
So, if you prefer to be more explicit (which I actually support), then there is no reason why we should stick to BiasMap
at all.
The whole purpose of BiasMap is that we have a "main switch" and that allows us to not specify a bias value for each module. So, if you prefer to be more explicit (which I actually support), then there is no reason why we should stick to BiasMap at all.
Another thing we can do is to list all the options as comments in the YAML file.
No, I mean, if you want to specify all the biases, it's fine and should work:
bias_map:
attention: true
projection: false
mlp: false
lm_head: true
All I am saying is that in this case the only benefit of using this class is that in configs we don't have to specify all the biases, e.g. instead of
{
....
attention=True,
projection=False,
mlp=False,
lm_head=True,
...
}
we have
{
...
bias_map=BiasMap(False, attention=True, lm_head=True),
...
}
or if we disable all the biases (which is quite common) instead of
{
....
attention=False,
projection=False,
mlp=False,
lm_head=False,
...
}
we have
{
...
bias_map=BiasMap(False),
...
}
The question is, does it justify this small code complication? I kinda doubt it. LitGPT is all about simplicity.
Bottom line is that I think we should close this PR (again ๐) and go back to your PR.
I like the BiasMap
class & I think we can still use the BiasMap
internally in config.py
for finetuning etc. I just meant that we should probably explicitly list the options in the pretraining configs so that users know what the choices are.
The more I think about it, the more I am turning against my own โcreatureโ. I like the concept and we might reuse it somewhere else, but maybe not now.
In the yaml files we should explicitly specify all the possible biases so it's easier to see what can/needs to be changed.
But the same goes to configs in litgpt.config
. If someone wants to add a new config, that person needs to see what is BiasMap
and how it works, needs to understand why in other configs there is just BiasMap(True)
and so on.
>> import this
...
Explicit is better than implicit.
Simple is better than complex.
...
I must say that I really really liked the BiasMap implementation because it was so small, elegant, and efficient. But yeah, from a user's perspective it may be a bit opaque and it'd be easier to see the options (esp in the config files) if there's a more verbose approach.
Should we revisit my alternative implementation in #1156?
Lets goooo
Hi there ๐
With
BiasMap
we can either set a bias for the whole model, or specify bias values for each module individually.The logic is that if a module's bias is not provided (e.g., for projection),
config.bias_map.projection
will fall back to the main bias value.Useful for #850