Andrei-Aksionov commented 4 months ago

Hi there 👋

With BiasMap we can either set a bias for the whole model, or specify bias values for each module individually.

The logic is that if a module's bias is not provided (e.g., for projection), config.bias_map.projection will fall back to the main bias value.

Useful for #850

Andrei-Aksionov commented 4 months ago

Sure. We can close the PR and come back to it if there be a necessity.

Andrei-Aksionov commented 4 months ago

There's no need to make every bias configurable at the moment.

As I understand, there might be 5 biases:

main, applied to the whole model
attention
projection
mlp
lm_head

Right now the code supports biases main and lm_head (bias and lm_head_bias in the Config class).

850 most likely will add one more - attention.

So there will be 3 out of 5 in total. In my opinion, it's worth it to add such an abstraction already. But it's only my opinion.

carmocca commented 4 months ago

For now I'm going to close this, but if @lantiga agrees to this change then we'll reopen and land it

Andrei-Aksionov commented 2 months ago

The remaining issue is with the way config is provided. Now it's just a bool value: https://github.com/Lightning-AI/litgpt/blob/6d9816a965b3fdca0092b6e877cf2e0b922824a1/config_hub/pretrain/tinystories.yaml#L21 But apparently something in the CLI tools wants to see a dict. So if one provide a dict:

  bias_map:
    main: true
    attention: true
    projection: true
    mlp: true
    lm_head: false

it works fine, but kinda defeats the purpose. The solution might be to parse the input and handle it (plus handle the legacy name too). But when I am debugging, I don't understand what calls what. It's already past midnight and my head turned into a pumpking. I'll continue tomorrow.

rasbt commented 2 months ago

Thanks for trying. I was banging my head against that as well ... I don't understand jsonargparse well enough for that. Maybe @awaelchli or @carmocca might know more.

Andrei-Aksionov commented 2 months ago

So the reason for the fail was pretty simple: BiasMap expects a dict as an argument (since it's a dataclass), while the yaml file contained a boolean value.

A simple change from

bias: false

to

bias_map:
  main: false

did the trick.

The only test that keeps failing is where the config is provided via URL from the main branch, that contains an old bias: false value.

Should I add a compatibility code? If yes, then how to do it? Ideally to have **kwargs field that catches all the legacy arguments and process them in the __post__init__, but it looks like it's not possible. It's possible to do it in __init__ method, but that kinda defeats the purpose of the dataclass, isn't it? @awaelchli @carmocca

rasbt commented 2 months ago

Nice @Andrei-Aksionov. I think that setting is currently only used in the pretraining YAMLs, and I'd personally be fine with updating these even though it might break backward compatibility. We just rolled them out last week, so there's probably no userbase around it yet, and changing it now (vs later) is probably good timing.

The question is though if "main" is a good term. Will users know what it means and know how they can change the bias setting? I am actually in favor of a more verbose approach and having the options listed explicitly, e.g.,

attn_qkv_bias: true
attn_proj_bias: true
mlp_bias: true
lm_head_bias: false

or

bias_map:
  attn_qkv: true
  attn_proj: true
  mlp: true
  lm_head: false

What do you all think?

Andrei-Aksionov commented 2 months ago

Yeah, I struggled with properly naming it. Maybe all_modules instead of main. In this case we will have:

bias_map:
  all_modules: false

Will users know what it means and know how they can change the bias setting? I am actually in favor of a more verbose approach and having the options listed explicitly, e.g.

The whole purpose of BiasMap is that we have a "main switch" and that allows us to not specify a bias value for each module. So, if you prefer to be more explicit (which I actually support), then there is no reason why we should stick to BiasMap at all.

rasbt commented 2 months ago

The whole purpose of BiasMap is that we have a "main switch" and that allows us to not specify a bias value for each module. So, if you prefer to be more explicit (which I actually support), then there is no reason why we should stick to BiasMap at all.

Another thing we can do is to list all the options as comments in the YAML file.

Andrei-Aksionov commented 2 months ago

No, I mean, if you want to specify all the biases, it's fine and should work:

bias_map:
  attention: true
  projection: false
  mlp: false
  lm_head: true

All I am saying is that in this case the only benefit of using this class is that in configs we don't have to specify all the biases, e.g. instead of

{
    ....
    attention=True,
    projection=False,
    mlp=False,
    lm_head=True,
    ...
}

we have

{
    ...
    bias_map=BiasMap(False, attention=True, lm_head=True),
    ...
}

or if we disable all the biases (which is quite common) instead of

{
    ....
    attention=False,
    projection=False,
    mlp=False,
    lm_head=False,
    ...
}

we have

{
    ...
    bias_map=BiasMap(False),
    ...
}

The question is, does it justify this small code complication? I kinda doubt it. LitGPT is all about simplicity.

Bottom line is that I think we should close this PR (again 🙂) and go back to your PR.

rasbt commented 2 months ago

I like the BiasMap class & I think we can still use the BiasMap internally in config.py for finetuning etc. I just meant that we should probably explicitly list the options in the pretraining configs so that users know what the choices are.

Andrei-Aksionov commented 2 months ago

The more I think about it, the more I am turning against my own “creature”. I like the concept and we might reuse it somewhere else, but maybe not now.

In the yaml files we should explicitly specify all the possible biases so it's easier to see what can/needs to be changed. But the same goes to configs in litgpt.config. If someone wants to add a new config, that person needs to see what is BiasMap and how it works, needs to understand why in other configs there is just BiasMap(True) and so on.

>> import this
...
Explicit is better than implicit.
Simple is better than complex.
...

rasbt commented 2 months ago

I must say that I really really liked the BiasMap implementation because it was so small, elegant, and efficient. But yeah, from a user's perspective it may be a bit opaque and it'd be easier to see the options (esp in the config files) if there's a more verbose approach.

Should we revisit my alternative implementation in #1156?

Andrei-Aksionov commented 2 months ago

Lets goooo

Lightning-AI / litgpt

BiasMap: individual bias for each module #878

850 most likely will add one more - attention.