carmocca closed this issue 1 year ago
@carmocca During the first accelerator refactor we introduced the term `mixed` (before it was always `16`) because we were planning to deprecate the alias of `16` for AMP and instead have real half-precision training (like we have for double precision) under that flag.
What's the status of this discussion? I wanted to train a BERT-like model with half precision, where the model itself is half precision and not only the inputs. What would be the correct way to train without mixed precision? Thanks!
That's not yet decided.
For true half-precision training, the current way to go would be to use a custom `PrecisionPlugin`, similar to what we do for double precision.
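Something along these lines; a minimal, untested sketch modeled on the double-precision plugin, so the exact hooks and import paths may differ between Lightning versions:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins.precision import PrecisionPlugin


class HalfPrecisionPlugin(PrecisionPlugin):
    """Sketch of true fp16 training: cast the whole module, no autocast/GradScaler."""

    precision = 16

    def connect(self, model, optimizers, lr_schedulers):
        # Cast parameters and buffers to float16, mirroring how the
        # double-precision plugin calls `model.double()` here.
        return super().connect(model.half(), optimizers, lr_schedulers)


trainer = Trainer(plugins=[HalfPrecisionPlugin()])
```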
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!
A user in Slack got confused by this:
How do I do mixed precision training (CPU / GPU)? [...] Allowed precision values: ('16', '32', '64', 'bf16', 'mixed') suggest that 16 trains pure 16-bit and mixed trains AMP. However, the documentation states that 16 trains with mixed precision. When calling Trainer(precision='mixed') I get the error message "No precision set". So how do I do pure half-precision, mixed, and full-precision training?
Do we want to support true half-precision? I think we should.
The problem is the API. Since `precision=16` is used for mixed 16-bit, we would need a deprecation phase so that `precision=16` becomes true half-precision and `precision='mixed_16'` becomes mixed half-precision.
The disadvantage of this is that users' instinct will be to pass `precision=16`, because many people don't understand the difference between true precision and mixed precision. On the other hand, keeping `precision=16` as mixed half-precision and requiring `precision='true_16'` for true half-precision would be inconsistent with the other precision values (`precision=64`, for example, already means true double precision).
One more option is to add one extra flag, `precision_type="full"|"mixed"`, and remove `precision="mixed"`. We would automatically choose the best type given the precision value.
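As a purely hypothetical illustration of that option (`precision_type` is not a real Trainer argument; this only sketches the proposal):

```python
from pytorch_lightning import Trainer

# Hypothetical API: `precision_type` does not exist, shown only to illustrate the idea.
Trainer(precision=16, precision_type="mixed")  # what precision=16 does today (AMP)
Trainer(precision=16, precision_type="full")   # true half precision
Trainer(precision=32, precision_type="full")   # regular full precision
```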
Thanks @carmocca, I think this is a great issue to bring up. I too was confused by the naming convention originally, and I had to go through the PyTorch Lightning code to understand what the flag means.
I believe that changing `precision=16` from meaning AMP is very dangerous and could lead to many angry users who do not heed the deprecation warning (unless it was incredibly loud :P). I love the idea of adding a `precision_type` of some sort to control whether we'd assume mixed or full. Maybe even just `mixed_precision=True|False`.
My suggestion for the API would be to have two arguments (see the sketch after this list):

- `amp_backend`: "apex" vs "native"
- `precision`: one of [O0, O1, O2, O3] for apex, or [16, mixed, 32, 64] for native

In terms of naming for the native backend, I would suggest:

- Full precision: "full" and 32
- Mixed precision: "mixed"
- Half precision: "half" and 16
- Double precision: "double" and 64
- Bfloat precision: "bf16" or "bfloat16"
If there won't be half precision, 16 should be deprecated because it is a misnomer.
I believe that changing `precision=16` from meaning AMP is very dangerous and could lead to many angry users who do not heed the deprecation warning (unless it was incredibly loud :P). I love the idea of adding a `precision_type` of some sort to control whether we'd assume mixed or full. Maybe even just `mixed_precision=True|False`.
I agree with @SeanNaren. Even though users would need to remember to set two flags (`precision` and `mixed_precision`) to do one thing (true 16, ...), I don't think it's worth changing it from meaning AMP, for the backward-compatibility reason @SeanNaren mentioned above.
I also like @carmocca's suggestion of adding a flag `mixed_precision=True|False`, even though that's one more thing for users to remember. I think it may be fine if we document it properly so that the new flag is discoverable.
I also like @carmocca's suggestion of adding a flag mixed_precision=True|False, even though that's one more thing for users to remember. I think it may be fine if we document it properly so that the new flag is discoverable.
With another flag, we will end up with 4 flags that determine the precision for different configurations.
Off-topic: I was wondering whether we could deprecate `amp_level` and let users choose a different `amp_level` by passing `Trainer(..., plugins=[ApexMixedPrecisionPlugin(amp_level=...)])`.
My suggestion for the API would be to have two arguments...
@wisecornelius I think that is correct and concise, but prone to errors from inexperienced users who don't understand all the nuances and differences between backends.
Maybe even just mixed_precision=True | False.
This works now; I just wonder if there'll ever be a new precision algorithm. We would want to deprecate this bool if that ever happens. If we go with the bool, it should be `amp_enabled=True|False` for consistency.
So we would have `precision`, `amp_backend`, `amp_level`, and `precision_type`/`amp_enabled`.
What I don't really like is that the product of options is very sparse. It does feel boilerplate-y to me, but there's no better solution without going through deprecations.
Off-topic: I was wondering whether we could deprecate `amp_level`...
This makes sense to me, but you should open a new issue for that.
What I don't really like is that the product of options is very sparse. It does feel boilerplate-y to me but there's no better solution without going through deprecations.
@carmocca I agree, I do not like the sparsity either. It seems that the strategy registry is currently an effort to make the strategy selection work with dense arguments. I just learned that `amp_level="O1"` only works when `precision=16`. I did that wrong for 6 months...
I feel like having four different options controlling the precision AND, as @carmocca pointed out, the product of these being very sparse is not a good approach.
How about having "only" three of them and folding the `precision_type`/`amp_enabled` suggested in the last comment into `amp_level`, where `None` would indicate something along the lines of `amp_enabled=False` and anything else would indicate `amp_enabled=True`?
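A hypothetical sketch of that folding (not a real API; how the native backend would be selected here is left open by the proposal):

```python
from pytorch_lightning import Trainer

# Hypothetical: fold the mixed/true switch into amp_level.
Trainer(precision=16, amp_level=None)   # no AMP -> true half precision
Trainer(precision=16, amp_level="O1")   # AMP enabled (Apex opt level O1)
```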
Looking at the history, newly added trainer arguments usually don't survive that long 😅
I believe we should take in the learnings from the accelerator/strategy/plugin syntax, where we introduced the concept of registries for combinations of selections under one name.
This helped us push trainer arguments to the individual plugins while still enabling an easy-to-use interface in code and the CLI. Since `amp_level` and `amp_backend` only apply to a subset of the available plugins, these could be moved to the individual implementations and registered as names for the precision argument (introducing a precision registry).
This solves the sparsity problem as we would only expose the most common combinations as names and makes it easier to discover and understand the Trainer arguments. It also makes it easier to validate the selection since only the registry and the strategy are involved.
```python
# before
Trainer(precision=32|16|64)
# after
Trainer(precision="full"|"mixed"|"double")
# new: true half precision everything
Trainer(precision="half")
# same
Trainer(precision="bf16")

# before
Trainer(precision=16, amp_backend="apex", amp_level="O2")
# after
Trainer(precision="apex02")
# for specific customization
Trainer(plugins=ThisSpecialPrecisionPlugin(backend=..., level="X", ...))
```
These are my random thoughts.
@awaelchli I like that approach! However, this implies that `precision="mixed"` will always use native AMP, right? What do we do if new precisions come up? Do we always refer to them by the fraction of fp32? E.g. would 8-bit be `'quarter'`? I think that's not very clear to many people, and referring to them by something like `precision='16bit'` would be more self-explanatory.
I think that's not very clear to many people, and referring to them by something like precision='16bit' would be more self-explanatory
Yes, I like it. In my proposal, the main motivation was to use a different name than the (int) 16 to be able to introduce true half precision. So yours works too.
Trying to revive this discussion, here's what I find most logical:

Flags before | Precision after
---|---
`precision=16` | `"16-mixed"`
`precision=64` | `"64-true"`
`precision=32` | `"32-true"`
`precision=bf16` | `"bf16-mixed"`
Not supported | `"16-true"`
Not supported | `"bf16-true"`

The old precision values show a warning explaining the historical change.
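For example, these values would be passed directly to the existing `precision` argument (a sketch of the proposal, not the current API):

```python
from pytorch_lightning import Trainer

# Proposed string values from the table above.
Trainer(precision="16-mixed")   # what precision=16 does today (AMP)
Trainer(precision="16-true")    # new: true half precision
Trainer(precision="32-true")    # what precision=32 does today
```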
I like it... I'm just not sure `precision-type-amp_backend` is a good way to get the configuration.
Should we use a dict instead? `{'precision': 16, 'mixed': False, 'amp_type': 'apex'}` (the key names can be improved).
Also, `amp_level` is deprecated, so you can remove that; we will need to make more changes to this once it is removed.
I prefer the simplicity of the string options; they are consistent with our other registry-based trainer arguments.
Also, passing a dict is very similar to passing separate arguments, and has the disadvantages of https://github.com/Lightning-AI/lightning/issues/9956#issuecomment-1091645295
they are consistent with our other registry-based trainer arguments
Well, sort of. They are indicated by the argument name in their respective strategy class (`"deepspeed_stage_3"`), but here one needs to remember the order (`precision-type-amp_backend`). That's why I suggested a slightly more explicit configuration. But if others think it's fine, then 👍
@carmocca do we really want to append "-true" to all the non-mixed precision values? Intuitively, when I see `precision="16"`, I would assume that this is always the case and there would be no need to append "true".
@carmocca do we really want to append "-true" to all the non-mixed precision values? Intuitively, when I see precision="16", I would assume that this is always the case and there would be no need to append "true".
Okay, updated the table above. However, this is less explicit, which can be bad for those who don't notice there's a difference between "true" and "mixed".
Two notes I want to bring up:
I agree, @awaelchli. I believe I wrote the above proposal before we decided to deprecate Apex.
The comment about "native" in the name is also something I'd like to push for removing from the class names in the future. We already have a plugin with "Native" twice (!) in its name, which is beyond ridiculous IMO, considering that most of the precision plugins we have are actually native. Although this is not as big of a user impact compared to these trainer arguments.
The removal of apex has simplified this proposal greatly. I edited my proposed API in https://github.com/Lightning-AI/lightning/issues/9956#issuecomment-1207246337
Proposed refactoring or deprecation
Does `precision="mixed"` act differently to `precision=16` in any way? I understand that "mixed" is more correct, as 16-bit precision can still run some computations in 32-bit.
Motivation
In https://github.com/PyTorchLightning/pytorch-lightning/pull/9763 I noticed we did not even have a `PrecisionType` for `"mixed"`.
There's a single test in the codebase passing the "mixed" value:
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/tests/plugins/test_deepspeed_plugin.py#L153
And no mention at all of this value in the docs.
Pitch
Have one value to set this, whether it is `16` or `"mixed"`. Most likely `16`, since it's the one widely used.
Otherwise, add tests for passing `"mixed"`.
Additional context
cc @justusschock @awaelchli @akihironitta @rohitgr7 @tchaton @borda @carmocca