carmocca closed this issue 1 year ago
@carmocca During the first accelerator refactor we introduced the term `mixed` (before it was always `16`) because we were planning to deprecate the alias of `16` for AMP and instead have real half-precision training (like we have for double precision) under that flag.
What's the status of this discussion? I wanted to train a BERT-like model with half precision, where the model itself is half precision and not only the inputs. What would be the correct way to train without mixed precision? Thanks!
That's not yet decided.
For true half-precision training, the current way to go would be to use a custom `PrecisionPlugin`, similar to what we do for double precision.
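Something along these lines; a minimal, untested sketch modeled on the double-precision plugin, so the exact hooks and import paths may differ between Lightning versions:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins.precision import PrecisionPlugin


class HalfPrecisionPlugin(PrecisionPlugin):
    """Sketch of true fp16 training: cast the whole module, no autocast/GradScaler."""

    precision = 16

    def connect(self, model, optimizers, lr_schedulers):
        # Cast parameters and buffers to float16, mirroring how the
        # double-precision plugin calls `model.double()` here.
        return super().connect(model.half(), optimizers, lr_schedulers)


trainer = Trainer(plugins=[HalfPrecisionPlugin()])
```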
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!
A user in Slack got confused by this:
How do I do mixed precision training (CPU / GPU)? [...] Allowed precision values: ('16', '32', '64', 'bf16', 'mixed') suggest that 16 trains pure 16-bit and mixed trains AMP. However, the documentation states that 16 trains with mixed precision. When calling Trainer(precision='mixed') I get the error message "No precision set". So how do I do pure half-precision, mixed, and full-precision training?
Do we want to support true half-precision? I think we should.
The problem is the API. Since `precision=16` is used for mixed 16-bit, we would need a deprecation phase so that `precision=16` becomes true half-precision and `precision='mixed_16'` becomes mixed half-precision.
The disadvantage of this is that users' instinct will be to pass `precision=16`, because many people don't understand the difference between true precision and mixed precision. On the other hand, keeping `precision=16` as mixed half-precision and requiring `precision='true_16'` for true half-precision would be inconsistent with the other precision values (`precision=64`, for example, already means true double precision).
One more option is to add one extra flag, `precision_type="full"|"mixed"`, and remove `precision="mixed"`. We would automatically choose the best type given the precision value.
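As a purely hypothetical illustration of that option (`precision_type` is not a real Trainer argument; this only sketches the proposal):

```python
from pytorch_lightning import Trainer

# Hypothetical API: `precision_type` does not exist, shown only to illustrate the idea.
Trainer(precision=16, precision_type="mixed")  # what precision=16 does today (AMP)
Trainer(precision=16, precision_type="full")   # true half precision
Trainer(precision=32, precision_type="full")   # regular full precision
```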
Thanks @carmocca, I think this is a great issue to bring up. I too was confused by the naming convention originally, and I had to go through the PyTorch Lightning code to understand what the flag means.
I believe that changing `precision=16` from meaning AMP is very dangerous and could lead to many angry users who do not heed the deprecation warning (unless it was incredibly loud :P). I love the idea of adding a `precision_type` of some sort to control whether we'd assume mixed or full. Maybe even just `mixed_precision=True|False`.
My suggestion for the API would be to have two arguments (see the sketch after this list):

- `amp_backend`: "apex" vs "native"
- `precision`: one of [O0, O1, O2, O3] for apex, or [16, mixed, 32, 64] for native

In terms of naming for the native backend, I would suggest:

- Full precision: "full" and 32
- Mixed precision: "mixed"
- Half precision: "half" and 16
- Double precision: "double" and 64
- Bfloat precision: "bf16" or "bfloat16"
If there won't be half precision, 16 should be deprecated because it is a misnomer.
I believe that changing `precision=16` from meaning AMP is very dangerous and could lead to many angry users who do not heed the deprecation warning (unless it was incredibly loud :P). I love the idea of adding a `precision_type` of some sort to control whether we'd assume mixed or full. Maybe even just `mixed_precision=True|False`.
I agree with @SeanNaren. Even though users would need to remember to set two flags (`precision` and `mixed_precision`) to do one thing (true 16, ...), I don't think it's worth changing it from meaning AMP, for the backward-compatibility reason @SeanNaren mentioned above.
I also like @carmocca's suggestion of adding a flag `mixed_precision=True|False`, even though that's one more thing for users to remember. I think it may be fine if we document it properly so that the new flag is discoverable.
I also like @carmocca's suggestion of adding a flag mixed_precision=True|False, even though that's one more thing for users to remember. I think it may be fine if we document it properly so that the new flag is discoverable.
With another flag, we will end up with 4 flags that determine the precision for different configurations.
Off-topic: I was wondering whether we could deprecate `amp_level` and let users choose a different `amp_level` by passing `Trainer(..., plugins=[ApexMixedPrecisionPlugin(amp_level=...)])`.
My suggestion for the API would be to have two arguments...
@wisecornelius I think that is correct and concise, but prone to errors from inexperienced users who don't understand all the nuances and differences between backends.
Maybe even just mixed_precision=True | False.
This works now; I just wonder if there'll ever be a new precision algorithm. We would want to deprecate this bool if that ever happens. If we go with the bool, it should be `amp_enabled=True|False` for consistency.
So we would have `precision`, `amp_backend`, `amp_level`, and `precision_type`/`amp_enabled`.
What I don't really like is that the product of options is very sparse. It does feel boilerplate-y to me, but there's no better solution without going through deprecations.
Off-topic: I was wondering whether we could deprecate `amp_level`...
This makes sense to me, but you should open a new issue for that.
What I don't really like is that the product of options is very sparse. It does feel boilerplate-y to me but there's no better solution without going through deprecations.
@carmocca I agree, I do not like the sparsity either. It seems that the strategy registry is currently an effort to make the strategy selection work with dense arguments. I just learned that `amp_level="O1"` only works when `precision=16`. I did that wrong for 6 months...
I feel like having four different options controlling the precision AND, as @carmocca pointed out, the product of these being very sparse is not a good approach.
How about having "only" three of them and folding the `precision_type`/`amp_enabled` suggested in the last comment into `amp_level`, where `None` would indicate something along the lines of `amp_enabled=False` and anything else would indicate `amp_enabled=True`?
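A hypothetical sketch of that folding (not a real API; how the native backend would be selected here is left open by the proposal):

```python
from pytorch_lightning import Trainer

# Hypothetical: fold the mixed/true switch into amp_level.
Trainer(precision=16, amp_level=None)   # no AMP -> true half precision
Trainer(precision=16, amp_level="O1")   # AMP enabled (Apex opt level O1)
```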
Looking at the history, newly added trainer arguments usually don't survive that long 😅
I believe we should take in the learnings from the accelerator/strategy/plugin syntax, where we introduced the concept of registries for combinations of selections under one name.
This helped us push trainer arguments to the individual plugins while still enabling an easy-to-use interface in code and the CLI. Since `amp_level` and `amp_backend` only apply to a subset of the available plugins, these could be moved to the individual implementations and registered as names for the precision argument (introducing a precision registry).
This solves the sparsity problem as we would only expose the most common combinations as names and makes it easier to discover and understand the Trainer arguments. It also makes it easier to validate the selection since only the registry and the strategy are involved.
```python
# before
Trainer(precision=32|16|64)
# after
Trainer(precision="full"|"mixed"|"double")
# new: true half precision everything
Trainer(precision="half")
# same
Trainer(precision="bf16")

# before
Trainer(precision=16, amp_backend="apex", amp_level="O2")
# after
Trainer(precision="apex02")
# for specific customization
Trainer(plugins=ThisSpecialPrecisionPlugin(backend=..., level="X", ...))
```
These are my random thoughts.
@awaelchli I like that approach! However, this implies that `precision="mixed"` will always use native AMP, right? What do we do if new precisions come up? Do we always refer to them by the fraction of fp32? E.g. would 8-bit be `'quarter'`? I think that's not very clear to many people, and referring to them by something like `precision='16bit'` would be more self-explanatory.
I think that's not very clear to many people, and referring to them by something like precision='16bit' would be more self-explanatory
Yes, I like it. In my proposal, the main motivation was to use a different name than the (int) 16 to be able to introduce true half precision. So yours works too.
Trying to revive this discussion, here's what I find most logical:

Flags before | Precision after
---|---
`precision=16` | `"16-mixed"`
`precision=64` | `"64-true"`
`precision=32` | `"32-true"`
`precision=bf16` | `"bf16-mixed"`
Not supported | `"16-true"`
Not supported | `"bf16-true"`

The old precision values show a warning explaining the historical change.
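For example, these values would be passed directly to the existing `precision` argument (a sketch of the proposal, not the current API):

```python
from pytorch_lightning import Trainer

# Proposed string values from the table above.
Trainer(precision="16-mixed")   # what precision=16 does today (AMP)
Trainer(precision="16-true")    # new: true half precision
Trainer(precision="32-true")    # what precision=32 does today
```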
I like it... I'm just not sure `precision-type-amp_backend` is a good way to get the configuration.
Should we use a dict instead? `{'precision': 16, 'mixed': False, 'amp_type': 'apex'}` (the key names can be improved).
Also, `amp_level` is deprecated, so you can remove that; we will need to make more changes to this once it is removed.
I prefer the simplicity of the string options; they are consistent with our other registry-based trainer arguments.
Also, passing a dict is very similar to passing separate arguments, and has the disadvantages of https://github.com/Lightning-AI/lightning/issues/9956#issuecomment-1091645295
they are consistent with our other registry-based trainer arguments
Well, sort of. They are indicated by the argument name in their respective strategy class (`"deepspeed_stage_3"`), but here one needs to remember the order (`precision-type-amp_backend`). That's why I suggested a slightly more explicit configuration. But if others think it's fine, then 👍
@carmocca do we really want to append "-true" to all the non-mixed precision values? Intuitively, when I see `precision="16"`, I would assume that this is always the case and there would be no need to append "true".
@carmocca do we really want to append "-true" to all the non-mixed precision values? Intuitively, when I see precision="16", I would assume that this is always the case and there would be no need to append "true".
Okay, updated the table above. However, this is less explicit, which can be bad for those who don't notice there's a difference between "true" and "mixed".
Two notes I want to bring up:
I agree, @awaelchli. I believe I wrote the above proposal before we decided to deprecate Apex.
The comment about "native" in the name is also something I'd like to push for removing from the class names in the future. We already have a plugin with "Native" twice (!) in its name, which is beyond ridiculous IMO, considering that most of the precision plugins we have are actually native. Although this is not as big of a user impact compared to these trainer arguments.
The removal of apex has simplified this proposal greatly. I edited my proposed API in https://github.com/Lightning-AI/lightning/issues/9956#issuecomment-1207246337
Proposed refactoring or deprecation
Does `precision="mixed"` act differently to `precision=16` in any way? I understand that "mixed" is more correct, as 16-bit precision can still run some computations in 32-bit.
Motivation
In https://github.com/PyTorchLightning/pytorch-lightning/pull/9763 I noticed we did not even have a `PrecisionType` for `"mixed"`.
There's a single test in the codebase passing the "mixed" value:
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/tests/plugins/test_deepspeed_plugin.py#L153
And no mention at all of this value in the docs.
Pitch
Have one value to set this, whether it is `16` or `"mixed"`. Most likely `16`, since it's the one widely used.
Otherwise, add tests for passing `"mixed"`.
Additional context
cc @justusschock @awaelchli @akihironitta @rohitgr7 @tchaton @borda @carmocca