Add --tf32 device flag for transparent float32 acceleration

graehl commented 2 years ago

Add a boolean device flag, --tf32 0|1, that sets torch.backends.cuda.matmul.allow_tf32. TF32 transparently accelerates float32 matrix multiplication by using a 10-bit mantissa (19 bits in total). The flag defaults to true for backward compatibility with torch < 1.12, where TF32 was enabled by default. Training can be continued with a different --tf32 setting than the original run.
device.init_device is now called by train, translate, and score.
Allow torch 1.12 in requirements.txt.
Require pytest 2 (tests fail with pytest 3).
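
For readers unfamiliar with the flag, here is a minimal sketch of how a 0|1 option could be parsed and applied to torch.backends.cuda.matmul.allow_tf32. It is not the code from this PR: the helper names bool_flag and build_parser and the init_device signature shown here are assumptions for illustration, and sockeye's actual argument handling may differ.

```python
import argparse


def bool_flag(value: str) -> bool:
    """Parse '0' or '1' into a bool (hypothetical helper, not sockeye's parser)."""
    if value not in ("0", "1"):
        raise argparse.ArgumentTypeError(f"expected 0 or 1, got {value!r}")
    return value == "1"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with only the --tf32 option (illustrative subset)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--tf32",
        type=bool_flag,
        default=True,  # matches the pre-1.12 torch default described above
        help="Allow TF32 for float32 matrix multiplication (0 or 1).",
    )
    return parser


def init_device(tf32: bool) -> bool:
    """Apply the setting to torch if it is installed.

    Assumed signature for illustration. Returns True if the flag was applied,
    False if torch could not be imported.
    """
    try:
        import torch
    except ImportError:
        return False
    torch.backends.cuda.matmul.allow_tf32 = tf32
    return True


if __name__ == "__main__":
    args = build_parser().parse_args()
    init_device(args.tf32)
```

Because the option defaults to true, omitting --tf32 keeps the behavior of torch versions before 1.12, while --tf32 0 requests full float32 precision for matrix multiplication.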

Pull Request Checklist

[x] Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]' until you can check this box).
[x] Unit tests pass (pytest)
[x] Were system tests modified? If so, did you run these at least 5 times to account for the variation across runs?
[x] System tests pass (pytest test/system)
[x] Passed code style checking (./style-check.sh)
[x] You have considered writing a test
[x] Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
[x] Updated CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

fhieber commented 2 years ago

Thanks Johnathan! I agree that defaulting allow_tf32 to true, to keep behavior consistent with previous versions, would be preferred. While you are at it, you could also update requirements.txt to allow PyTorch 1.12.x (<1.13.0).

graehl commented 2 years ago

I was unaware that it defaulted to true previously. Agreed.

graehl commented 2 years ago

I rebased into a single commit for all of the above.

graehl commented 2 years ago

I had to revert the pytest<3 requirement because of the automated test failure above, even though the tests do in fact fail with pytest 3 (the tests work locally for me).

fhieber commented 2 years ago

Thanks for the changes. I realized I never submitted my pending review from over a month ago; apologies for the delay. I'll merge this now.

awslabs / sockeye

Add --tf32 device flag for transparent float32 acceleration #1066
