galv closed this issue 1 year ago
You're supposed to use torch.amp.autocast(). model.half() casts every parameter and buffer to fp16, which breaks batch norm and many other operators that are not well supported in fp16, for example torch.stft().
Even if stft() worked, batch norm would fail eventually.
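For anyone landing here, a minimal sketch of the autocast approach (toy model, not NeMo-specific; names are illustrative):

```python
import torch

# Toy model containing an op with no fp16 CUDA kernel (torch.stft).
# Under autocast, fp32 inputs keep such ops in fp32, while fp16-friendly
# ops like nn.Linear automatically run in half precision.
class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(201, 64)  # 201 = n_fft // 2 + 1

    def forward(self, wav):
        window = torch.hann_window(400, device=wav.device)
        # Stays in fp32 under autocast (stft has no fp16 kernel).
        spec = torch.stft(wav, n_fft=400, window=window, return_complex=True).abs()
        return self.proj(spec.transpose(1, 2))  # runs in fp16 under autocast

model = TinyModel().cuda()  # weights stay fp32; no .half() needed
wav = torch.randn(2, 16000, device="cuda")

with torch.no_grad(), torch.amp.autocast(device_type="cuda", dtype=torch.float16):
    out = model(wav)

print(out.dtype)  # torch.float16
```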
Describe the bug
Not sure if this is intended to be supported or not, but I don't seem to be able to run the entire Conformer CTC Large model in fp16 on CUDA. The problem appears to be a missing op in the preprocessor. The same error is mentioned here: https://github.com/pytorch/pytorch/issues/71680
Perhaps NeMo could run the preprocessor in fp32 while the rest runs in fp16 as a workaround?
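Something like the following might work as a user-side stopgap. This is only a sketch: `model` stands for an already-loaded NeMo ASR model, and it assumes NeMo's usual preprocessor/encoder split and the processed_signal forward arguments, which may not match the current API exactly:

```python
import torch

model = model.cuda().half()
model.preprocessor.float()  # keep the STFT-based feature extractor in fp32

def forward_mixed(audio, lengths):
    with torch.no_grad():
        # Feature extraction in fp32, since torch.stft lacks an fp16 kernel...
        feats, feat_lens = model.preprocessor(
            input_signal=audio.float(), length=lengths
        )
        # ...then hand half-precision features to the fp16 encoder/decoder.
        return model(
            processed_signal=feats.half(),
            processed_signal_length=feat_lens,
        )
```

As noted in the comment above, batch norm may still misbehave in fp16 even with this workaround, so autocast is likely the more robust route.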
Steps/Code to reproduce bug
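Roughly (model name and calls follow NeMo's public examples; the exact script may differ, and "sample.wav" is a placeholder):

```python
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    model_name="stt_en_conformer_ctc_large"
)
asr_model = asr_model.cuda().half()  # convert the whole model to fp16

# Fails in the preprocessor: torch.stft has no fp16 CUDA kernel.
asr_model.transcribe(["sample.wav"])
```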
The error message is the same as in the PyTorch issue linked above.
Expected behavior
It would be great if float16 worked out of the box for NeMo
Environment overview (please complete the following information)
Environment details
Additional context