manuel-tran closed this issue 11 months ago.
Hi, thanks for your interest, and great question!
We currently only support fp16/bf16 inputs and weights. We'll be bringing in fp32 support everywhere soon (including the autocast version, with fp32 weights and fp16/bf16 inputs). Will update this issue when that's in.
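For context, the autocast pattern referred to here is the standard PyTorch one: parameters stay in fp32 while the forward pass runs in fp16/bf16 and GradScaler handles loss scaling. A generic sketch of that setup (plain PyTorch with an ordinary nn.Conv1d as a stand-in, not flash-fft-conv code):

```python
import torch
import torch.nn as nn

# Generic PyTorch mixed-precision training step: fp32 parameters, fp16 compute.
# This is the pattern the planned fp32-weight support is meant to slot into;
# it uses a plain nn.Conv1d as a stand-in, not the fused kernel.
model = nn.Conv1d(64, 64, kernel_size=3, groups=64, padding=2, device="cuda")  # fp32 weights
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(4, 64, 1024, device="cuda")  # fp32 input

with torch.autocast(device_type="cuda", dtype=torch.float16):
    out = model(x)                    # conv runs in fp16 under autocast
    loss = out.float().pow(2).mean()  # dummy loss for illustration

scaler.scale(loss).backward()
scaler.step(optimizer)  # unscaling succeeds because the parameters are fp32
scaler.update()
```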
Hi, we've just pushed a commit (afceac4) that should fix this. Can you give it a try and see if it works for your pipelines now?
Hi, thanks for the update. I tried the new feature and it works!
Dear Hazy Research Team,
Thanks for releasing flash-fft-conv. I was eagerly awaiting the announcement after reading the Hyena paper. While integrating FlashDepthWiseConv1d, I noticed that it only works when both the inputs and the weights are half precision; otherwise it raises

RuntimeError: u must be float16 or bfloat16
and
RuntimeError: weight must be float16 or bfloat16

Converting both to FP16 is not an issue when training without torch.autocast, but it is a problem in mixed-precision training, because the gradient scaler refuses to unscale FP16 gradients:

ValueError: Attempting to unscale FP16 gradients

Is there a way to use FlashDepthWiseConv1d in a Hyena model with mixed-precision training? Here is a minimal example. Thank you very much!
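A minimal sketch of the setup in question, assuming a FlashDepthWiseConv1d constructor along the lines of channels, kernel_size, padding, weights, bias, dtype, and device (the exact argument names may differ):

```python
import torch
import torch.nn as nn
from flashfftconv import FlashDepthWiseConv1d

b, d, l, k = 4, 64, 1024, 3

# Reference depthwise conv whose weights are handed to the fused kernel.
conv_ref = nn.Conv1d(d, d, kernel_size=k, groups=d, padding=k - 1, device="cuda")

# Hypothetical construction; argument names follow the repo's README style
# but may differ from the actual signature.
flash_conv = FlashDepthWiseConv1d(
    channels=d,
    kernel_size=k,
    padding=k - 1,
    weights=conv_ref.weight,
    bias=conv_ref.bias,
    dtype=torch.float16,  # half-precision weights, as currently required
    device="cuda",
)

x = torch.randn(b, d, l, device="cuda")  # fp32 activations

# Works: cast the input to fp16 to match the kernel's requirement.
out = flash_conv(x.to(torch.float16))

# Raises "RuntimeError: u must be float16 or bfloat16" if the input stays fp32:
# out = flash_conv(x)

# And in a training loop with torch.cuda.amp.GradScaler, keeping the weights
# in fp16 to satisfy the kernel leads to
# "ValueError: Attempting to unscale FP16 gradients" at scaler.step().
```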