Closed Yard1 closed 4 months ago
Right now, we require the q and kv tensors to have the same dtype, but that requirement is not enforced, so a misconfiguration can surface as cryptic memory errors. This PR adds a check that rejects mismatched dtypes up front.
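For illustration, a minimal sketch of what such a check could look like. This is not the code from this PR: `check_qkv_dtype` and `FakeTensor` are hypothetical names, and `FakeTensor` only stands in for a real tensor type so the example runs without any dependencies.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a real tensor; only `dtype` matters here."""
    dtype: str


def check_qkv_dtype(q, kv):
    """Hypothetical helper: fail early if q and kv dtypes differ."""
    if q.dtype != kv.dtype:
        raise ValueError(
            "q and kv must have the same dtype, got "
            f"q.dtype={q.dtype} and kv.dtype={kv.dtype}"
        )


# Matching dtypes pass silently.
check_qkv_dtype(FakeTensor("float16"), FakeTensor("float16"))

# Mismatched dtypes raise a readable error instead of failing later.
try:
    check_qkv_dtype(FakeTensor("float16"), FakeTensor("bfloat16"))
except ValueError as exc:
    print(exc)
```

The point of validating at the entry point is that the failure names both dtypes, rather than showing up later as a memory error that is hard to trace back to the configuration.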