lostmsu opened this issue 2 months ago
Can you show the actual line of code you used? Are you getting the warning at runtime or at compile/interpretation time?
I don't see this warning when using a CausalSelfAttention layer inside a transformer architecture.
This is the line of code I used:
// "Flash" attention
var y = F.scaled_dot_product_attention(q, k, v, is_casual: true);
where q, k, and v are the query, key, and value tensors produced by the causal attention linear layer.
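For reference, here is a minimal self-contained TorchSharp sketch of the kind of causal self-attention forward pass described above, assuming a GPT-style fused qkv projection. The shapes and names (nEmbd, nHead, cAttn) are illustrative, not from this thread, and it keeps the is_casual spelling from the snippet above, since that appears to be the parameter name in this TorchSharp version:

```csharp
// Minimal sketch; assumes TorchSharp (e.g. TorchSharp-cpu or TorchSharp-cuda-windows).
using System;
using TorchSharp;
using static TorchSharp.torch;
using F = TorchSharp.torch.nn.functional;

class SdpaSketch
{
    static void Main()
    {
        long B = 2, T = 8, nEmbd = 64, nHead = 4;

        // Fused projection producing query, key, and value in one matmul,
        // as in a GPT-style CausalSelfAttention block.
        var cAttn = nn.Linear(nEmbd, 3 * nEmbd);

        var x = randn(B, T, nEmbd);
        var qkv = cAttn.forward(x).split(nEmbd, 2); // split channels into q, k, v
        var (q, k, v) = (qkv[0], qkv[1], qkv[2]);

        // Reshape to (B, nHead, T, headDim) for multi-head attention.
        q = q.view(B, T, nHead, nEmbd / nHead).transpose(1, 2);
        k = k.view(B, T, nHead, nEmbd / nHead).transpose(1, 2);
        v = v.view(B, T, nHead, nEmbd / nHead).transpose(1, 2);

        // "Flash" attention; note the parameter is spelled is_casual
        // in this TorchSharp version, as in the snippet above.
        var y = F.scaled_dot_product_attention(q, k, v, is_casual: true);

        // Merge heads back to (B, T, nEmbd).
        y = y.transpose(1, 2).contiguous().view(B, T, nEmbd);
        Console.WriteLine(string.Join(",", y.shape)); // 2,8,64
    }
}
```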
This is printed when I call functional.scaled_dot_product_attention.
I'm on Windows with TorchSharp-cuda-windows 0.103.0.