Closed lkct closed 1 year ago
Benchmark results: 0.106/5.0. On the other hand, `(C)FBK` is 0.118/5.0, and is therefore not chosen. (PS: with `CFBK` input, the `einsum` in the mixing layer outputs a non-contiguous tensor under all parameter configurations.)
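For context, an `einsum` that only permutes axes can hand back a strided view rather than a freshly materialized tensor, which is one way a non-contiguous output arises. A minimal sketch of the effect in NumPy (the shapes and the `fbk` labels are hypothetical stand-ins, not the actual mixing-layer tensors):

```python
import numpy as np

# Hypothetical stand-in for a mixing-layer input in an FBK-style layout.
x = np.arange(2 * 3 * 4, dtype=np.float64).reshape(2, 3, 4)

# A pure index permutation in einsum yields a strided view,
# not a freshly materialized (contiguous) array.
y = np.einsum("fbk->fkb", x)

print(y.flags["C_CONTIGUOUS"])  # False: the result is non-contiguous
print(np.shares_memory(x, y))   # True: no data was copied
```

PyTorch's `einsum` behaves analogously for permutation-only expressions, so a downstream op that requires contiguity would trigger an extra copy.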
This is a nice 2x speedup! As a reference, Antonio was using collapsed CP in his experiments, which might explain why he was (slightly) faster.
Do we have benchmark results with a higher batch size? I feel that the choice between `...KB` and `...BK` depends on the batch size. I think `...BK` is a more natural layout for our tensors. Moreover, it would remove the need for the final transpose on the circuit's output.
Yes, we should add more configurations to the benchmark; that's why I'm keeping a backup branch for `FBK`.
Based on #91, we should switch to a more efficient axis order.
The inner data flow is now `(C)FKB`, while the input and output of the model are `BK`, which may be more intuitive. (The output is contiguous as `KB`, and a `BK` view is returned.) The input layer is not yet modified due to #93.
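Returning a `BK` view over a `KB`-contiguous buffer is essentially free, since only the strides change and no data moves. A minimal NumPy sketch of that scheme (shapes and names are hypothetical, not the model's actual API):

```python
import numpy as np

# Hypothetical: the model keeps its output KB-contiguous internally...
K, B = 4, 6
out_kb = np.zeros((K, B))               # contiguous in KB order

# ...and hands back a BK view to the caller: same buffer, new strides.
out_bk = np.swapaxes(out_kb, 0, 1)

print(out_bk.shape)                     # (6, 4)
print(np.shares_memory(out_kb, out_bk)) # True: no data was copied
```

The trade-off is that the returned view is non-contiguous, which is fine for most element-wise consumers but matters if the caller immediately needs contiguous memory.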