chris-ch / llama2-haskell-inference

Haskell version of llama2.c
MIT License

Use accelerate, streamly #1

Open · tkvogt opened 1 month ago

tkvogt commented 1 month ago

This is a really nice project. Have you considered using https://github.com/AccelerateHS/accelerate? Another idea would be to stream the model, because loading the file into memory already uses too much memory and crashes for a 4 GB model. I ran into a similar problem where I needed to fill a judy array from a file, and I came to the conclusion that I had to stream the file into the judy array; see https://github.com/tkvogt/streamly-judy. I would like to work on this.
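A minimal sketch of the Accelerate idea: array code is written against `Data.Array.Accelerate`, and a backend (here, as an assumption, accelerate-llvm-native on the CPU) compiles and fuses it into a parallel loop. A dot product is shown since the transformer's hot path is made of exactly these zipWith/fold shapes:

```haskell
import Data.Array.Accelerate (Acc, Scalar, Z (..), (:.) (..))
import qualified Data.Array.Accelerate as A
import qualified Data.Array.Accelerate.LLVM.Native as CPU

-- zipWith and fold are fused by the backend into a single parallel pass.
dotp :: Acc (A.Vector Float) -> Acc (A.Vector Float) -> Acc (Scalar Float)
dotp xs ys = A.fold (+) 0 (A.zipWith (*) xs ys)

main :: IO ()
main = do
  let xs = A.fromList (Z :. 1024) [0 ..] :: A.Vector Float
  print (CPU.run (dotp (A.use xs) (A.use xs)))
```

And a sketch of the streaming idea, assuming streamly-core's 0.2 API (the file name is a placeholder): the checkpoint is consumed as a stream and reduced with a `Fold`, so peak residency stays constant regardless of model size, instead of reading 4 GB into memory at once:

```haskell
import qualified Streamly.Data.Fold as Fold
import qualified Streamly.Data.Stream as Stream
import qualified Streamly.FileSystem.Handle as Handle
import System.IO (IOMode (ReadMode), withFile)

-- Count the checkpoint's bytes without ever holding it in memory;
-- a real loader would fold chunks into preallocated weight buffers.
main :: IO ()
main =
  withFile "model.bin" ReadMode $ \h -> do
    nBytes <- Stream.fold Fold.length (Handle.read h)
    putStrLn ("streamed " ++ show nBytes ++ " bytes in constant space")
```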

chris-ch commented 1 month ago

Thanks a lot, indeed there is definitely room for improvement ... I realised too late that Data.Vector.Storable (https://hackage.haskell.org/package/vector-0.13.1.0/docs/Data-Vector-Storable.html) should perform much better when updating the "AttentionKV" state, because I suspect memory allocation/deallocation is wasting too much time. I tried, with no luck, on the branch codespace-potential-bassoon-67rv7g9grw2rrrx: memory still blows up even for relatively small models (100M).
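A minimal sketch of the mutable-vector idea, with a hypothetical flat layout (the names `KVCache`, `newKVCache`, and `writeRow` are illustrative, not the project's actual `AttentionKV` type): keep the whole cache in one `Data.Vector.Storable.Mutable` buffer and overwrite one row per token in place, so the hot loop allocates nothing:

```haskell
import qualified Data.Vector.Storable as V
import qualified Data.Vector.Storable.Mutable as MV

-- One flat buffer of (nLayers * seqLen * dim) Floats for keys (or values).
type KVCache = MV.IOVector Float

newKVCache :: Int -> Int -> Int -> IO KVCache
newKVCache nLayers seqLen dim = MV.replicate (nLayers * seqLen * dim) 0

-- Overwrite the row for (layer, pos) in place: V.copy is a memcpy into
-- a slice, so no fresh vector is built per token.
writeRow :: KVCache -> Int -> Int -> Int -> Int -> V.Vector Float -> IO ()
writeRow cache seqLen dim layer pos =
  V.copy (MV.slice ((layer * seqLen + pos) * dim) dim cache)
```

`V.copy` requires the source row to have exactly length `dim`; since the buffer is allocated once and only mutated afterwards, there is no per-token allocation for the garbage collector to churn through.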

Anyway, I am at the limit of what I can do in Haskell, so if you have time, please go ahead. I really believe we should be able to get close to C performance-wise.