ankan-ban / llama2.cu

Inference Llama 2 in one file of pure CUDA
MIT License

update weighted sum of values #1

Closed. kroggen closed this issue 1 year ago.

kroggen commented 1 year ago

I ran a test with this code, but the output is not OK.

I suspect it is because it performs many more conversions between FP32 and FP16.
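To illustrate the concern, here is a minimal sketch of the weighted sum of values for one attention head that keeps the accumulator in FP32, so each FP16 value is converted only once on load. The names, signature, and row-major [seq_len, head_size] layout are my assumptions for illustration, not the actual kernel in this repo:

```cuda
#include <cuda_fp16.h>

// Hypothetical sketch: weighted sum of values for one attention head.
// Values are stored in FP16, but accumulation stays in FP32, so there is
// exactly one FP16 -> FP32 conversion per loaded element.
__global__ void weighted_sum_values(const float* att,    // [seq_len] softmax attention weights
                                     const half*  value,  // [seq_len, head_size] FP16 values, row-major
                                     float*       out,    // [head_size] output for this head
                                     int seq_len, int head_size)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one output element per thread
    if (i >= head_size) return;

    float acc = 0.0f;  // FP32 accumulator
    for (int t = 0; t < seq_len; t++) {
        // adjacent threads read adjacent addresses, so loads within a warp coalesce
        acc += att[t] * __half2float(value[t * head_size + i]);
    }
    out[i] = acc;
}
```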

ankan-ban commented 1 year ago

@kroggen, please check the latest implementation in this branch: https://github.com/ankan-ban/llama2.cu/tree/opt. It fixes the inefficient memory access issue.
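For reference, one common way a kernel like this reduces memory traffic is to load the FP16 values as half2 pairs, so each thread issues one 32-bit load per step while adjacent threads still read adjacent addresses. This is only an assumption about the kind of optimization involved, not necessarily what the opt branch actually does:

```cuda
#include <cuda_fp16.h>

// Hypothetical sketch of a vectorized variant (assumed layout: head_size is even,
// values stored row-major as half2 pairs). Not taken from the opt branch.
__global__ void weighted_sum_values_h2(const float* att,    // [seq_len] softmax attention weights
                                        const half2* value,  // [seq_len, head_size/2] FP16 value pairs
                                        float*       out,    // [head_size] output for this head
                                        int seq_len, int pairs)  // pairs = head_size / 2
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one half2 pair per thread
    if (i >= pairs) return;

    float2 acc = make_float2(0.0f, 0.0f);  // FP32 accumulators
    for (int t = 0; t < seq_len; t++) {
        float2 v = __half22float2(value[t * pairs + i]);  // single 32-bit load, converted once
        acc.x += att[t] * v.x;
        acc.y += att[t] * v.y;
    }
    out[2 * i]     = acc.x;
    out[2 * i + 1] = acc.y;
}
```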

kroggen commented 1 year ago

Cool!

But it is not so easy to understand.

The main branch is better for learning purposes

Have you benchmarked the two branches?