PotatoSpudowski / fastLLaMa

fastLLaMa: An experimental high-performance framework for running Decoder-only LLMs with 4-bit quantization in Python using a C/C++ backend.
https://potatospudowski.github.io/fastLLaMa/
MIT License

README.md is outdated in sections #running-llama and #running-alpaca-lora #81

Open · stduhpf opened this issue 1 year ago

stduhpf commented 1 year ago

For example, it still shows the old syntax for convert-pth-to-ggml.py and for export-from-huggingface.py. It would also be worth clarifying that even when installing through pip, you still need to run compile.py to get the quantize executable.
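
A minimal sketch of what that clarification could look like in the README; the install command, the built binary's path, and the quantize arguments below are assumptions for illustration, not verified against the current repo:

```bash
# Sketch only: the install command, output path, and quantize arguments
# are assumptions, not taken from the current README.

# Installing the Python package via pip does NOT build the quantize tool:
pip install git+https://github.com/PotatoSpudowski/fastLLaMa.git  # assumed install command

# The quantize executable still has to be built from a repo checkout:
git clone https://github.com/PotatoSpudowski/fastLLaMa
cd fastLLaMa
python compile.py   # builds the C/C++ backend, including the quantize binary

# Hypothetical invocation; check the actual binary location and arguments:
./build/bin/quantize models/7B/ggml-model-f32.bin models/7B/ggml-model-q4_0.bin 2
```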

PotatoSpudowski commented 1 year ago

Hi @stduhpf, so sorry for the late reply.

We got held up with work, and in our free time we have been exploring ways to use MLIR and potentially remove the dependency on GGML, as that is more beneficial in the long run. We haven't found the time to move forward on any of these goals yet, but we will get back to this at the earliest. Alternatively, if you have some bandwidth, please feel free to raise an MR. No worries if not :)