LLukas22 / llm-rs-python

Unofficial python bindings for the rust llm library. 🐍❤️🦀

4-bit quantization not happening - code getting stuck - #15

Closed sidharthiimc closed 1 year ago

sidharthiimc commented 1 year ago

AutoConverter works and produces fp16 weights (13 GB for LLaMA 7B), and these can be loaded for inference as well. But that's not what we need.

AutoQuantizer, however, gets stuck forever. The output file gets created (around 2.7 GB) but doesn't load for inference, and shows the following error:

[screenshot of the error message]

LLukas22 commented 1 year ago

I will attempt to replicate these issues on my local machine, although I haven't experienced anything similar thus far.

As an interim solution, I would suggest executing the quantization process within a standalone script instead of a notebook. My automated conversions and quantizations have not encountered any issues when performed this way.
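A minimal standalone script might look like the sketch below. The `AutoConverter.convert` / `AutoQuantizer.quantize` method names, import paths, arguments, and the model id are assumptions based on the class names mentioned above, so check them against the llm-rs README:

```python
# quantize_model.py -- run from a terminal with `python quantize_model.py`,
# not from inside a notebook.
from llm_rs.convert import AutoConverter  # import path is an assumption
from llm_rs import AutoQuantizer          # import path is an assumption

if __name__ == "__main__":
    # Convert the Hugging Face checkpoint to a GGML fp16 file first.
    fp16_path = AutoConverter.convert("huggyllama/llama-7b", "./models")
    # Then quantize the fp16 file down to 4 bit.
    quantized_path = AutoQuantizer.quantize(fp16_path)
    print("Quantized model written to:", quantized_path)
```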

If this doesn't provide a resolution, you can utilize the quantization features of either rustformers/llm or llama.cpp. The converted model should be compatible with both of these.

For a comprehensive understanding, could you please provide details of the operating system, Python version, and llm-rs version you are using?

sidharthiimc commented 1 year ago

```
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    1
Core(s) per socket:    24
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 183
Model name:            13th Gen Intel(R) Core(TM) i9-13900
Stepping:              1
CPU MHz:               5300.000
CPU max MHz:           7200.0000
CPU min MHz:           800.0000
BogoMIPS:              3993.60
Virtualization:        VT-x
L1d cache:             48K
L1i cache:             32K
L2 cache:              2048K
L3 cache:              36864K
NUMA node0 CPU(s):     0-31
```

OS: Ubuntu 18.04

Python = 3.10.9; llm-rs = not able to get the version, but I installed it yesterday.
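A quick way to read the installed version from the standard library, assuming the package is distributed under the name `llm-rs`:

```python
# Print the installed package version; assumes the PyPI distribution
# is named "llm-rs".
from importlib.metadata import PackageNotFoundError, version

try:
    print(version("llm-rs"))
except PackageNotFoundError:
    print("llm-rs does not appear to be installed")
```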

LLukas22 commented 1 year ago

Thank you for providing the necessary details. I was able to replicate the issue, and it appears to exclusively affect Jupyter notebooks. This might be a consequence of the quantize function currently writing to the standard output (stdout) from the Rust side. I will aim to resolve this in the upcoming release by incorporating a callback into the quantize function, thereby preventing direct writes to stdout.
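For illustration only, the callback-based interface might look something like this (a hypothetical sketch of the idea, not the released API):

```python
# Hypothetical shape of the planned fix: progress messages are handed to a
# Python callable instead of the Rust side writing directly to stdout, so
# notebook frontends can display them.
def report_progress(message: str) -> None:
    print(message, flush=True)  # a Python-level print works fine in Jupyter

# Assumed future signature, commented out because it does not exist yet:
# AutoQuantizer.quantize("model-f16.bin", callback=report_progress)
```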

In the interim, utilizing a standard Python script for quantization should circumvent this problem, as stdout will be appropriately directed to the console in this setup.
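Alternatively, if you want to stay inside a notebook, launching the script as a child process gives the Rust side a real pipe to write to (a sketch; `quantize_model.py` refers to the standalone script sketched above):

```python
# Notebook workaround: run the quantization in a separate interpreter so the
# Rust side's stdout goes to an ordinary pipe instead of the notebook's
# redirected stream.
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "quantize_model.py"],  # the standalone script from above
    capture_output=True,
    text=True,
)
print(result.stdout)
print(result.stderr)
```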

sidharthiimc commented 1 year ago

Yes. I came here because I was not able to convert using the llama.cpp code. But now I am able to convert after adding a new gcc version as per

Having a Jupyter-friendly workflow would still be really helpful, though.