chu-tianxiang / exl2-for-all

EXL2 quantization generalized to other models.

Integration with Llama Index #2

Open mirix opened 8 months ago

mirix commented 8 months ago

Hello,

I am interested in integrating exl2-quantised models with LlamaIndex for RAG.

Do you think your library will work out of the box for this purpose?

Are you aware of any examples?

Best,

Ed

chu-tianxiang commented 8 months ago

I'm afraid it won't. LlamaIndex only works with a predefined set of interfaces/APIs.

mirix commented 8 months ago

Thanks. Indeed. I have tried a number of LLMs without success. For instance:

    llm = HuggingFaceLLM(
          ^^^^^^^^^^^^^^^
  File "/home/emoman/.local/lib/python3.11/site-packages/llama_index/llms/huggingface.py", line 180, in __init__
    config_dict = self._model.config.to_dict()
                  ^^^^^^^^^^^^^^^^^^
  File "/home/emoman/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'Exl2ForCausalLM' object has no attribute 'config'
chu-tianxiang commented 8 months ago

Could you please try replacing the last line with a simple `return model`? The additional wrapper may break compatibility with the Hugging Face interface.
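
In other words (a rough sketch only; the function and helper names below are assumed, not taken from the repo), the loader would hand back the underlying model rather than wrapping it, so attributes such as `config` that LlamaIndex reads remain accessible:

    # Hypothetical sketch of the suggested change (all names assumed).
    def from_quantized(model_path, **kwargs):
        model = load_exl2_model(model_path, **kwargs)  # assumed helper that builds the EXL2 model
        # return Exl2Wrapper(model)                    # before: wrapper without a `config` attribute
        return model                                   # after: bare model, `.config` passes through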

mirix commented 8 months ago

It works! Thanks a bunch!

It uses just one GPU and it is much slower than AWQ, so I need to figure a couple of things out, but it is working!

The testing script is here:

https://github.com/mirix/retrieval-augmented-generation/blob/main/rag_llama_index_bm25_exl2_app.py
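
For reference, a minimal sketch of the kind of wiring involved; the import path, repo name, and loader arguments here are assumptions rather than excerpts from the linked script:

    from transformers import AutoTokenizer
    from llama_index.llms import HuggingFaceLLM
    from exl2_for_all import Exl2ForCausalLM  # assumed import path

    model_path = "user/some-model-exl2"  # placeholder EXL2 repository

    # Assumed loader entry point; with the change above it returns a model
    # that exposes `.config`, which HuggingFaceLLM.__init__ reads.
    model = Exl2ForCausalLM.from_quantized(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    llm = HuggingFaceLLM(
        model=model,
        tokenizer=tokenizer,
        context_window=4096,
        max_new_tokens=256,
    )

The resulting `llm` can then be passed to a ServiceContext / query engine in the usual way for that llama_index version.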