HKUDS / LightRAG

"LightRAG: Simple and Fast Retrieval-Augmented Generation"
https://arxiv.org/abs/2410.05779
MIT License

Azure Databricks tokenizer issue #313

Open stevenveenma opened 1 day ago

stevenveenma commented 1 day ago

Thank you for this promising repository, which I would like to use. I am required to work in Azure Databricks and have installed the repository there. I then set up examples/lightrag_azure_openai_demo.py in a notebook. I was able to resolve some issues, but I am now getting the following error message:

Resposta do llm_model_func: I'm just a computer program, so I don't have feelings, but I'm here and ready to help you! How can I assist you today?
Resultado do embedding_func: (1, 1536)
Dimensão da embedding: 1536
General error in processing: Error inserting book contents into rag: 'Could not automatically map gpt-4o-mini to a tokenizer. Please use tiktoken.get_encoding to explicitly get the tokenizer you expect.'

The error message is strange because I am using gpt-4o, not gpt-4o-mini. The LLM and embedding checks in the output above succeed, so the cause seems to lie in the tokenizer rather than in the model calls. I tried to resolve the error with the help of GPT, without success. I would appreciate your help with this.
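
For reference, here is a minimal sketch of the workaround the error message itself suggests. It assumes the error comes from `tiktoken.encoding_for_model` raising `KeyError` because the tiktoken version installed on the cluster has no mapping for the model name. The helper name `load_tokenizer` and the choice of `cl100k_base` as the fallback are hypothetical and not part of LightRAG.

```python
def load_tokenizer(model_name: str, fallback_encoding: str = "cl100k_base"):
    """Return a tiktoken encoding for model_name, or a named fallback encoding.

    Hypothetical helper for illustration; not part of LightRAG or tiktoken.
    """
    # Imported lazily so the helper can be defined even where tiktoken is missing.
    import tiktoken

    try:
        # Works when the installed tiktoken knows this model name.
        return tiktoken.encoding_for_model(model_name)
    except KeyError:
        # Raised with "Could not automatically map ... to a tokeniser" when
        # the installed tiktoken has no mapping for model_name.
        # Assumption: approximate token counts from the fallback encoding
        # are acceptable for chunking text.
        return tiktoken.get_encoding(fallback_encoding)
```

Calling `load_tokenizer("gpt-4o-mini")` in the notebook should show whether the installed tiktoken recognizes the model name. If it does not, upgrading tiktoken on the cluster is probably the cleaner fix, since newer releases map the gpt-4o family to `o200k_base`. The name gpt-4o-mini in the error may come from a LightRAG default for the tokenizer model name (a `tiktoken_model_name` setting, as far as I can tell) rather than from the Azure deployment; this is an assumption worth checking against the installed LightRAG version.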