Snikch63200 opened this issue 1 month ago
The low relevance of your Doc().query results is likely due to the use of the SparseEmbeddingModel. This model is a simple keyword-search model, which may not capture the semantic meaning of the text effectively, leading to off-topic citations and less relevant answers [1].
To improve relevance, consider using a HybridEmbeddingModel, which combines both sparse and dense embeddings. This approach can enhance semantic understanding and improve the quality of the results [2].
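For illustration, a minimal sketch of such a hybrid setup could look like the following, assuming HybridEmbeddingModel, LiteLLMEmbeddingModel, and SparseEmbeddingModel are importable from paperqa.llms, and using "text-embedding-3-small" purely as a placeholder dense model name:
from paperqa.llms import HybridEmbeddingModel, LiteLLMEmbeddingModel, SparseEmbeddingModel

# Sketch only: pair a dense, LiteLLM-backed embedding model with the sparse keyword model.
# Replace "text-embedding-3-small" with whatever embedding endpoint your LiteLLM setup exposes.
dense = LiteLLMEmbeddingModel(name="text-embedding-3-small")
sparse = SparseEmbeddingModel(ndim=256)
embedding_model = HybridEmbeddingModel(models=[dense, sparse])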
Additionally, your LiteLLM configuration with a low temperature (0.1) and a high frequency penalty (1.5) might also impact the diversity and quality of the generated answers. Adjusting these parameters could help in fine-tuning the response quality.
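To show where those knobs live, here is a hedged sketch of a local LiteLLM router configuration in the model_list / litellm_params style; the model name, api_base, and api_key are placeholders, and the values simply mirror the ones mentioned in this thread:
# Hypothetical local LiteLLM router config; model name, api_base, and api_key are placeholders.
local_llm_config = dict(
    model_list=[
        dict(
            model_name="mixtral-8x7b-instruct-v0.1.Q4_K_M",
            litellm_params=dict(
                model="openai/mixtral-8x7b-instruct-v0.1.Q4_K_M",
                api_base="http://localhost:8080/v1",
                api_key="sk-no-key-required",
                temperature=0.1,        # the temperature discussed above
                frequency_penalty=1.5,  # the frequency penalty discussed above
            ),
        )
    ]
)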
I use 'mixtral-8x7b-instruct-v0.1.Q4_K_M' as my local LLM model. Can I use this model as an embedding model, or are embedding models specific and have to be run separately?
What are the recommended values for temperature and frequency penalty to increase 'mixtral-8x7b-instruct-v0.1.Q4_K_M' performance?
@Dosu.
I couldn't find specific information on using 'mixtral-8x7b-instruct-v0.1.Q4_K_M' as an embedding model in the PaperQA framework. Typically, embedding models are specialized and may need to be run separately from LLMs.
Regarding the recommended values for temperature and frequency penalty for 'mixtral-8x7b-instruct-v0.1.Q4_K_M', I couldn't find specific recommendations. You might need to experiment with different settings to optimize performance, starting with the default values used in your current configuration.
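As a purely hypothetical starting point for that experimentation (not a recommendation), the two parameters can be adjusted in the litellm_params block sketched earlier and passed to PaperQA through Settings, for example:
from paperqa import Settings, ask

# Hypothetical starting values to iterate from; evaluate answer quality and adjust.
# local_llm_config is the router config sketched earlier in this thread.
local_llm_config["model_list"][0]["litellm_params"].update(
    temperature=0.3,
    frequency_penalty=0.5,
)

answer = ask(
    "Your question here",
    settings=Settings(
        llm="mixtral-8x7b-instruct-v0.1.Q4_K_M",
        llm_config=local_llm_config,
        summary_llm="mixtral-8x7b-instruct-v0.1.Q4_K_M",
        summary_llm_config=local_llm_config,
    ),
)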
How can I define a LiteLLM embedding model for a local embedding model? @Dosu.
To define a LiteLLM embedding model for a local embedding model, you can use the LiteLLMEmbeddingModel class. Here's an example:
from paperqa.llms import LiteLLMEmbeddingModel
from limits import RateLimitItemPerSecond
# Define the LiteLLM embedding model with rate limits
embedding_model = LiteLLMEmbeddingModel(
    name="your-local-model-name",
    config={"rate_limit": RateLimitItemPerSecond(20, 5)},
)
You can customize the configuration through the config dictionary, for example with a different rate limit:
embedding_model = LiteLLMEmbeddingModel(
    name="your-local-model-name",
    config={"rate_limit": RateLimitItemPerSecond(1_000_000, 1)},
)
Alternatively, you can use the from_name method:
embedding_model = LiteLLMEmbeddingModel.from_name("your-local-model-name")
This will create a LiteLLMEmbeddingModel with the specified name and configuration [1][2][3].
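To tie it together, here is a short, hedged sketch of how the resulting embedding model could be passed to Docs; the embedding_model keyword on aadd/aquery and the formatted_answer attribute are assumptions based on recent paperqa versions, and "my_paper.pdf" is a placeholder:
import asyncio
from paperqa import Docs

async def my_query():
    docs = Docs()
    # Assumed keyword: pass the embedding model explicitly when indexing and querying.
    await docs.aadd("my_paper.pdf", embedding_model=embedding_model)
    answer = await docs.aquery("Your question here", embedding_model=embedding_model)
    print(answer.formatted_answer)  # attribute name assumed from recent paperqa versions

asyncio.run(my_query())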
I defined my local embedding model as follows:
embedding_model = HybridEmbeddingModel([LiteLLMEmbeddingModel.from_name(f"openai/my-llm-model"),
SparseEmbeddingModel(ndim=4096)])
I got this error:
Traceback (most recent call last):
File "C:\Users\XXXXXXXXXXX\Desktop\PaperQA\paper_qa_question_async.py", line 85, in <module>
asyncio.run(my_query())
File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\asyncio\runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\asyncio\runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\asyncio\base_events.py", line 687, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "C:\Users\XXXXXXXXXXX\Desktop\PaperQA\paper_qa_question_async.py", line 66, in my_query
embedding_model = HybridEmbeddingModel([LiteLLMEmbeddingModel.from_name(f"openai/my-llm-model"),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\pydantic\_internal\_model_construction.py", line 262, in __getattr__
raise AttributeError(item)
AttributeError: from_name. Did you mean: 'from_orm'?
Any solution? @Dosu.
Hello,
I wonder why my Doc().query requests often produce random and poor-quality answers in terms of relevance.
Papers are sometimes relevant, sometimes not... Citations are, most of the time, completely off-topic even when the selected document contains relevant information.
I use this config for the LiteLLM model:
With these answer parameters:
I use a SparseEmbeddingModel as follows to run a fully local PaperQA instance:
embedding_model = SparseEmbeddingModel(ndim=4096)
Is it a parameter problem (like the number of tokens) or an embedding model problem? (I read in the code documentation that sparse embedding is a very simple approach.)
Best regards. @Dosu.