Future-House / paper-qa

High accuracy RAG for answering questions from scientific documents with citations
Apache License 2.0

Example using locally-hosted model is not working #541

Open nleguillarme opened 1 week ago

nleguillarme commented 1 week ago

I am trying to use paper-qa with a locally-hosted model. However, the provided example:

from paperqa import Settings, ask

local_llm_config = dict(
    model_list=[
        dict(
            model_name="my_llm_model",
            litellm_params=dict(
                model="my-llm-model",
                api_base="http://localhost:8080/v1",
                api_key="sk-no-key-required",
                temperature=0.1,
                frequency_penalty=1.5,
                max_tokens=512,
            ),
        )
    ]
)

answer = ask(
    "What manufacturing challenges are unique to bispecific antibodies?",
    settings=Settings(
        llm="my-llm-model",
        llm_config=local_llm_config,
        summary_llm="my-llm-model",
        summary_llm_config=local_llm_config,
    ),
)

raises the following exception:

litellm.exceptions.BadRequestError: litellm.BadRequestError: LLM Provider NOT provided. Pass in the LLM provider you are trying to call. You passed model=my-llm-model
 Pass model as E.g. For 'Huggingface' inference endpoints pass in `completion(model='huggingface/starcoder',..)` Learn more: https://docs.litellm.ai/docs/providers
Snikch63200 commented 1 week ago

Hello,

PaperQA's documentation is not very clear about this... It took me many attempts to work out what's going on. You have to specify the inference provider as a prefix on the model name: if you use a llamafile server, specify "openai/my-llm-model" as the model name; with Ollama, use "ollama/my-llm-model" (an Ollama variant is sketched after the example below).

Example for a llamafile hosted locally:

local_llm_config = dict(
    model_list=[
        dict(
            model_name="openai/my-llm-model",
            litellm_params=dict(
                model="openai/my-llm-model",
                api_base="http://localhost:8080/v1",
                api_key="sk-no-key-required",
                temperature=0.1,
                frequency_penalty=1.5,
                max_tokens=1024,
            ),
        )
    ]
)
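
For comparison, an Ollama-hosted model would use the ollama/ prefix. A minimal sketch, assuming Ollama serves a model called llama3 on its default port 11434 (adjust the model name to whatever you pulled):

local_llm_config = dict(
    model_list=[
        dict(
            model_name="ollama/llama3",
            litellm_params=dict(
                model="ollama/llama3",
                # Ollama's default API endpoint; adjust if you changed it
                api_base="http://localhost:11434",
                temperature=0.1,
                max_tokens=1024,
            ),
        )
    ]
)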

However, you'll still get an API connection error that seems to be due to the embedding model... So I don't use the 'ask' function and use Docs.query instead, as follows (see the sketch after the snippet for how the settings object can be built):

from paperqa import Docs, SparseEmbeddingModel
from tqdm import tqdm

# file_list (PDF filenames under ./Papers/) and settings are defined elsewhere
embedding_model = SparseEmbeddingModel(ndim=256)

docs = Docs()

for doc in tqdm(file_list):
    try:
        docs.add(
            "./Papers/" + doc,
            citation="File " + doc,
            docname=doc,
            settings=settings,
            embedding_model=embedding_model,
        )
    except Exception as e:
        # sometimes this happens if PDFs aren't downloaded or readable
        print("Could not read", doc, e)
        continue

answer = docs.query(
    "Your question.",
    settings=settings,
    embedding_model=embedding_model,
)
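
For completeness, the settings object above can reuse the same local_llm_config. A minimal sketch mirroring the ask example earlier in this thread (parameter names taken from that example, with the provider prefix applied):

from paperqa import Settings

settings = Settings(
    llm="openai/my-llm-model",
    llm_config=local_llm_config,
    summary_llm="openai/my-llm-model",
    summary_llm_config=local_llm_config,
)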

I guess clearer and more complete documentation would be welcome.

Hope it helps.

Best regards.

thiner commented 1 week ago

I think it's simply not implemented, or not merged into the main branch. Searching for "API_BASE" in the project turns up no relevant code: https://github.com/search?q=repo%3AFuture-House%2Fpaper-qa+API_BASE&type=code

whitead commented 6 days ago

Hi @thiner, we use LiteLLM and it handles that kind of config. It should be parsed, and you can find more information here
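
To check that the api_base routing works independently of paper-qa, one quick test (a sketch, assuming the OpenAI-compatible llamafile server from the earlier example is running on port 8080) is to call LiteLLM directly:

import litellm

# direct LiteLLM call against the local OpenAI-compatible endpoint
response = litellm.completion(
    model="openai/my-llm-model",
    api_base="http://localhost:8080/v1",
    api_key="sk-no-key-required",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)

If this call succeeds but paper-qa still fails, the problem is in how the config is passed through rather than in the endpoint itself.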

thiner commented 6 days ago

I see. But why run a LiteLLM instance inside PQA? It would be better to deploy the service independently and decouple the model variation from PQA itself. It's also common to already have LiteLLM running, so configuring another LiteLLM instance seems redundant.
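
In the meantime, one workaround (a sketch, not an official recommendation) is to treat an already-running LiteLLM proxy as a plain OpenAI-compatible endpoint. This assumes the proxy listens on its default port 4000 and exposes a model named my-llm-model; the key and port are placeholders for whatever your proxy is configured with:

from paperqa import Settings, ask

proxy_llm_config = dict(
    model_list=[
        dict(
            model_name="openai/my-llm-model",
            litellm_params=dict(
                model="openai/my-llm-model",
                # point at the external LiteLLM proxy instead of an embedded config
                api_base="http://localhost:4000/v1",
                api_key="sk-proxy-key",
            ),
        )
    ]
)

answer = ask(
    "What manufacturing challenges are unique to bispecific antibodies?",
    settings=Settings(
        llm="openai/my-llm-model",
        llm_config=proxy_llm_config,
        summary_llm="openai/my-llm-model",
        summary_llm_config=proxy_llm_config,
    ),
)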