Future-House / paper-qa

High accuracy RAG for answering questions from scientific documents with citations
Apache License 2.0

Settings correctly find the locally running model, but Docs gives an authentication error #507

Open hweiske opened 1 month ago

hweiske commented 1 month ago

I got paper-qa running correctly with a local model. However, when trying to use a paper folder, an AuthenticationError occurs:

  | litellm.exceptions.AuthenticationError: litellm.AuthenticationError: AuthenticationError: OpenAIException - Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-no-ke******ired. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

Additionally, a decoder error occurs:


[15:50:22] Failed to parse all of title, DOI, and authors from the ParsingSettings.structured_citation_prompt's response                                                                                           
           {                                                                                                                                                                                                       
               "title": null,                                                                                                                                                                                      
                   "authors":["author 1 t al."],                                                                                                                                               
              "DOI":"https://doi.org//10.1116/1.481xxx"                                                                                                                                                            
           }                                                                                                                                                                                                       
       Note: The title is missing from the citation provided and DOI should be in a standard format so I have corrected it to match that of other DOIs, consider using a manifest file or specifying a         
       different citation prompt.                                                                                                                                                                              
       ╭───────────────────────────────────────────────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────────────────────────────────────────────╮
       │ somefolder/paperqa/docs.py:321 in aadd                                                                                                                    │
       │                                                                                                                                                                                                      │
       │   318 │   │   │   if clean_text.startswith("json"):                                                                                                                                                  │
       │   319 │   │   │   │   clean_text = clean_text.replace("json", "", 1)                                                                                                                                 │
       │   320 │   │   │   try:                                                                                                                                                                               │
       │ ❱ 321 │   │   │   │   citation_json = json.loads(clean_text)                                                                                                                                         │
       │   322 │   │   │   │   if citation_title := citation_json.get("title"):                                                                                                                               │
       │   323 │   │   │   │   │   title = citation_title                                                                                                                                                     │
       │   324 │   │   │   │   if citation_doi := citation_json.get("doi"):                                                                                                                                   │
       │                                                                                                                                                                                                      │
       │ somefolder/json/__init__.py:346 in loads                                                                                                                                │
       │                                                                                                                                                                                                      │
       │   343 │   if (cls is None and object_hook is None and                                                                                                                                                │
       │   344 │   │   │   parse_int is None and parse_float is None and                                                                                                                                      │
       │   345 │   │   │   parse_constant is None and object_pairs_hook is None and not kw):                                                                                                                  │
       │ ❱ 346 │   │   return _default_decoder.decode(s)                                                                                                                                                      │
       │   347 │   if cls is None:                                                                                                                                                                            │
       │   348 │   │   cls = JSONDecoder                                                                                                                                                                      │
       │   349 │   if object_hook is not None:                                                                                                                                                                │
       │                                                                                                                                                                                                      │
       │ somefolder/json/decoder.py:340 in decode                                                                                                                                │
       │                                                                                                                                                                                                      │
       │   337 │   │   obj, end = self.raw_decode(s, idx=_w(s, 0).end())                                                                                                                                      │
       │   338 │   │   end = _w(s, end).end()                                                                                                                                                                 │
       │   339 │   │   if end != len(s):                                                                                                                                                                      │
       │ ❱ 340 │   │   │   raise JSONDecodeError("Extra data", s, end)                                                                                                                                        │
       │   341 │   │   return obj                                                                                                                                                                             │
       │   342 │                                                                                                                                                                                              │
       │   343 │   def raw_decode(self, s, idx=0):                                                                                                                                                            │
       ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
       JSONDecodeError: Extra data: line 7 column 1 (char 120)             

I removed the folder names and the citation.
Now the question is: is this a database error or a bug?
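
For reference, the "Extra data" message in the traceback above is what `json.loads` raises when a valid JSON object is followed by more text, which matches a reply where the model appends a "Note: ..." after the closing brace. The following is a minimal sketch using only the standard library; the `reply` string is a hypothetical stand-in shaped like the log above, not the actual model output, and this is not how paper-qa itself handles the response.

```python
import json

# Hypothetical model reply: a JSON object followed by free-text commentary,
# mirroring the shape of the response shown in the log above.
reply = '''{
    "title": null,
    "authors": ["author 1 et al."],
    "DOI": "https://doi.org//10.1116/1.481xxx"
}
Note: The title is missing from the citation provided.'''

# json.loads requires the whole string to be a single JSON document,
# so the trailing note triggers JSONDecodeError("Extra data").
try:
    json.loads(reply)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = exc.msg

# raw_decode parses the first JSON value and returns the index where it ended,
# so the trailing note can be split off instead of raising.
citation, end = json.JSONDecoder().raw_decode(reply)
trailing_text = reply[end:].strip()

print(error_message)
print(citation["authors"])
print(trailing_text)
```

If this is what happens here, the decoder error would come from the model's output format rather than from a database.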

Using pyzotero, I only get the authentication error:

```python
litellm.exceptions.AuthenticationError: litellm.AuthenticationError: AuthenticationError: OpenAIException - Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-no-ke******ired. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
```

At least in the cost calculation, litellm is correctly using the local model:

Looking up model=ollama/llama3.1 in model_cost_map
Success: model=ollama/llama3.1 in model_cost_map
prompt_tokens=144; completion_tokens=80
Returned custom cost for model=ollama/llama3.1 - prompt_tokens_cost_usd_dollar: 0.0, completion_tokens_cost_usd_dollar: 0.0

dosubot[bot] commented 1 month ago

Hey there, @hweiske! 👋 I'm here to help you with any bugs, questions, or contributions you have. I'm a bot designed to assist you while you wait for a human maintainer.

The issues you're encountering with paper-qa and pyzotero are likely related to bugs or misconfigurations rather than database errors.

  1. AuthenticationError:

    • The AuthenticationError indicates that the API key provided is incorrect or expired. Ensure that the ZOTERO_USER_ID and ZOTERO_API_KEY environment variables are correctly set. This points to a bug or misconfiguration rather than a database issue [1].

  2. JSONDecodeError:

    • The JSONDecodeError suggests that the JSON response from Zotero is not being parsed correctly. This could be due to an issue with the response format or with how the response is handled in the code. This also points to a bug rather than a database error [1].

Additionally, there are known issues and limitations documented in the paper-qa repository related to the integration with Zotero, such as the reliance on environment variables and potential issues with PDF handling [2][1].

To address these issues, ensure that your environment variables are correctly configured and consider reviewing the handling of JSON responses in your code.


hweiske commented 1 month ago

The API error comes from the OpenAI API key. It seems that the Docs object has a problem when no API key is specified, as is the case with the local model.
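
One thing that might be worth checking is whether every model setting, not only the main LLM, is routed to the local server. Below is a minimal sketch of a litellm-Router-style configuration for a single local model. The helper name `local_llm_config` is hypothetical, the `api_base` value assumes Ollama's default port, and whether passing such a config to the Docs calls resolves this issue is an assumption that the thread does not confirm.

```python
def local_llm_config(model: str, api_base: str) -> dict:
    """Build a litellm-Router-style config for one locally served model.

    Hypothetical helper for illustration; it only assembles a plain dict.
    """
    return {
        "model_list": [
            {
                "model_name": model,
                "litellm_params": {
                    "model": model,
                    # Assumption: address of the locally running server.
                    "api_base": api_base,
                },
            }
        ]
    }


# Assumption: Ollama listening on its default port on the same machine.
config = local_llm_config("ollama/llama3.1", "http://localhost:11434")
print(config["model_list"][0]["litellm_params"])
```

If any model setting (for example the one used for citations or embeddings) is left at an OpenAI default, a 401 from OpenAI like the one above would be the expected result, since no real OpenAI key is configured.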