Future-House / paper-qa

High accuracy RAG for answering questions from scientific documents with citations
Apache License 2.0

Fixed crash due to DOI being a `list` #619

Closed: jamesbraza closed this 1 month ago

jamesbraza commented 1 month ago

I hit a new crash today:

    | Traceback (most recent call last):
    |   File "/path/to/.venv/lib/python3.12/site-packages/paperqa/agents/search.py", line 438, in process_file
    |     await tmp_docs.aadd(
    |   File "/path/to/.venv/lib/python3.12/site-packages/paperqa/docs.py", line 364, in aadd
    |     doc = await metadata_client.upgrade_doc_to_doc_details(
    |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/path/to/.venv/lib/python3.12/site-packages/paperqa/clients/__init__.py", line 207, in upgrade_doc_to_doc_details
    |     0 if not extra_fields else DocDetails(**extra_fields)
    |                                ^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/path/to/.venv/lib/python3.12/site-packages/pydantic/main.py", line 212, in __init__
    |     validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
    |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/path/to/.venv/lib/python3.12/site-packages/paperqa/types.py", line 567, in validate_all_fields
    |     data = cls.lowercase_doi_and_populate_doc_id(data)
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/path/to/.venv/lib/python3.12/site-packages/paperqa/types.py", line 390, in lowercase_doi_and_populate_doc_id
    |     if doi.startswith(url):
    |        ^^^^^^^^^^^^^^
    | AttributeError: 'list' object has no attribute 'startswith'
    +------------------------------------

It looks like the `DocDetails` input data for the DOI was a `list`, not the expected `str`, so the `doi.startswith(url)` call in `lowercase_doi_and_populate_doc_id` raised.

This PR handles this new edge case.
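For readers without the diff at hand, here is a minimal sketch of one way to guard against a list-valued DOI before calling `startswith`. It is not the actual change in this PR: the helper name `normalize_doi`, the `DOI_URL_PREFIXES` tuple, and the choice to keep only the first non-empty entry are all assumptions made for illustration.

```python
from typing import Any, Optional

# Hypothetical prefixes for illustration; the list used in paperqa may differ.
DOI_URL_PREFIXES = ("https://doi.org/", "http://dx.doi.org/")


def normalize_doi(doi: Any) -> Optional[str]:
    """Coerce a DOI that may arrive as a list into a lowercase bare DOI."""
    if isinstance(doi, list):
        # Assumption: keep the first non-empty string entry, drop the rest.
        doi = next((d for d in doi if isinstance(d, str) and d), None)
    if not isinstance(doi, str) or not doi:
        # Nothing usable was provided.
        return None
    doi = doi.strip()
    for prefix in DOI_URL_PREFIXES:
        # Compare case-insensitively, then strip the URL prefix if present.
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()
```

With this sketch, `normalize_doi(["https://doi.org/10.1000/ABC"])` returns `"10.1000/abc"`, and an empty list or `None` returns `None` instead of raising `AttributeError`.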