Future-House / paper-qa

High accuracy RAG for answering questions from scientific documents with citations
Apache License 2.0

Adds openalex client as a default client #555

nadolskit closed this 1 month ago

nadolskit commented 1 month ago

Summary of Changes

whitead commented 1 month ago

Got this error playing with the branch:

File ~/repos/paper-qa/paperqa/clients/openalex.py:155, in parse_openalex_to_doc_details(message)
    147 authors = [reformat_name(author) for author in authors]
    148 sanitized_authors = [
    149     mutate_acute_accents(text=author, replace=True) for author in authors
    150 ]
    152 publisher = (
    153     message.get("primary_location", {})
    154     .get("source", {})
--> 155     .get("host_organization_name")
    156 )
    157 journal = message.get("primary_location", {}).get("source", {}).get("display_name")
    159 best_oa_location = message.get("best_oa_location")

AttributeError: 'NoneType' object has no attribute 'get'

nadolskit commented 1 month ago

@whitead odd, I thought that defaulting to empty objects would prevent that. I'll look into it.
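
A likely explanation, offered as a sketch and not as the fix that was actually merged: dict.get(key, default) returns the default only when the key is missing. If the response contains the key with an explicit null value (for example "source": null), .get("source", {}) returns None and the next .get raises the AttributeError above. The payload and the safe_get helper below are hypothetical and only illustrate the behavior; they are not part of paperqa.

```python
from __future__ import annotations

from typing import Any


def safe_get(mapping: dict[str, Any] | None, *keys: str) -> Any:
    """Walk nested dicts, treating a missing key and an explicit None alike.

    Hypothetical helper for illustration; not part of paperqa.
    """
    current: Any = mapping
    for key in keys:
        # Stop as soon as an intermediate value is not a dict (e.g. None).
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


# Hypothetical payload: the "source" key is present but its value is null.
message = {"primary_location": {"source": None}}

# The {} default is ignored because the key exists, so this yields None,
# and chaining another .get() onto it would raise AttributeError.
assert message.get("primary_location", {}).get("source", {}) is None

# The helper returns None instead of raising.
publisher = safe_get(message, "primary_location", "source", "host_organization_name")
journal = safe_get(message, "primary_location", "source", "display_name")
```

An inline alternative with the same effect is (message.get("primary_location") or {}).get("source") or {}, which falls back to an empty dict for both missing and null values.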