Open goldengrape opened 1 year ago
To download PDFs, ZoteroDB just calls pyzotero.zotero.Zotero.dump: https://github.com/whitead/paper-qa/blob/93c47ccc7ea496a1167ac2b89e7fa512dee7be7f/paperqa/contrib/zotero.py#L112
This calls Zotero.file internally: https://github.com/urschrei/pyzotero/blob/c61125751eaa99b5ac8d3d1e8842219fbee6dbf6/pyzotero/zotero.py#L700-L723
Can you give more details? Where is zotfile being used? What is the specific Zotero item?
You can set:
import logging
logging.basicConfig(level=logging.DEBUG)
to see more debugging information.
I haven't gotten around to fully implementing this in paperqa, but it seems that files in Zotero are structured as top-level items -> children. For files saved with Zotero directly, the top-level items include the PDF attachment. For files saved with zotfile, the PDFs are stored as children rather than as top-level items.
@MilesCranmer currently you use Zotero.top(), which should catch all files stored in Zotero. To catch other items, that call needs to be changed to Zotero.items(), and the PDF then needs to be downloaded based on the link type. See lifan0127's implementation here: https://gist.github.com/lifan0127/e34bb0cfbf7f03dc6852fd3e80b8fb19.
Ah, sorry, I didn't understand the first question. Thanks for giving me more details, now this makes sense. Also I didn't see lifan's gist before writing the one in ZoteroDB, that would have saved me quite a few days of debugging! :sweat_smile:
Do you want to start a draft PR to fix this? Happy to help get things all working.
I gave up on the zotero part, and I now find that it's actually perfectly fine to get the paper directly from the web and then ask questions with paperqa. Storing papers locally for a long time is not necessary.
Hm... am I right in thinking that if a user exhausts their 300 MB of Zotero cloud storage, then the files beyond the limit won't get synced and consequently won't get zoteroDB.iterate'ed?
If zotfile is used, the PDF file cannot be found. Of course, pyzotero is to blame for this.