Open Futur3r opened 1 year ago
Or, the easy way: once Cita has fetched the QID of a Zotero item, it simply changes the item's URL value to the one from P953. That way there is no need to modify Zotero's existing PDF scraper.
Also, an option "Fetch Open Access URL" could be added in the menus.
This looks like how the attachment is added
Basically we just give a URL to the PDF and Zotero should do the rest.
Adding this as a new function should be easy enough. I'd have to check how easy/hard it is to integrate a new PDF provider into Zotero's "PDF finder".
Is there any way we can quantify the rough number of items (for scholarly articles) that have P953 but Zotero won't already find a PDF for them? Or at least what proportion of scholarly items on WD have P953? Like, is this change likely to find a lot of PDFs that wouldn't already be found?
And I think the Zotero function for the PDF scraper is this one.
I started coding an option in the items submenu of the library (the easy way). I am adding the option "Fetch Open Access URLs". It may be redundant, but it is easier for me to code. I'll make a PR.
There are currently 2 564 303 WD elements with a P953 statement. The best query would be this one, but it times out. I got 1 351 783 for scholarly articles alone that have a P953. The total number of scholarly articles on WD is 38 856 462.
And the data in WD is constantly growing and is easy for humans to check and contribute to, so coverage will only go up. Also, WD is more or less the only way to do this for any scientific work, anywhere on the web. I frequently find articles, book sections, ... behind paywalls but available on ResearchGate or HAL.
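As a rough back-of-the-envelope check of those counts, here is a small sketch (the numbers are copied from the comment above; the variable names are mine). It only gives the share of scholarly articles carrying P953, not how many of those Zotero would fail to find on its own.

```python
# Counts quoted above (Wikidata figures at the time of the comment).
items_with_p953 = 2_564_303
scholarly_with_p953 = 1_351_783
scholarly_total = 38_856_462

# Share of scholarly articles on WD that carry a P953 statement.
coverage = scholarly_with_p953 / scholarly_total
# Share of all P953 statements that sit on scholarly articles.
scholarly_share = scholarly_with_p953 / items_with_p953

print(f"{coverage:.2%} of scholarly articles have P953")
print(f"{scholarly_share:.2%} of P953 statements are on scholarly articles")
```

So roughly 3.5% of scholarly articles on WD have a P953, and those make up about half of all P953 statements.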
I've heard that some years ago the EU passed a law that authorizes European researchers to publish the manuscript of their papers anywhere they want, 6 months after the date of publication in a journal.
So this functionality can be quite handy.
Wikidata can function as a hub to automatically find an open access PDF version for a Zotero item if it has a QID.
Describe the solution you'd like
If the Zotero PDF scraper doesn't find an open access PDF of an article on a webpage, Cita could fetch an open access URL for the article via property P953 of the Wikidata element, if available, and hand it back to the Zotero PDF scraper for an automatic second try.
For this task, the Hub could be used, by building this kind of URL for example:
https://hub.toolforge.org/[QID]?property=P953
The Hub would return the value of P953, for example with the element Q114149071 -> test. Note: if the property doesn't exist, the Hub returns the URL of the element on wikidata.org (test), so a simple if statement would be needed to check whether the element's P953 exists.
The automatic way:
Note: to avoid over-complicating things for the user, if Cita doesn't find the QID on the first try, don't show any error, maybe just a debug().
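To make the Hub lookup and the "simple if statement" concrete, here is a minimal Python sketch (not Cita code). The function names and the logger name are hypothetical. It assumes the Hub behaves as described above, and that a real P953 value never points at wikidata.org itself. Fetching the resolved URL is deliberately left out. Failures are only logged at debug level, in line with the note about not showing errors.

```python
import logging
import re
from urllib.parse import urlparse

logger = logging.getLogger("cita.p953")  # hypothetical logger name

HUB_BASE = "https://hub.toolforge.org"  # taken from the URL pattern above
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def build_hub_url(qid, prop="P953"):
    """Build the Hub URL that should resolve to the item's P953 value."""
    if not QID_PATTERN.fullmatch(qid):
        raise ValueError(f"not a QID: {qid!r}")
    return f"{HUB_BASE}/{qid}?property={prop}"


def open_access_url(qid, resolved_url):
    """Return the full-text URL, or None if the Hub fell back to wikidata.org.

    `resolved_url` is whatever URL the Hub ended up pointing to; how it is
    fetched (redirect handling, timeouts) is outside this sketch.
    """
    if not resolved_url:
        logger.debug("No URL resolved for %s", qid)
        return None
    host = urlparse(resolved_url).hostname or ""
    # Assumption: a wikidata.org host means "no P953, item page returned".
    if host == "wikidata.org" or host.endswith(".wikidata.org"):
        logger.debug("No P953 on %s, Hub returned the item page", qid)
        return None
    return resolved_url
```

For Q114149071, `build_hub_url` gives `https://hub.toolforge.org/Q114149071?property=P953`, and a resolved URL on `www.wikidata.org` is treated as "no open access URL available".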
The semi-automatic way:
Note: the user would have enabled this auto-scraping of the PDF via Wikidata in the Cita preferences (the functionality would probably be enabled by default).
The manual way:
Note: some P953 values don't reference a PDF but a webpage with text, like this one. In that case Cita could maybe open the page in the browser (like for QuickStatements), or a snapshot of the webpage could be made. Maybe it's something that could be added directly in the translator for this website, I don't know. Also, does the Zotero PDF scraper need the URL of the PDF directly, or does it use a translator to find the URL of the PDF on a webpage?
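On the PDF-versus-webpage point, here is a minimal sketch of how the two cases might be told apart before deciding what to do with a P953 value. The helper name is hypothetical and this is not how Zotero itself decides. It trusts a `Content-Type` header when one is available and otherwise falls back to the file extension in the URL path.

```python
from urllib.parse import urlparse


def classify_full_text(url, content_type=None):
    """Return "pdf" or "webpage" for a P953 target.

    `content_type` is the Content-Type header of a response for `url`,
    if one is available; otherwise only the URL path is inspected.
    """
    if content_type:
        # Drop parameters such as "; charset=utf-8" before comparing.
        media_type = content_type.split(";", 1)[0].strip().lower()
        return "pdf" if media_type == "application/pdf" else "webpage"
    path = urlparse(url).path.lower()
    return "pdf" if path.endswith(".pdf") else "webpage"
```

A "pdf" result could be handed over as an attachment URL, while a "webpage" result would go to the browser or a snapshot, as suggested above.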
The zotero-scihub add-on implements similar functionalities.