JabRef / jabref

Graphical Java application for managing BibTeX and biblatex (.bib) databases
https://devdocs.jabref.org
MIT License
3.54k stars 2.48k forks

Get fulltext PDF from DOI link #4383

Closed alfureu closed 5 years ago

alfureu commented 5 years ago

Most of the entries in my bibliography contain only a DOI link. Would it be possible to automatically look up the article URL from the DOI and, ideally, download the PDF with one click? It would make life so much easier ...


Siedlerchr commented 5 years ago

I don't really get the thing with the URL. The DOI actually is a URL. Regarding PDF download, you can use the keyboard shortcut to look up the fulltext document. It's not guaranteed that you get the fulltext for every article. JabRef tries several sources to find and download it. The problem is that the article websites differ from journal to journal.
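
To illustrate the point that a DOI already is a URL, here is a minimal Java sketch that turns a DOI string into a resolvable `https://doi.org/` address. The class and method names (`DoiToUrl`, `toUri`) are hypothetical and are not JabRef's API; the list of prefixes it strips is an assumption about how DOIs commonly appear in `.bib` files.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of JabRef.
class DoiToUrl {
    // Assumed set of prefixes a DOI field might carry.
    private static final String[] PREFIXES = {
        "https://doi.org/", "http://doi.org/",
        "https://dx.doi.org/", "http://dx.doi.org/", "doi:"
    };

    // Converts a DOI such as "10.1000/182" into a URI under https://doi.org/.
    static URI toUri(String doi) {
        String bare = doi.trim();
        for (String prefix : PREFIXES) {
            // Case-insensitive prefix check.
            if (bare.regionMatches(true, 0, prefix, 0, prefix.length())) {
                bare = bare.substring(prefix.length()).trim();
                break;
            }
        }
        try {
            // The multi-argument constructor percent-encodes characters
            // that are not legal in a URI path.
            return new URI("https", "doi.org", "/" + bare, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a usable DOI: " + doi, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toUri("10.1000/182")); // prints https://doi.org/10.1000/182
    }
}
```

Opening the resulting address redirects to the publisher's landing page, which is why a separate URL field is not strictly required when a DOI is present.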

alfureu commented 5 years ago

Hi @Siedlerchr, thanks for your response. The problem is that for the following conditions:

  1. no URL is given for an article
  2. DOI is given

no PDF is downloaded, whatever I do. It would be great to have the major databases covered, e.g. Scopus, SpringerLink, Web of Knowledge, etc. I have a rather large database, and JabRef is unable to download almost any of the PDFs. I think it is a problem.

My suggestion was: use the DOI to resolve the missing URL, and then extract the PDF link from the page behind that URL. At the moment JabRef cannot retrieve any PDF unless it is given a direct link to the .pdf file itself, which is not good enough.
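
The second step of this suggestion (finding the PDF link on the landing page) could be sketched as follows. This assumes the publisher page exposes a `citation_pdf_url` meta tag, a convention many but not all publishers follow; the class name `PdfLinkExtractor` is hypothetical, and this is not how JabRef itself is confirmed to do it. The regular expression only handles the `name` attribute appearing before `content`; a real implementation would more likely use an HTML parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JabRef.
class PdfLinkExtractor {
    // Matches <meta name="citation_pdf_url" content="..."> (name before content only).
    private static final Pattern CITATION_PDF = Pattern.compile(
        "<meta\\s+name=[\"']citation_pdf_url[\"']\\s+content=[\"']([^\"']+)[\"']",
        Pattern.CASE_INSENSITIVE);

    // Returns the advertised PDF URL, or empty if the page does not declare one.
    static Optional<String> findPdfUrl(String html) {
        Matcher m = CITATION_PDF.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String page = "<head><meta name=\"citation_pdf_url\" "
            + "content=\"https://example.org/paper.pdf\"></head>";
        System.out.println(findPdfUrl(page).orElse("no PDF link found"));
    }
}
```

Pages that only reveal the PDF link after a click or behind a login would return an empty result here, which matches the journal-to-journal differences mentioned earlier in the thread.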

stefan-kolb commented 5 years ago

Currently, JabRef tries to extract the PDF fulltext link from the website behind the DOI and from many other sources. If that doesn't work, then most of the time you don't have access to these articles.

What are you expecting apart from that behavior? Should JabRef write the PDF URL into the URL field so you can retrieve the PDF yourself?

alfureu commented 5 years ago

Yes, it is one of the solutions.

However, I often do have access; JabRef is just unable to fetch the PDF the way a browser can, because the browser has an active cookie from my institutional library. Hence ticket #4382, where EZProxy integration might solve this issue easily.
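
For context on the EZProxy idea, a rough Java sketch of how a DOI link could be routed through an institutional proxy. It assumes the common EZProxy "starting point URL" form `<proxy>/login?url=<target>`; the host `ezproxy.example.edu` is a placeholder and `EzProxyLink` is a hypothetical name, not an existing JabRef feature. Authentication (the library cookie or login) is outside the scope of this sketch.

```java
// Hypothetical helper, not part of JabRef.
class EzProxyLink {
    // Builds a starting point URL: the proxy base followed by /login?url=<target>.
    static String viaProxy(String proxyBase, String targetUrl) {
        // Drop a trailing slash so the result has exactly one before "login".
        String base = proxyBase.endsWith("/")
            ? proxyBase.substring(0, proxyBase.length() - 1)
            : proxyBase;
        return base + "/login?url=" + targetUrl;
    }

    public static void main(String[] args) {
        // ezproxy.example.edu is a placeholder for an institution's proxy host.
        System.out.println(viaProxy("https://ezproxy.example.edu",
                                    "https://doi.org/10.1000/182"));
    }
}
```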

In addition, the paper is often Open Access, yet JabRef still fails to download the freely available PDF. It may be that resolving the DOI does not lead to the paper itself. I believe this happened mostly with ScienceDirect articles, where you have to click on Get Access and the page then offers two additional options, PDF download or Annotation. I just wonder whether better integration with ResearchGate, Academia.edu or other social networks would give better results.

Siedlerchr commented 5 years ago

JabRef includes the following fulltext fetchers and queries them for the fulltext PDF. We have sometimes had trouble with ScienceDirect.

https://github.com/JabRef/jabref/blob/611e35a6b8f3032bd9e87b6b09c341f414683eac/src/main/java/org/jabref/logic/importer/WebFetchers.java

    List<FulltextFetcher> fetchers = new ArrayList<>();
    // Original
    fetchers.add(new DoiResolution());
    // Publishers
    fetchers.add(new ScienceDirect());
    fetchers.add(new SpringerLink());
    fetchers.add(new ACS());
    fetchers.add(new ArXiv(importFormatPreferences));
    fetchers.add(new IEEE(importFormatPreferences));
    // Meta search
    fetchers.add(new GoogleScholar(importFormatPreferences));
    fetchers.add(new OpenAccessDoi());
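
How such a list might be queried can be sketched in plain Java: ask each source in order and stop at the first one that returns a link. The `SimpleFetcher` interface and `FetcherChain` class below are simplified, hypothetical stand-ins; they are not JabRef's `FulltextFetcher` API, and whether JabRef queries its fetchers sequentially in this way is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not JabRef's implementation.
class FetcherChain {
    // Simplified stand-in for a fulltext source: DOI in, optional PDF URL out.
    interface SimpleFetcher {
        Optional<String> findFullText(String doi);
    }

    // Queries the fetchers in list order and returns the first hit, if any.
    static Optional<String> firstHit(List<SimpleFetcher> fetchers, String doi) {
        for (SimpleFetcher fetcher : fetchers) {
            try {
                Optional<String> result = fetcher.findFullText(doi);
                if (result.isPresent()) {
                    return result;
                }
            } catch (RuntimeException e) {
                // One failing source should not prevent the others from being tried.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<SimpleFetcher> fetchers = Arrays.<SimpleFetcher>asList(
            doi -> Optional.empty(),                                  // source without a hit
            doi -> Optional.of("https://example.org/" + doi + ".pdf") // source with a hit
        );
        System.out.println(firstHit(fetchers, "10.1000/182").orElse("no fulltext found"));
    }
}
```

With this shape, adding another source only means appending one more entry to the list, and the order of the list decides which source wins when several have the PDF.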

stefan-kolb commented 5 years ago

I think we need specific examples here, to see whether one of our fetchers can be improved.

alfureu commented 5 years ago

Excellent, but this list can be further expanded as it is clearly not sufficient, as highlighted earlier. Especially with scientific social networks like ResearchGate or Academia.edu, where individual researchers upload their manuscripts anyway.

stefan-kolb commented 5 years ago

ResearchGate and Academia.edu should be covered by GoogleScholar. If the manuscript is available anywhere for free, GoogleScholar should pick it up.

abougouffa commented 5 years ago

I think it is entirely possible to add SciHub as a source for fulltext download. It is as simple as getting the PDF from https://sci-hub.tw/{DOI}. Sometimes the page may display a CAPTCHA <img>, which can be selected by id = "captcha" in the HTML page and then shown to the user to get the CAPTCHA answer before downloading the PDF.

lanzen commented 3 years ago

Did anyone manage to implement this?

Siedlerchr commented 3 years ago

@lanzen Regarding Sci-Hub: we won't implement this, as we don't want to get into trouble with the publishers.

lanzen commented 3 years ago

Very understandable (and in retrospect I should have realised before even asking). Thank you for your answer!
