JabRef / jabref

Graphical Java application for managing BibTeX and biblatex (.bib) databases
https://devdocs.jabref.org
MIT License

Add Cleanup for copying over physical review article id as the page number (revtex) #7019

Closed: Siedlerchr closed this issue 3 years ago

Siedlerchr commented 4 years ago

Background (a request from the forum):

I import papers from the APS Physical Review journals using the “Library -> New Entry -> DOI” feature. It looks something like this:


@Article{Stenzel2020,
author = {L. Stenzel and A. L. C. Hayward and U. Schollwöck and F. Heidrich-Meisner},
journal = {Physical Review A},
title = {Topological phases in the Fermi-Hofstadter-Hubbard model on hybrid-space ladders},
year = {2020},
month = {aug},
number = {2},
volume = {102},
doi = {10.1103/physreva.102.023315},
publisher = {American Physical Society ({APS})},
}

>Notice that there is no page number. But it is standard for APS journals to set the page number to the article ID, here “023315”. In fact, setting the page number to the article ID is necessary to get the right output from the APS Revtex document class in LaTeX.

>I was wondering if it would be possible or desirable to add back the page number in Physical Review imports. It would make life much easier for everyone who uses APS journals and Revtex.

> Thank you.
-------
Idea for a solution: implement a cleanup similar to the DOI cleanup: https://github.com/JabRef/jabref/blob/master/src/main/java/org/jabref/logic/cleanup/DoiCleanup.java

1. Access the BibEntry object.
2. Check whether the DOI field exists.
3. Extract the article ID from it.
4. Set the value of the pages field to the extracted article ID.

That should be sufficient and relatively easy to implement. Of course, the new cleanup action/class also has to be added to the list of available cleanups.
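A minimal Java sketch of these steps, assuming a plain `Map<String, String>` in place of JabRef's `BibEntry` and the lowercase BibTeX keys `doi` and `pages`. The class and method names are hypothetical, not JabRef's actual cleanup API. The article ID is assumed to be the text after the last dot in the DOI suffix (everything after the first `/`), which fits the Physical Review example above; DOIs without a dot in the suffix are left alone.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the proposed cleanup; a Map stands in for JabRef's BibEntry.
class DoiToPagesCleanupSketch {

    // Step 3: article ID = text after the last dot in the DOI suffix (after the first '/').
    static Optional<String> extractArticleId(String doi) {
        int slash = doi.indexOf('/');
        if (slash < 0) {
            return Optional.empty(); // not a PREFIX/SUFFIX DOI
        }
        String suffix = doi.substring(slash + 1);
        int lastDot = suffix.lastIndexOf('.');
        if (lastDot < 0 || lastDot == suffix.length() - 1) {
            return Optional.empty(); // no dot-separated ID to copy
        }
        return Optional.of(suffix.substring(lastDot + 1));
    }

    // Steps 1, 2 and 4: read the DOI, extract the ID, write it into "pages".
    static boolean apply(Map<String, String> entry) {
        String doi = entry.get("doi");
        if (doi == null || doi.isBlank()) {
            return false; // step 2: nothing to do without a DOI
        }
        Optional<String> articleId = extractArticleId(doi.trim());
        articleId.ifPresent(id -> entry.put("pages", id)); // step 4
        return articleId.isPresent();
    }

    public static void main(String[] args) {
        Map<String, String> entry = new HashMap<>();
        entry.put("doi", "10.1103/physreva.102.023315");
        apply(entry);
        System.out.println(entry.get("pages")); // prints 023315
    }
}
```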
sambo57u commented 3 years ago

We need this badly... I wish I could program in Java, but I can't. I have been suffering from this for a long time and have to fix it by hand after every import.

BenjaminDAnjou commented 3 years ago

Note that there is a subtlety here. The Physical Review journals changed their article numbering at the end of the 20th century. For instance:

https://journals.aps.org/pra/edannounce/PRAv60i4.html https://journals.aps.org/prb/edannounce/PRBv62i17.html

Before, they had normal page numbers. They then moved to electronic article IDs.

So I think it's really necessary to check if a page number already exists before executing your suggestion.

tmrd993 commented 3 years ago

Hello, I'd like to work on this, if that's okay.

sambo57u commented 3 years ago

That would be great! (I would be very glad!)

Siedlerchr commented 3 years ago

@tmrd993 go ahead, just create a PR with your changes

tmrd993 commented 3 years ago

@Siedlerchr Okay, I'll probably have a PR ready by tomorrow at the earliest.

I looked around, and I'm not 100% sure where the article ID is found in every case. For example, in the case above it's the last number, separated from the preceding string by a dot. However, in this case, 10.1093/ajae/aaq063, there is no dot to separate the ID. Is the ID composed only of digits? Is it aaq063 or 063?

According to https://en.wikipedia.org/wiki/Digital_object_identifier, the DOI has the form PREFIX/SUFFIX, and the suffix (called the item ID) identifies a single object. So can I assume that the article ID is contained in the suffix and is the last digit sequence encountered?

For example, in 10.1103/physreva.102.023315 it is 023315, and in 10.1371/journal.pgen.1001111 it is 1001111.
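The heuristic being asked about (take the last run of digits in the suffix) can be sketched with a regular expression. This only illustrates the question, not the rule that was finally adopted, and the class name is hypothetical. Note that it yields 063 for 10.1093/ajae/aaq063, which is exactly the ambiguity raised above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the "last digit sequence in the suffix" heuristic.
class LastDigitRunSketch {

    private static final Pattern DIGITS = Pattern.compile("\\d+");

    static Optional<String> lastDigitRun(String doi) {
        int slash = doi.indexOf('/');
        if (slash < 0) {
            return Optional.empty(); // not a PREFIX/SUFFIX DOI
        }
        Matcher m = DIGITS.matcher(doi.substring(slash + 1)); // scan the suffix only
        String last = null;
        while (m.find()) {
            last = m.group(); // keep overwriting until the final match
        }
        return Optional.ofNullable(last);
    }

    public static void main(String[] args) {
        System.out.println(lastDigitRun("10.1103/physreva.102.023315").orElse("-"));  // 023315
        System.out.println(lastDigitRun("10.1371/journal.pgen.1001111").orElse("-")); // 1001111
        System.out.println(lastDigitRun("10.1093/ajae/aaq063").orElse("-"));          // 063
    }
}
```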

Siedlerchr commented 3 years ago

@tmrd993 No need to hurry. I asked the people in the forum whether they can help answer your question. Offhand, I would have guessed the last digit sequence encountered, but I have no idea whether it can be alphanumeric as well.

sambo57u commented 3 years ago

Yes, for APS journals (Physical Review A, B, C, etc.) the ID is the numerical part after the last dot in the DOI. In the official citation, this is given as the page number. Only when the entry is pulled from doi.org is the page missing.

Other journals may be working correctly when pulled. So far, the APS journals are the only ones where I have seen this happen.

BenjaminDAnjou commented 3 years ago

I confirm, the article ID is the number after the last dot in the DOI. That's what should be put in the page number field.

The general format of APS doi is https://doi.org/10.1103/[journal].[volume].[articleID].

As I said before, the page number should not be changed if it already exists. There were proper page numbers in APS journals before around 1998-1999.

Best
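Given the format `10.1103/[journal].[volume].[articleID]`, a stricter variant could match only APS DOIs and, as requested, leave an existing page number untouched. This is a sketch: the pattern is inferred from the format described above, a `Map` again stands in for `BibEntry`, and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: copy the APS article ID into "pages" only when "pages" is empty.
class ApsPagesSketch {

    // 10.1103/[journal].[volume].[articleID], e.g. 10.1103/PhysRevB.102.081104
    private static final Pattern APS_DOI =
            Pattern.compile("10\\.1103/([A-Za-z]+)\\.(\\d+)\\.(\\d+)");

    static Optional<String> apsArticleId(String doi) {
        Matcher m = APS_DOI.matcher(doi.trim());
        return m.matches() ? Optional.of(m.group(3)) : Optional.empty(); // group 3 = article ID
    }

    static boolean apply(Map<String, String> entry) {
        String pages = entry.get("pages");
        if (pages != null && !pages.isBlank()) {
            return false; // keep existing page numbers (e.g. older articles)
        }
        String doi = entry.get("doi");
        if (doi == null) {
            return false;
        }
        Optional<String> id = apsArticleId(doi);
        id.ifPresent(value -> entry.put("pages", value));
        return id.isPresent();
    }

    public static void main(String[] args) {
        Map<String, String> entry = new HashMap<>();
        entry.put("doi", "10.1103/PhysRevB.102.081104");
        apply(entry);
        System.out.println(entry.get("pages")); // prints 081104
    }
}
```

Checking `pages` before anything else means page ranges already present on pre-1999 entries survive the cleanup unchanged.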

BenjaminDAnjou commented 3 years ago

Actually I just double checked and even old articles use the number of the first page as article ID in the DOI. So if you take the number after the last dot in the DOI, it's either the first page (for old articles) or the article ID (for new articles).

Nowadays, people usually only put the first page in the citation. So if you put the number after the last dot in the DOI as page number, that would be enough for most purposes.

Best

alshehab211 commented 3 years ago

I also confirm that for all APS journals the page number is the number after the last dot in the DOI. There is only one exception: when the article is a Rapid Communication (e.g., Physical Review B: Rapid Communications), we need to add (R) after the page number, which I think will have to be done manually since it is not contained in the DOI. For example, for the article https://doi.org/10.1103/PhysRevB.102.081104 the page number is 081104, but since it is a Rapid Communication we have to cite the page number as 081104(R).

tobiasdiez commented 3 years ago

In addition to the great PR from @tmrd993 that implements this, could you (@alshehab211 @sambo57u @BenjaminDAnjou) please also contact APS and/or Crossref and notify them that the information that they provide is incorrect/incomplete. Thanks!

alshehab211 commented 3 years ago

@tobiasdiez That is a great idea. I will contact them about the issue. Thanks a lot for the effort, @Siedlerchr, @tobiasdiez, and @tmrd993!

BenjaminDAnjou commented 3 years ago

I would gladly contact them. But just to make sure I get this right, could you tell me exactly what I need to inform them about? What information do they not provide and more importantly where do they fail to provide it?

I am not sure I understand the issue completely on a technical level, so I'd appreciate some directions.

Best

Siedlerchr commented 3 years ago

They should simply provide the article ID in the "pages" field of the BibTeX data that is returned when accessing the DOI with the "application/x-bibtex" HTTP Accept header. Background: JabRef calls, for example, the URL dx.doi.org/10.123456 with a specific HTTP header indicating that we want BibTeX data back. So it would be easiest if the response from the server contained the "pages" field.
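The request described here is ordinary HTTP content negotiation. A minimal sketch with the JDK's `java.net.http` client (Java 11+), assuming `https://doi.org/` as the resolver (the comment mentions `dx.doi.org`) and a hypothetical class name; `buildRequest` only constructs the request, while `main` actually sends it and needs network access.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical sketch of DOI content negotiation asking for BibTeX.
class DoiBibtexRequestSketch {

    static HttpRequest buildRequest(String doi) {
        return HttpRequest.newBuilder(URI.create("https://doi.org/" + doi))
                .header("Accept", "application/x-bibtex") // ask the resolver for BibTeX
                .timeout(Duration.ofSeconds(20))
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL) // doi.org answers with a redirect
                .build();
        HttpResponse<String> response = client.send(
                buildRequest("10.1103/physreva.102.023315"), HttpResponse.BodyHandlers.ofString());
        // Per this issue, the BibTeX returned for APS articles lacked a "pages" field.
        System.out.println(response.body());
    }
}
```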

sambo57u commented 3 years ago

I contacted APS about this issue two years ago. I got a response saying they were working on it as a high-priority issue, but nothing happened. It is best to fix it on our side, assuming that APS will not do anything.

BenjaminDAnjou commented 3 years ago

I have contacted them. Hopefully, receiving many messages will get them to think about it again.

tobiasdiez commented 3 years ago

Thanks to @tmrd993 this is now implemented in the latest development version. Could you please check the build from http://builds.jabref.org/master/. Thanks! Please remember to make a backup of your library before trying-out this version.

Please let us know if you hear something from APS or Crossref.

alshehab211 commented 3 years ago

@Siedlerchr @tobiasdiez @tmrd993 Thanks a lot for your help! I will check it and let you know

alshehab211 commented 3 years ago

It is perfectly working now! Thanks again