Langenscheiss / bibitnow

Site adjustors for browser plugin "bibitnow"

Authors sometimes reported multiple times #59

Open episanty opened 1 year ago

episanty commented 1 year ago

I found this recently: for the paper

https://opg.optica.org/optica/fulltext.cfm?uri=optica-8-10-1243&id=459914&ibsearch=false

the plugin produces the following output:

@article{Ayuso2021,
  author = {Ayuso, David and Ayuso, David 
            and Ordonez, Andres F. and Ordonez, Andres F. 
            and Ivanov, Misha and Ivanov, Misha and Ivanov, Misha 
            and Smirnova, Olga and Smirnova, Olga and Smirnova, Olga},
  title = {Ultrafast optical rotation in chiral molecules with ultrashort and tightly focused beams},
  journal = {Optica},
  volume = {8},
  number = {10},
  pages = {1243--1246},
  year = {2021},
  doi = {10.1364/OPTICA.423618}
}

The first two authors are reported twice, and the PIs are reported three times each. I really like all of those people, but I think a single copy of each is all that's needed =).

episanty commented 1 year ago

Ah. Looking at the source, it's quite clear what's happening:

<meta name="citation_author" content="David Ayuso">
<meta name="citation_author_institution" content="Department of Physics, Imperial College London, London, UK">
<meta name="citation_author" content="David Ayuso">
<meta name="citation_author_institution" content="Max-Born-Institut, Berlin, Germany">
<meta name="citation_author" content="Andres F. Ordonez">
<meta name="citation_author_institution" content="Max-Born-Institut, Berlin, Germany">
<meta name="citation_author" content="Andres F. Ordonez">
<meta name="citation_author_institution" content="ICFO-Institut de Ciencies Fotoniques, The Barcelona Institute of Science and Technology, Castelldefels, Spain">
<meta name="citation_author" content="Misha Ivanov">
<meta name="citation_author_institution" content="Department of Physics, Imperial College London, London, UK">
<meta name="citation_author" content="Misha Ivanov">
<meta name="citation_author_institution" content="Max-Born-Institut, Berlin, Germany">
<meta name="citation_author" content="Misha Ivanov">
<meta name="citation_author_institution" content="Institute für Physik, Humboldt-Universität zu Berlin, Berlin, Germany">
<meta name="citation_author" content="Olga Smirnova">
<meta name="citation_author_institution" content="Max-Born-Institut, Berlin, Germany">
<meta name="citation_author" content="Olga Smirnova">
<meta name="citation_author_institution" content="Technische Universität Berlin, Berlin, Germany">
<meta name="citation_author" content="Olga Smirnova">
<meta name="citation_author_institution" content="e-mail: olga.smirnova@mbi-berlin.de (olga.smirnova@mbi-berlin.de)">

That is, the Optica metadata lists each author multiple times, once per affiliation. I'm not sure whether this is actually where the plugin takes its information from, but it seems plausible. I also don't know how common this style of metadata is.
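To make the suspected mechanism concrete, here is a minimal Python sketch. It is not the plugin's own code, and the class and function names are hypothetical. It assumes the extractor reads the citation_* meta tags in page order. Each citation_author tag opens a new author record, and each citation_author_institution tag attaches to the most recent record. Fed the first few tags quoted above, it produces one record per tag, which is exactly how a repeated name would end up in the BibTeX author list.

```python
from html.parser import HTMLParser

# First few tags from the Optica page source quoted above.
SAMPLE = """
<meta name="citation_author" content="David Ayuso">
<meta name="citation_author_institution" content="Department of Physics, Imperial College London, London, UK">
<meta name="citation_author" content="David Ayuso">
<meta name="citation_author_institution" content="Max-Born-Institut, Berlin, Germany">
<meta name="citation_author" content="Andres F. Ordonez">
<meta name="citation_author_institution" content="Max-Born-Institut, Berlin, Germany">
"""


class CitationMetaParser(HTMLParser):
    """Hypothetical helper: collect citation_author tags (plus affiliations) in page order."""

    def __init__(self):
        super().__init__()
        self.records = []  # list of [author_name, [institution, ...]]

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name")
        content = attr.get("content")
        if name == "citation_author":
            # Every citation_author tag starts a new record, even if the name repeats.
            self.records.append([content, []])
        elif name == "citation_author_institution" and self.records:
            # An institution tag belongs to the author tag just before it.
            self.records[-1][1].append(content)


def parse_citation_authors(html):
    """Return [name, institutions] records, one per citation_author tag."""
    parser = CitationMetaParser()
    parser.feed(html)
    parser.close()
    return parser.records


if __name__ == "__main__":
    for name, institutions in parse_citation_authors(SAMPLE):
        print(name, "|", "; ".join(institutions))
    # David Ayuso | Department of Physics, Imperial College London, London, UK
    # David Ayuso | Max-Born-Institut, Berlin, Germany
    # Andres F. Ordonez | Max-Born-Institut, Berlin, Germany
```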

The naive fix would be to merge duplicate names in the author list, but that doesn't feel right: I'm sure there are papers out there where two authors share the same initials and surname (or even the same full name), and for those the plugin should keep the repetition.
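One possible middle ground is sketched below in Python. It is a heuristic only, not how the plugin currently behaves, and the function name is hypothetical. In the Optica markup each repeated name appears back to back, with only an institution tag in between, so the sketch merges adjacent repeats only and folds their institutions into a single entry. The same name appearing again later in the list is kept as a separate author. It would still wrongly merge two genuinely different co-authors with identical names if a site happened to list them consecutively.

```python
def merge_consecutive_authors(records):
    """Merge runs of identical author names that appear back to back.

    records: iterable of (name, institutions) pairs in page order.
    Only adjacent repeats are merged; a name that reappears after a
    different author is kept as a separate entry.
    """
    merged = []
    for name, institutions in records:
        if merged and merged[-1][0] == name:
            # Same name as the previous tag: treat it as an extra affiliation.
            merged[-1][1].extend(institutions)
        else:
            # New author: copy the list so the input is not mutated later.
            merged.append((name, list(institutions)))
    return merged


if __name__ == "__main__":
    names = [
        "David Ayuso", "David Ayuso",
        "Andres F. Ordonez", "Andres F. Ordonez",
        "Misha Ivanov", "Misha Ivanov", "Misha Ivanov",
        "Olga Smirnova", "Olga Smirnova", "Olga Smirnova",
    ]
    merged = merge_consecutive_authors([(n, []) for n in names])
    print(" and ".join(name for name, _ in merged))
    # David Ayuso and Andres F. Ordonez and Misha Ivanov and Olga Smirnova
```

Keeping the merged institutions, rather than discarding the repeated tags, means no affiliation information is lost.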

episanty commented 1 year ago

On that last point, I've asked on Stack Exchange; let's see whether anybody comes up with something useful there that could help when fixing the relevant code.

danielhatton commented 1 year ago

Came here from SE... Can you use the dc.creator fields instead of the citation_author fields? They're in a weird order, but the weirdness of the order might be in some way consistent across the site...