Libbum / joomla-papers

ORCiD feed to papers feed for Joomla
GNU General Public License v3.0

Exploit multiple sources to optimise data coverage #6

Closed Libbum closed 6 years ago

Libbum commented 6 years ago

The current implementation takes only the zeroth index version of a works summary—this is the user's preferred source.

If your professor is lazy and hasn't sanitised their inputs, relying on that single source opens the possibility of data loss, which could be caught by keeping the info from the other sources.

Opening up this section to all values complicates things in the sense that we will send multiple putcode requests for identical works that have no guarantee of staying next to each other. More sorting may be needed or perhaps a more complex call condition.

For example, save all putcodes against DOIs, but just request the first. When printing, call the alternative putcodes only if data is missing.
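
A minimal sketch of that idea, not the plugin's actual code. It assumes each works summary has already been flattened to a dict with hypothetical `doi` and `putcode` keys, and that `fetch` is a hypothetical callable that requests one putcode and returns a dict of fields. The `required` field names are assumptions too.

```python
def group_putcodes_by_doi(summaries):
    """Map each DOI to all of its putcodes, keeping the order they arrived in.

    The first putcode per DOI is therefore the preferred source.
    """
    by_doi = {}
    for summary in summaries:
        by_doi.setdefault(summary["doi"], []).append(summary["putcode"])
    return by_doi


def fetch_with_fallback(putcodes, fetch, required=("title", "journal", "year")):
    """Request the first putcode; only try the alternatives if fields are missing."""
    record = {}
    for putcode in putcodes:
        candidate = fetch(putcode)
        for key in required:
            # Keep the value from the earliest source that actually has one.
            if not record.get(key) and candidate.get(key):
                record[key] = candidate[key]
        if all(record.get(key) for key in required):
            break  # nothing missing, so no further requests are sent
    return record
```

Grouping by DOI first sidesteps the ordering worry: duplicates don't need to stay next to each other in the feed, and a work whose preferred source is complete still costs exactly one request.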

Libbum commented 6 years ago

Production problems

New titles, recently released

Andy (older than 2018)

Libbum commented 6 years ago

Can confirm that all of these have a 'preferred source' as the individual, rather than coming from somewhere else (e.g. crossref or scopus). This would suggest that these items have been put into the OrcID database manually when the larger databases are taking ages to spider information across to OrcID.

So the reason for the missing data makes sense, since you can't actually put all of the required information into an OrcID entry—we must take a lot directly from the bibtex.

This at least gives us an identifier to spawn off extra checks.

Libbum commented 6 years ago

Andy's old papers may make sense with this argument, but the newer ones must have something else amiss.

Take "Optical vector network analysis of ultranarrow transitions in 166Er3+:7LiYF4 crystal" as an example. It currently has no external tracking, meaning Jared's entry is the only source for it, with putcode 43414797.

A subset of relevant results:

"journal-title": {
    "value": "Optics Letters"
},
"short-description": null,
"citation": {
    "citation-type": "BIBTEX",
    "citation-value": "@article{10.1364/OL.43.000935, \nauthor= {Kukharchyk, N. and Sholokhov, D. and Morozov, O. and Korableva, S.L. and Cole, J.H. and Kalachev, A.A. and Bushev, P.A.}, \ntitle= {Optical vector network analysis of ultranarrow transitions in<sup>166</sup>Er<sup>3+</sup>:<sup>7</sup>LiYF<sub>4</sub> crystal}, \njournal= {Optics Letters}, \nvolume= {43}, \nnumber= {4}, \npages= {935-938}, \nyear= {2018}}"
},
"type": "JOURNAL_ARTICLE",
"publication-date": {
    "year": {
        "value": "2018"
    },
    "month": null,
    "day": null,
    "media-type": null
},

We should be able to pull out correct details for this item but seem to be failing at present. That must be the regex checkers. So I'll take a look there.
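
For reference, a rough sketch of the kind of regex extraction that should cope with this entry. These are not the checkers currently in the repo: the patterns and the `parse_bibtex` name are made up for illustration, and they assume field values contain no nested braces (true for the citation above, not for BibTeX in general).

```python
import re

# Assumed patterns, written for this sketch only.
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")      # name= {value}
KEY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")       # @article{key,
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")               # <sup>, </sub>, ...


def parse_bibtex(citation_value):
    """Pull flat fields out of a BibTeX string and strip inline HTML tags."""
    fields = {
        name.lower(): TAG_RE.sub("", value).strip()
        for name, value in FIELD_RE.findall(citation_value)
    }
    key = KEY_RE.search(citation_value)
    if key:
        fields["key"] = key.group(1)
    return fields


# The citation-value quoted above, with \n as real newlines.
SAMPLE = (
    "@article{10.1364/OL.43.000935, \n"
    "author= {Kukharchyk, N. and Sholokhov, D. and Morozov, O. and "
    "Korableva, S.L. and Cole, J.H. and Kalachev, A.A. and Bushev, P.A.}, \n"
    "title= {Optical vector network analysis of ultranarrow transitions in"
    "<sup>166</sup>Er<sup>3+</sup>:<sup>7</sup>LiYF<sub>4</sub> crystal}, \n"
    "journal= {Optics Letters}, \nvolume= {43}, \nnumber= {4}, \n"
    "pages= {935-938}, \nyear= {2018}}"
)

parsed = parse_bibtex(SAMPLE)
print(parsed["journal"], parsed["volume"], parsed["pages"], parsed["year"])
```

Note the `<sup>`/`<sub>` markup inside the title and the missing space before `<sup>166`; either could plausibly trip up a stricter pattern.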

Libbum commented 6 years ago

#8 fixes all of the New titles, recently released section above.

There are only the 3 issues concerning Andy's older entries now.

The problem here is his citations are not in bibtex format:

"citation": {
    "citation-type": "FORMATTED_UNSPECIFIED",
    "citation-value": "Quach, J, Su, C, Martin, A & Greentree, A, 2012, 'Domain structures in quantum graphity', <i>Physical Review D</i>, vol. 193, no. 4, p. 593."
},

Since Jared and Salvy are putting in their bibtex values when using their manual, individual 'preferred source' entries, I think it's best to not support this choice and get Andy to update the problematic entries.
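
If it helps to hand Andy a list, something like this could flag the entries to update. It is a sketch only: the `citation` and `citation-type` keys are taken from the excerpts above, while the function name and the shape of `work` are assumptions.

```python
def needs_manual_update(work):
    """True when a work has no BibTeX citation we can parse."""
    citation = work.get("citation") or {}  # the citation block may be null
    return citation.get("citation-type") != "BIBTEX"


works = [
    {"citation": {"citation-type": "BIBTEX", "citation-value": "@article{...}"}},
    {"citation": {"citation-type": "FORMATTED_UNSPECIFIED", "citation-value": "..."}},
    {"citation": None},
]
flagged = [w for w in works if needs_manual_update(w)]
```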

Libbum commented 6 years ago

Jared is fine with the current implementation. I take that as a win for us :smirk: Closing this for now.