NaturalHistoryMuseum / scratchpads2

Scratchpads 2.0
http://scratchpads.org
GNU General Public License v2.0

Pagination missing in RIS export files #6502

Closed: Archilegt closed 1 year ago

Archilegt commented 2 years ago

When exporting a bibliographic reference as a RIS file, the content of the Pagination field is not mapped into the RIS file.

Example: For bibliographic reference https://myriatrix.myspecies.info/content/some-summary-data-and-metrics-myriapoda-onychophora-year-2020 a RIS file is produced when clicking on the "RIS" link at the end of the page: https://myriatrix.myspecies.info/biblio/export/ris/1408 However, the generated file does not contain the pagination of the bibliographic reference, in this case "28-30".

In contrast, Pagination values do map correctly into BibTeX files, for example: https://myriatrix.myspecies.info/biblio/export/bibtex/1408

This is very concerning, especially when dealing with bulk exports and data migration, as information will be lost and may go unnoticed when using RIS file exports.

Archilegt commented 2 years ago

What is needed is that the interval recorded in "Pagination", e.g. 28-30, is split into the RIS fields SP (Start Page) and EP (End Page) when generating the RIS file, or, if the fields SP and EP already exist somewhere, that they are used when building the file.
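The split described above could be sketched as follows. This is only an illustration in Python, not the Biblio module's code; the function name `split_pagination` is hypothetical, and treating pages as strings (so that values such as "xii" or "e123" survive) is an assumption.

```python
import re
from typing import Optional, Tuple

# Accepts "28-30", "28 - 30", an en or em dash as separator, or a single page "28".
_PAGES = re.compile(r"^\s*(\S+?)\s*(?:[-\u2013\u2014]\s*(\S+))?\s*$")


def split_pagination(pages: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (start_page, end_page); parts that cannot be determined are None."""
    match = _PAGES.match(pages or "")
    if not match:
        # Empty or unrecognised input (e.g. "28-30, 45"): leave both unset
        # rather than guess.
        return None, None
    return match.group(1), match.group(2)
```

For the reference in this issue, `split_pagination("28-30")` gives `("28", "30")`, which would then feed SP and EP.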

therobyouknow commented 2 years ago

Thank you, @Archilegt. Confirming I can see the issue:


Drupal-Biblio.ris file:

TY  - JOUR
T1  - Some summary data and metrics on Myriapoda & Onychophora for the year 2020
JF  - International Society for Myriapodology Newsletter
Y1  - 2021
A1  - Carlos A. Martínez-Muñoz
VL  - 6
UR  - https://www.researchgate.net/publication/356598842
ER  - 

Biblio-Bibtex.bib file:

@article {1408,
    title = {Some summary data and metrics on Myriapoda \& Onychophora for the year 2020},
    journal = {International Society for Myriapodology Newsletter},
    volume = {6},
    year = {2021},
    month = {17/11/2021},
    pages = {28-30},
    url = {https://www.researchgate.net/publication/356598842},
    author = {Carlos A. Mart{\'\i}nez-Mu{\~n}oz}
}

From your last comment, @Archilegt, I think you would want the Drupal-Biblio.ris file to look something like:

TY  - JOUR
T1  - Some summary data and metrics on Myriapoda & Onychophora for the year 2020
JF  - International Society for Myriapodology Newsletter
Y1  - 2021
A1  - Carlos A. Martínez-Muñoz
VL  - 6
UR  - https://www.researchgate.net/publication/356598842
ER  - 
SP 28
EP 30

Is this what you need?

Did this work before, or has it never worked?

Thank you.

Archilegt commented 2 years ago

Hi, Rob! I don't remember checking our RIS export files. What I usually do is check RIS import files, because recording practices vary wildly among journals and I want to make sure that imports are not malfunctioning on our side.

About adding SP and EP to the RIS export files:

1) They should have the same format as the other fields, so please keep the two-letter tag, two spaces, a hyphen and a space, e.g.:

SP  - 28
EP  - 30

2) Lines must end with the ASCII carriage return and line feed characters (CRLF).

3) Position of SP and EP tags: to keep RIS files as human-readable as possible, I recommend placing the SP and EP tags after the volume (VL) and issue (IS) tags, e.g.:

VL  - 6
IS  - 1
SP  - 28
EP  - 30

...but apart from TY (always first) and ER (always last), the order of tags is free and their inclusion is optional.

4) Position of ER tag: always at the end. It means "end record". Also, when creating bulk exports of multiple records, there should be no additional blank lines between records.
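The four points above can be sketched as a small serializer. This is a Python illustration under stated assumptions, not the Biblio export code: the names `ris_line`, `ris_record`, `ris_export` and `PREFERRED_ORDER` are hypothetical, and the tag order is just the readability preference from this comment.

```python
from typing import Dict, Iterable, List, Tuple

CRLF = "\r\n"
# Readability preference only (assumption): SP and EP directly after VL and IS.
PREFERRED_ORDER = ["T1", "JF", "Y1", "A1", "VL", "IS", "SP", "EP", "UR"]


def ris_line(tag: str, value: str = "") -> str:
    """One RIS line: two-letter tag, two spaces, hyphen, space, value, CRLF."""
    return f"{tag}  - {value}{CRLF}"


def ris_record(ref_type: str, fields: Dict[str, List[str]]) -> str:
    """Serialize one record with TY first and ER last."""
    lines = [ris_line("TY", ref_type)]
    ordered = [t for t in PREFERRED_ORDER if t in fields]
    # Any other tags follow in the order given; TY and ER are emitted separately.
    ordered += [t for t in fields
                if t not in PREFERRED_ORDER and t not in ("TY", "ER")]
    for tag in ordered:
        for value in fields[tag]:
            if value:  # tags are optional, so empty values are skipped
                lines.append(ris_line(tag, value))
    lines.append(ris_line("ER"))
    return "".join(lines)


def ris_export(records: Iterable[Tuple[str, Dict[str, List[str]]]]) -> str:
    """Concatenate records with no blank lines in between."""
    return "".join(ris_record(ref_type, fields) for ref_type, fields in records)
```

Calling `ris_record("JOUR", {"VL": ["6"], "SP": ["28"], "EP": ["30"]})` yields the lines TY, VL, SP, EP, ER in that order, each terminated by CRLF.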

We should aim to follow the specification in its second major version (from 2011): https://web.archive.org/web/20120526103719/http://refman.com/support/risformat_intro.asp

If we make any changes to the specification, such as an extension to allow exporting taxonomic name tags, we really need to document them.

Archilegt commented 2 years ago

@therobyouknow, could we push this forward? Page intervals in RIS export files are critical data for helping BioStor, as in https://github.com/rdmpage/biostor/issues/100

Archilegt commented 2 years ago

If we are going to fix this pagination issue, the DO mapping issue #6160, and even the problems with importing journal titles and exporting keywords, then I need full access to admin/config/content/biblio, so that I can work with the publication types and mappings. We also need a parser for the field with machine name biblio_pages, to break the interval into two new fields, biblio_start_page and biblio_end_page, which we will then map to the RIS export fields SP and EP.

Archilegt commented 2 years ago

There is: Context -> solr -> biblio_search_page (Biblio search page). The effect of that "context" and the related tool/widget can be seen in the Myriatrix Literature database as the column "Page". The rationale for creating that field and tool was to sort publications by their starting page, which is useful when several references come from the same volume and issue. In summary, biblio_search_page, displayed as "Page", does what the RIS field SP (Start Page) is meant to do. The field is visible on the Edit Biblio overlay -> Publication tab -> Start Page field.

This field could be repurposed to map to the RIS SP field, but a machine name of biblio_search_page instead of biblio_start_page is simply not intuitive. It would be better to create the biblio_start_page field and then reuse it for the Biblio search page tool/widget. There is still no equivalent of biblio_end_page, so that field needs to be created in any case.

Manual entry is expected to remain an important input route for literature data, and data recorders will most likely copy and paste. As a result, most (all?) page intervals will be more easily pasted as "xxx-yyy" into the Pagination field. The recording burden is kept to a minimum if the Pagination field continues to exist, with a parser automatically filling SP with xxx and EP with yyy.
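A rough sketch of that workflow, again in Python for illustration only: the Pagination value stays as entered, the two proposed fields are filled from it only when they are empty, and the start page doubles as a sort key. The field names biblio_start_page and biblio_end_page are the ones proposed in this thread and do not exist yet; the function names and the dict-based record are assumptions.

```python
import re
from typing import Dict, Tuple

# "xxx-yyy" with a hyphen or en dash, optionally surrounded by spaces.
_RANGE = re.compile(r"^\s*(\S+?)\s*[-\u2013]\s*(\S+)\s*$")


def fill_page_fields(record: Dict[str, str]) -> Dict[str, str]:
    """Return a copy with start/end page derived from biblio_pages.

    biblio_pages itself is kept, and values already entered by hand win.
    """
    filled = dict(record)
    match = _RANGE.match(record.get("biblio_pages", ""))
    if match:
        if not filled.get("biblio_start_page"):
            filled["biblio_start_page"] = match.group(1)
        if not filled.get("biblio_end_page"):
            filled["biblio_end_page"] = match.group(2)
    return filled


def start_page_key(record: Dict[str, str]) -> Tuple[int, int, str]:
    """Sort key: numeric start pages in numeric order, anything else after."""
    start = record.get("biblio_start_page", "")
    if start.isdecimal():
        return (0, int(start), "")
    return (1, 0, start)
```

Sorting records from one volume and issue with `sorted(records, key=start_page_key)` then places a reference starting on page 5 before one starting on page 28, with non-numeric pages such as "xii" grouped at the end.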