NCEAS / metacatui

MetacatUI: A client-side web interface for DataONE data repositories
https://nceas.github.io/metacatui
Apache License 2.0

Fix citation metadata export consistency issues #1543

Open · csjx opened this issue 3 years ago

csjx commented 3 years ago

Describe the bug: Charu pointed out that citations imported into Paperpile are not consistent with the citations shown on the metadata landing pages of the ESSDIVE repository. These issues could be due to (1) MetacatUI not populating the Highwire Press fields correctly, (2) Paperpile truncating or reordering field values, or (3) the ESSDIVE theme overriding the CitationView functionality.

  1. Author list missing authors: It looks like we are populating <meta name="citation_author"> with the author string generated by MetadataView.getAuthorsText(), which evaluates the length and content of the model.attributes.origin field in the Solr results. With two authors we join them with "and", with three or more we separate them with commas, and with more than five we append "et al.". This works for the display, but not for citation manager import, which needs the full list. Google Scholar's guidance says to:

    Put each author name in a separate tag and omit all affiliations, degrees, certifications, etc., from this field.

    whereas we are putting them all into a single <meta> tag.

    We should be populating one <meta name="citation_author"> tag per author.

  2. Author ordering is incorrect: While the origin field tends to list the authors in the correct order, with the first author first, Paperpile seems to extract the author names, drop the punctuation, and then sort them alphabetically. This may be because we are not populating individual <meta> tags, as described in (1) above. Try populating individual citation_author fields and check whether Paperpile keeps them in document order.

  3. Publisher does not match the displayed citation: The MetadataView.getPublisherText() function defaults to the model.attributes.datasource Solr field and uses it to look up the repository's node.name. This reflects an assumption made by MetacatUI (that the repository is the publisher), but ESSDIVE overrides it with the subproject name. When creating the <meta name="citation_publisher"> tag, we need a single source for this information. I suggest we populate a model.attributes.publisher field and always read from it. If ESSDIVE overrides it, the <meta> tag will still be populated correctly.

  4. Publisher text is truncated: Paperpile appears to truncate long citation_publisher values with an ellipsis:

    Environmental System Science Data Infrastructure for a Virtual Ecosystem; Development of a molecularly informed biogeochemical framework for reactive transport modeling of subsurface carbon inventories, transformations and fluxes

    becomes

    Environmental System Science Data Infrastructure for a Virtual Ecosystem ...

    I'm not sure we can fix this; it might be a Paperpile-specific issue. Check whether other citation managers, such as Zotero, do the same.
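For item 3, a minimal sketch of the suggested single source of truth for the publisher. The function name `resolvePublisher`, the `nodeNames` lookup map, and the shape of the attributes object are hypothetical, chosen only to illustrate the idea; they are not MetacatUI's actual API. If both the displayed citation and the citation_publisher tag called the same function, a theme override of `publisher` would reach both.

```javascript
// Hypothetical sketch: one place to decide the citation publisher.
// `attrs` stands in for model attributes; `nodeNames` is an assumed Map
// from a datasource node identifier to the repository's display name.
function resolvePublisher(attrs, nodeNames) {
  // Prefer an explicit publisher, e.g. one set by a theme such as ESSDIVE.
  if (typeof attrs.publisher === "string" && attrs.publisher.trim() !== "") {
    return attrs.publisher.trim();
  }
  // Otherwise fall back to the current assumption: the repository is the publisher.
  if (attrs.datasource && nodeNames.has(attrs.datasource)) {
    return nodeNames.get(attrs.datasource);
  }
  // Nothing known: the caller can omit the citation_publisher tag.
  return null;
}

// Example with a hypothetical node ID and names:
const nodeNames = new Map([["urn:node:EXAMPLE", "Example Repository"]]);
console.log(resolvePublisher({ datasource: "urn:node:EXAMPLE" }, nodeNames));
// Example Repository
console.log(resolvePublisher({ publisher: "Example Subproject", datasource: "urn:node:EXAMPLE" }, nodeNames));
// Example Subproject
```

Returning `null` rather than an empty string makes it easy for the template to skip the tag entirely when no publisher is known.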

To Reproduce: Visit the following landing pages and import them into Paperpile using the Chrome extension:

  1. https://doi.org/10.21952/WTR/1412542
  2. https://doi.org/10.15485/1660455
  3. https://www.osti.gov/biblio/1577267
  4. https://data.ess-dive.lbl.gov/view/doi:10.21952/WTR/1506941

Expected behavior: Exported citations should have the correct authors in the correct order, with the publisher as stated in the displayed citation.


csjx commented 3 years ago

Hi @laurenwalker, I assigned this to @gothub since this relates to an ESS-DIVE issue. Could you both estimate it and add it to the appropriate milestone so it gets into a release soon?

laurenwalker commented 3 years ago

Sure @csjx, thanks. @gothub Could you let me know about how much time this will take? We can discuss if needed. Then I can estimate what release we could get this in. Since it's a bug fix, we could just work it into whatever patch release is next at the time of completion.

gothub commented 3 years ago

@laurenwalker @csjx I just started looking into this yesterday, so I should have an estimate of how long it will take by today or tomorrow.

gothub commented 3 years ago

@csjx For item 3, in which model should the publisher attribute be placed? The model for MetadataView is SolrResult, so that doesn't seem appropriate. Should this be an attribute of the MetadataView, or of a different model object?

laurenwalker commented 3 years ago

@gothub - You're right, the SolrResult model does not have an attribute for publisher because we don't index the publisher metadata in Solr. We would need to get the publisher name from the EML itself, which is currently not fetched in the MetadataView.

We have had a ticket open in Metacat for a couple years to index the publisher name. I think that is the most thorough solution to this issue, since dataset citations appear in other areas of the UI as well (e.g. search result list), not just the MetadataView, and we would want the citations to be consistent.

gothub commented 3 years ago

The fix in commit b99ba7beb2ce6a8fcb409cc82f394114137a94e8 changes how the author list is exported. Instead of calling getAuthorText() with the list of authors obtained from the Solr index origin field, the list is passed directly to the metaTagHighwirePress template so that each author is output as its own <meta name="citation_author"> entry. Previously, all authors were placed in a single citation_author entry.
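The idea behind this fix, one tag per author in the order the names arrive, can be sketched in plain JavaScript. This is a hypothetical helper, not the actual metaTagHighwirePress template (which is not shown here); the attribute escaping is an added assumption, included because author names can contain quotes or ampersands.

```javascript
// Hypothetical sketch, not MetacatUI's metaTagHighwirePress template.
// Escape characters that are special inside a double-quoted HTML attribute.
// "&" must be replaced first so later entities are not double-escaped.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Emit one <meta name="citation_author"> tag per author, preserving input order
// and skipping empty entries.
function citationAuthorTags(authors) {
  return authors
    .filter((name) => typeof name === "string" && name.trim() !== "")
    .map((name) => `<meta name="citation_author" content="${escapeAttr(name.trim())}">`)
    .join("\n");
}

console.log(citationAuthorTags(["Corina Logan", "Kelsey McCune"]));
// <meta name="citation_author" content="Corina Logan">
// <meta name="citation_author" content="Kelsey McCune">
```

Because the names are never joined into a display string, no "and", commas, or "et al." can leak into the exported metadata.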

Paperpile now displays all authors correctly, in the order in which they were retrieved from the Solr field. Below is an example dataset (https://knb.ecoinformatics.org/view/doi%3A10.5063%2FM043S3) as it appears in MetacatUI:

[Screenshot (2020-12-03): the dataset citation as displayed in MetacatUI]

... and then in Paperpile:

[Screenshot (2020-12-03): the same dataset after import into Paperpile]

And here is the origin field from Solr, which shows that the exported author list order matches the document order in Solr:

<arr name="origin">
  <str>Corina Logan</str>
  <str>Kelsey McCune</str>
  <str>Maggie MacPherson</str>
  <str>Zoe Johnson-Ulrich</str>
  <str>Carolyn Rowney</str>
  <str>Benjamin Seitz</str>
  <str>Aaron Blaisdell</str>
  <str>Dominik Deffner</str>
  <str>Claudia Wascher</str>
</arr>
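A rough way to repeat this comparison automatically is to pull the names out of the Solr origin array and out of the page's citation_author tags, then compare them position by position. The helper names are hypothetical, and regular expressions are a simplification: they assume `name` precedes `content` in each tag and do not decode XML/HTML entities, so a more robust check would use a real parser.

```javascript
// Hypothetical check: do the exported citation_author tags follow Solr's origin order?

// Extract the <str> values from an <arr name="origin"> fragment of a Solr response.
function solrOrigin(xml) {
  const arr = xml.match(/<arr name="origin">([\s\S]*?)<\/arr>/);
  if (!arr) return [];
  return [...arr[1].matchAll(/<str>([\s\S]*?)<\/str>/g)].map((m) => m[1].trim());
}

// Extract citation_author values from page HTML, in document order.
// Assumes the attribute order name="..." content="..." (a simplification).
function metaAuthors(html) {
  const re = /<meta\s+name="citation_author"\s+content="([^"]*)"\s*\/?>/g;
  return [...html.matchAll(re)].map((m) => m[1]);
}

// True only if both lists have the same names in the same positions.
function sameOrder(a, b) {
  return a.length === b.length && a.every((value, i) => value === b[i]);
}
```

Running `sameOrder(solrOrigin(solrXml), metaAuthors(pageHtml))` on a landing page would flag any reordering before a citation manager is involved.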

gothub commented 3 years ago

@laurenwalker Since the publisher portion of this fix now depends on an update to Solr indexing, should the author list fix be merged into the devel branch now, or should it wait for the Solr index publisher fix?

gothub commented 3 years ago

Ensure that citation exporting works with:

gothub commented 3 years ago

With commit 0086010d750868f6b7763548f301d9d50ca3b531, the export of authors is now correct.

I'll leave this issue open, since the 'publisher' field fix depends on the Metacat issue mentioned above.

BTW, the best references for Google Scholar / Highwire Press tags are: