NCEAS / metacatui

MetacatUI: A client-side web interface for DataONE data repositories
https://nceas.github.io/metacatui
Apache License 2.0

When an organization is an author, APA citations do not display the org. name properly. #2106

Open robyngit opened 1 year ago

robyngit commented 1 year ago

Describe the bug

When citations are displayed in APA format, organization names are not displayed correctly: they are abbreviated as if they were individual names. For example, Multi-Agency Rocky Intertidal Network (MARINe) is displayed as "(MARINe), M.R.I.N.".

To Reproduce

Steps to reproduce the behavior:

  1. Currently APA citations are only displayed on the develop branch, so run a local MetacatUI using the develop branch.
  2. Configure it to use the DataONE theme in production
  3. Go to this dataset: {YOUR LOCAL URL}/view/doi%3A10.6085%2FAA%2FIPOGXX_XXXITV2XMSR01_20180518.50.1
  4. Click on the "Cite this dataset" button.
  5. Compare the organization names in the citation popup vs. the page header.

Expected behavior

Organization names should be displayed in full: Multi-Agency Rocky Intertidal Network (MARINe) should be displayed as "Multi-Agency Rocky Intertidal Network (MARINe)", or ideally without the "(MARINe)" abbreviation.

Screenshots

In the dataset landing page citation popup: [screenshot, 2023-03-14]

In the search results view: [screenshot, 2023-03-14]

In the Metadata Assessment Report: [screenshot, 2023-03-14]

Additional context

In PR #2095, we changed the CitationView to render citations in standard APA format. In APA citations, authors are listed family name first, followed by the initials of their given names. For example, Pike Spector becomes "Spector, P.". The CitationView can render a Citation from different models, including EMLModels and SolrResult models.
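As a rough illustration of the name transformation described above, the sketch below formats a person's name as family name plus initials. The `PersonName` shape and the `toInitials`/`formatApaAuthor` helpers are hypothetical and are not MetacatUI's actual CitationView code; multiple initials are separated by a space, as APA 7 does.

```typescript
// Hypothetical, simplified person name: family name plus given name(s).
interface PersonName {
  family: string;
  given: string;
}

// Reduce each given name to its first letter followed by a period,
// e.g. "Pike" -> "P.", "Mary Ann" -> "M. A.".
function toInitials(given: string): string {
  return given
    .split(/\s+/)
    .filter((part) => part.length > 0)
    .map((part) => `${part.charAt(0).toUpperCase()}.`)
    .join(" ");
}

// Family name first, then initials; falls back to the family name alone.
function formatApaAuthor(name: PersonName): string {
  const initials = toInitials(name.given);
  return initials ? `${name.family}, ${initials}` : name.family;
}

console.log(formatApaAuthor({ family: "Spector", given: "Pike" })); // Spector, P.
```

This only works when the family and given names are already known, which is exactly what is missing when authors arrive as plain strings.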

When rendering from SolrResults, the view only has access to the authors from the Solr "origin" field, which is a list of author strings, e.g. ["Multi-Agency Rocky Intertidal Network (MARINe)", "Partnership for Interdisciplinary Studies of Coastal Oceans (PISCO)", "Pike Spector"]. We parse each string to determine which part of the name is the given name and which part is the family name (see nameStrToCSLJSON). The problem arises when an author is an organization, such as Multi-Agency Rocky Intertidal Network (MARINe). Because Solr gives us only a string, the organization is rendered in APA format as "(MARINe), M.R.I.N.": "(MARINe)" is identified as the family name and the rest as given names.
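The failure mode can be reproduced with a deliberately naive parser that treats the last whitespace-separated token as the family name. `naiveParseName` and `formatApaAuthor` are hypothetical helpers, not the real nameStrToCSLJSON; initials are concatenated without spaces here only so the output matches the one reported in this issue.

```typescript
// Hypothetical, simplified person name: family name plus given name(s).
interface PersonName {
  family: string;
  given: string;
}

// Hypothetical naive parser (NOT nameStrToCSLJSON): the last token becomes
// the family name and everything before it becomes the given name(s).
function naiveParseName(full: string): PersonName {
  const parts = full.trim().split(/\s+/);
  const family = parts.pop() ?? "";
  return { family, given: parts.join(" ") };
}

// Family name, then given-name initials concatenated without spaces.
function formatApaAuthor(name: PersonName): string {
  const initials = name.given
    .split(/\s+/)
    .filter((part) => part.length > 0)
    .map((part) => `${part.charAt(0).toUpperCase()}.`)
    .join("");
  return initials ? `${name.family}, ${initials}` : name.family;
}

// A string-only author list, like the Solr "origin" values above.
const origin = [
  "Multi-Agency Rocky Intertidal Network (MARINe)",
  "Pike Spector",
];
for (const author of origin) {
  console.log(formatApaAuthor(naiveParseName(author)));
}
// (MARINe), M.R.I.N.
// Spector, P.
```

The person's name comes out right, but nothing in a bare string tells the parser that the first entry is an organization, so it is split the same way.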

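This bug shows up when we are rendering a CitationView from SolrResults and the citation includes an organization. It will likely also occur in the CitationList in portals when a citation includes an organization, since the metrics service also returns authors as strings. The parts of MetacatUI affected by this bug include:

  1. The citation popup on dataset landing pages
  2. The search results in the DataCatalog
  3. The Metadata Assessment Report
  4. Citation lists in portals (populated from the metrics service)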

To fix this within MetacatUI, we would need to download and parse the associated EML document, where one exists. We don't currently do this in the MetadataView; instead, we use the viewService to get HTML that displays the metadata and insert it directly into the view. Downloading every EML document in the search results view would significantly slow down the app. And since the metrics service returns information about external documents, as far as I know we have no way to get the origin information parsed by name/organization from within MetacatUI.
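EML distinguishes people from organizations structurally: a creator has an `individualName` (with `givenName` and `surName`), an `organizationName`, or a `positionName`. A minimal sketch, assuming the EML creator has already been parsed into a plain object (the `EmlCreator` shape and `emlCreatorToCsl` are hypothetical, not MetacatUI's EMLParty model), shows how that distinction could map onto CSL-JSON names, where an organization goes in the `literal` field so it is never split into family and given names.

```typescript
// Assumed, simplified shape of an EML <creator> after XML parsing.
interface EmlCreator {
  individualName?: { givenName?: string[]; surName: string };
  organizationName?: string;
  positionName?: string;
}

// CSL-JSON names: people use family/given; organizations use "literal".
type CslName = { family: string; given?: string } | { literal: string };

// Hypothetical converter: a person's name wins if present, otherwise the
// organization (or position) name is kept whole as a literal.
function emlCreatorToCsl(creator: EmlCreator): CslName | null {
  if (creator.individualName) {
    const given = (creator.individualName.givenName ?? []).join(" ");
    return given
      ? { family: creator.individualName.surName, given }
      : { family: creator.individualName.surName };
  }
  if (creator.organizationName) return { literal: creator.organizationName };
  if (creator.positionName) return { literal: creator.positionName };
  return null;
}

console.log(emlCreatorToCsl({ organizationName: "Multi-Agency Rocky Intertidal Network (MARINe)" }));
console.log(emlCreatorToCsl({ individualName: { givenName: ["Pike"], surName: "Spector" } }));
```

The cost noted above still applies: this requires fetching and parsing each EML document, which is fine for a single landing page but not for every search result.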

Here is what I propose to fix this:

  1. The citation popup on dataset landing pages: Either parse the HTML response from the viewService, or get and parse the EML, in order to get EMLParty models for each author.
  2. The Metadata Assessment Report: Use the CitationHeader, as we do on the dataset landing pages.
  3. The metrics service citation lists: If possible, could we update the metrics service to return author names as CSL JSON?
  4. The DataCatalog: Create a new citation-based view for each search result item that uses authors' full names. This would be similar to the special view we created for the header of dataset landing pages. It would also be more in line with search results in online science publications, which typically don't show citations but instead show a formatted list of results. For example:

[Three screenshots, 2023-03-14: examples of search result lists from online science publications]
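For proposal 3, if the metrics service (or any other source) returned authors as CSL JSON, organizations could arrive with a `literal` name and be printed verbatim, while people still get the family-name-plus-initials treatment. A sketch, assuming a hypothetical `CslName` payload shape and a hypothetical `formatCslName` helper:

```typescript
// Assumed CSL-JSON-style author: a person (family/given) or a literal name.
type CslName = { family: string; given?: string } | { literal: string };

function formatCslName(name: CslName): string {
  // Organizations (or any pre-formatted name) are printed verbatim.
  if ("literal" in name) return name.literal;
  // People: family name, then space-separated initials of the given names.
  const initials = (name.given ?? "")
    .split(/\s+/)
    .filter((part) => part.length > 0)
    .map((part) => `${part.charAt(0).toUpperCase()}.`)
    .join(" ");
  return initials ? `${name.family}, ${initials}` : name.family;
}

const authors: CslName[] = [
  { literal: "Multi-Agency Rocky Intertidal Network (MARINe)" },
  { family: "Spector", given: "Pike" },
];
authors.forEach((author) => console.log(formatCslName(author)));
// Multi-Agency Rocky Intertidal Network (MARINe)
// Spector, P.
```

Keeping the person/organization decision in the data, rather than guessing it from a string, is what lets the same formatter handle both kinds of author.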


mbjones commented 1 year ago

All good points. Let's discuss.

Organization names have been problematic for quite some time, so let's not let this longstanding problem delay our rollout of support for multiple authors.

If switching to APA author format is partly causing this problem, can we just go back to the old "as is" format until we can work out the organization issue? This would allow both full author lists and unmangled organization names, but would result in much longer citation strings.

robyngit commented 1 year ago

Thanks @mbjones, this is exactly the feedback I needed to move forward. That's a good idea: I will partially revert to the old format, always show complete author names for now, and go ahead with the release. We can come back to this issue afterwards, if that all sounds good to you!

mbjones commented 1 year ago

Sounds like a plan to me.