citation-file-format / cffconvert

Command line program to validate and convert CITATION.cff files.
Apache License 2.0

Inconsistency between how `cffconvert` renders `url` and the GitHub widget #397

Open alexlancaster opened 2 days ago

alexlancaster commented 2 days ago

If we use the CITATION.cff: https://github.com/citation-file-format/cffconvert/blob/main/tests/lib/cff_1_3_0/urls/IRACU/CITATION.cff

authors: 
  - name: "Test author"
cff-version: "1.3.0"
identifiers:
  - description: "The URL"
    type: url
    value: "https://github.com/the-url-from-identifiers"
message: "Test message"
repository: "https://github.com/the-url-from-repository"
repository-artifact: "https://github.com/the-url-from-repository-artifact"
repository-code: "https://github.com/the-url-from-repository-code"
title: "Test title"
url: "https://github.com/the-url-from-url"

The expected output from cffconvert prioritizes the url in identifiers, e.g. the apalike test output (https://github.com/citation-file-format/cffconvert/blob/main/tests/lib/cff_1_3_0/urls/IRACU/apalike.txt) is:

Test author Test title URL: https://github.com/the-url-from-identifiers

However if I put that same .cff file on a repo, e.g.: https://github.com/alexlancaster/cffconvert_test_cffs

and click "Cite this repository", the output is:

Test author. Test title [Computer software]. https://github.com/the-url-from-repository-code

This seems inconsistent to me. I'm aware that the GitHub widget uses the Ruby backend, but I would expect both tools, used in their default way, to produce the same output. The difference has implications for which field end users should treat as the default URL, because cffconvert is often used internally to generate citations, and those would then differ from what GitHub shows.
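To make the difference concrete, here is a minimal sketch of a URL resolver driven by a precedence list. It is not the code of either tool. The function names and both precedence lists are hypothetical: only the first entry of each list is grounded in the outputs quoted above (`identifiers` for cffconvert, `repository-code` for the GitHub widget), and the remaining entries are placeholders.

```python
# Hypothetical precedence orders. Only the FIRST entry of each list is taken
# from the outputs quoted in this issue; the rest are illustrative placeholders.
CFFCONVERT_LIKE = ["identifiers", "repository-code", "repository",
                   "repository-artifact", "url"]
WIDGET_LIKE = ["repository-code", "url", "repository",
               "repository-artifact", "identifiers"]


def first_identifier_url(cff):
    """Return the value of the first identifier with type 'url', or None."""
    for identifier in cff.get("identifiers", []):
        if identifier.get("type") == "url":
            return identifier.get("value")
    return None


def resolve_url(cff, precedence):
    """Return the first non-empty URL found when checking sources in order."""
    for source in precedence:
        if source == "identifiers":
            value = first_identifier_url(cff)
        else:
            value = cff.get(source)
        if value:
            return value
    return None


# The CITATION.cff from this issue, written out as a plain dict.
cff = {
    "authors": [{"name": "Test author"}],
    "cff-version": "1.3.0",
    "identifiers": [
        {
            "description": "The URL",
            "type": "url",
            "value": "https://github.com/the-url-from-identifiers",
        }
    ],
    "message": "Test message",
    "repository": "https://github.com/the-url-from-repository",
    "repository-artifact": "https://github.com/the-url-from-repository-artifact",
    "repository-code": "https://github.com/the-url-from-repository-code",
    "title": "Test title",
    "url": "https://github.com/the-url-from-url",
}

print(resolve_url(cff, CFFCONVERT_LIKE))
print(resolve_url(cff, WIDGET_LIKE))
```

With the same input, the two orders pick different URLs, which is the mismatch reported here.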

My sense is that the basic url field is what should be prioritized, because it's a "top-level" field. I had assumed it would take precedence, since you can have multiple url entries in identifiers that have different relations to the thing being documented.

For example, in my own repo (https://github.com/alexlancaster/pypop/blob/citation-cff-zenodo/CITATION.cff) I use url identifiers to record things like the GitHub tag and the PyPI package:

  - description: GitHub tag for repository
    type: url
    value: https://github.com/alexlancaster/pypop/tree/v0.9.167
    relation: IsSupplementTo
  - description: PyPI package
    type: url
    value: https://pypi.org/project/pypop-genomics/0.9.167
    relation: IsSourceOf

I also use cffconvert to convert this to .zenodo.json, which is recognized during the upload. What ends up happening is a skew in what is shown as the URL.

In contrast, the top-level url is more unambiguously about the current entity.

Not sure what the right solution here is, but this inconsistency should be addressed.

alexlancaster commented 2 days ago

In addition, there is an inconsistency in the handling of doi.

The GitHub widget will use the DOI and display it if it is at the top level, but if it is only in identifiers the widget ignores it completely. cffconvert, in contrast, ignores the top-level doi and uses only the first doi it finds in identifiers.
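The two DOI behaviours described here can be sketched the same way. This is an illustration of the behaviour as reported in this comment, not the code of either tool. The function names are hypothetical and the DOI values are made-up placeholders.

```python
def doi_top_level_only(cff):
    """Widget-like behaviour as described above: only the top-level doi counts."""
    return cff.get("doi")


def doi_first_identifier_only(cff):
    """cffconvert-like behaviour as described above: first doi in identifiers."""
    for identifier in cff.get("identifiers", []):
        if identifier.get("type") == "doi":
            return identifier.get("value")
    return None


# Placeholder DOIs, invented for this example only.
example = {
    "doi": "10.1234/top-level-doi",
    "identifiers": [
        {"type": "url", "value": "https://example.org/project"},
        {"type": "doi", "value": "10.1234/identifier-doi"},
    ],
}

print(doi_top_level_only(example))
print(doi_first_identifier_only(example))
```

A file that sets the DOI in only one of the two places would therefore show a DOI in one tool and none in the other.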

jspaaks commented 1 day ago

Hi Alex, thanks for reporting this issue.

I agree that it would be helpful for users to have consistency across tooling. In fact I've advocated for it, and proposed a mechanism for how to implement it, here: https://github.com/citation-file-format/citation-file-format/issues/330#issuecomment-925849004

Unfortunately, I expect it might be a while before CFF can achieve consistency across its ecosystem of tools. I feel this is ultimately a funding issue.

Best, Jurriaan

PS ping @sdruskat @hainesr