IQSS / dataverse

Open source research data repository software
http://dataverse.org

Bug: BibTeX + EndNote XML citation output for dataset with Permalink #10769

Open vera opened 3 months ago

vera commented 3 months ago

What steps does it take to reproduce the issue?

  1. Create dataset with permalink PID (in my example, the permalink of my dataset is https://clinicaltrials.gov/study/NCT00080262)
  2. Open dataset page and click "Cite Dataset" > "BibTeX" or "EndNote XML"

Two problems:

  1. The BibTeX output looks wrong in two places: the first line (the citation key is missing the leading "http") and the doi line (there is an extra slash after "http").

    In the EndNote XML output, there is also an extra slash in <electronic-resource-num>.

    I briefly checked the code (BibTeX, EndNote XML) and I'm not sure why this happens.

    The RIS citation is fine.

  2. In the BibTeX output, the permalink should not be given as doi, since it's not a DOI.

BibTeX:

@data{s://clinicaltrials.gov/study/NCT00080262_2024,
author = {$AUTHORS},
publisher = {Root},
title = {{$TITLE}},
year = {2024},
version = {V1},
doi = {http/s://clinicaltrials.gov/study/NCT00080262},
url = {http://localhost:8080/citation?persistentId=perma:https://clinicaltrials.gov/study/NCT00080262}
}

EndNote XML:

<?xml version='1.0' encoding='UTF-8'?><xml><records><record><ref-type name="Dataset">59</ref-type><contributors><authors>...</authors></contributors><titles><title>...</title></titles><section>...</section><dates><year>...</year></dates><edition>...</edition><publisher>...</publisher><urls><related-urls><url>http://localhost:8080/citation?persistentId=perma:https://clinicaltrials.gov/study/NCT00080262</url></related-urls></urls><electronic-resource-num>perma/http/s://clinicaltrials.gov/study/NCT00080262</electronic-resource-num></record></records></xml>

Which version of Dataverse are you using?

6.2

Any related open or closed issues to this bug report?

not aware

Screenshots:

-

Are you thinking about creating a pull request for this issue?

yes, would be interested

qqmyers commented 3 months ago

It looks like the citation code is assuming a / as a separator rather than using PidProvider specific code to create the entries. The specific issue of the / being 4 characters in is from using an unmanaged permalink. Because permalinks don't require a separator, there is no reliable way to tell the authority from the shoulder, so the code picks the first four chars as the authority.
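
To make that concrete, here is a minimal sketch that reproduces the strings from the report. It is not the actual Dataverse code: the class and method names (`UnmanagedPermalinkSplit`, `splitFirstFour`, `joinWithSlash`) are hypothetical stand-ins for the two behaviours described above, namely taking the first four characters as the authority and joining the parts with a hardcoded `/`.

```java
class UnmanagedPermalinkSplit {

    // Hypothetical stand-in for the fallback described above: with no separator
    // to rely on, the first four characters are treated as the authority.
    static String[] splitFirstFour(String pid) {
        return new String[] { pid.substring(0, 4), pid.substring(4) };
    }

    // Hypothetical stand-in for citation code that hardcodes "/" between
    // authority and identifier instead of asking the PID provider.
    static String joinWithSlash(String authority, String identifier) {
        return authority + "/" + identifier;
    }

    public static void main(String[] args) {
        String[] parts = splitFirstFour("https://clinicaltrials.gov/study/NCT00080262");

        // Matches the doi value in the reported BibTeX output.
        System.out.println(joinWithSlash(parts[0], parts[1]));
        // prints "http/s://clinicaltrials.gov/study/NCT00080262"

        // Matches the reported citation key: identifier part plus the year.
        System.out.println(parts[1] + "_2024");
        // prints "s://clinicaltrials.gov/study/NCT00080262_2024"
    }
}
```

Under these assumptions, both the truncated citation key and the `http/s://` value fall out of the same split.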

johannes-darms commented 3 months ago

@qqmyers Should we update the code to use the PIDProvider specific properties and create a PR?

qqmyers commented 3 months ago

I haven't looked at the code to be certain, but I think that makes sense. The GlobalId class has methods to get whatever form or part of a PID you want, so I think at this point, there shouldn't be core code outside that class hardcoding the protocol name or trying to parse/generate a PID for display.

vera commented 3 months ago

> It looks like the citation code is assuming a / as a separator rather than using PidProvider specific code to create the entries. The specific issue of the / being 4 characters in is from using an unmanaged permalink. Because permalinks don't require a separator, there is no reliable way to tell the authority from the shoulder, so the code picks the first four chars as the authority.

I see, that makes sense.

For completeness, here's what the export looks like with a managed Permalink:

BibTeX:

@data{NCT00080262_2024,
author = {$AUTHORS},
publisher = {Root},
title = {{$TITLE}},
year = {2024},
version = {V1},
doi = {https://clinicaltrials.gov/study//NCT00080262},
url = {http://localhost:8080/citation?persistentId=perma:https://clinicaltrials.gov/study/NCT00080262}
}

-> The first line (the citation key) seems fine, but the doi property has an extra slash in a different position (before the unique part of the Permalink)

EndNote XML:

<electronic-resource-num>perma/https://clinicaltrials.gov/study//NCT00080262</electronic-resource-num>

-> same issue (extra slash before the unique part of the Permalink)

RIS citation is still fine.

qqmyers commented 3 months ago

Cool. I see https://github.com/IQSS/dataverse/blob/b67d732921a3e84d4450a5ee18790aeab07afaed/src/main/java/edu/harvard/iq/dataverse/DataCitation.java#L295-L298 which is where the hardcoded doi and / come from. I'm not sure what BibTeX allows for non-DOIs - looks like url is an option according to https://www.bibtex.com/g/bibtex-format/.

pdurbin commented 3 months ago

Yeah, I agree, "url" sounds like a good option when "doi" isn't available.

I just checked a dataset that uses Handles ( https://data.cimmyt.org/dataset.xhtml?persistentId=hdl:11529/10016 ) and the BibTeX output includes a false DOI like this:

doi = {11529/10016},

So yeah, it would probably be good to do something here to not assume DOIs.
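
One possible shape for this, as a rough sketch only: emit a `doi` field just for PIDs whose protocol is actually `doi`, and fall back to `url` otherwise. The `BibtexPidField` class and its `pidLine` method are hypothetical and do not use Dataverse's `GlobalId` API. The DOI in the example is made up, and the Handle URL assumes the usual hdl.handle.net resolver form.

```java
class BibtexPidField {

    // Hypothetical helper: choose the BibTeX field from the PID protocol
    // instead of always writing "doi".
    static String pidLine(String protocol, String rawIdentifier, String url) {
        if ("doi".equals(protocol)) {
            return "doi = {" + rawIdentifier + "},";
        }
        // Handles, permalinks and anything else are not DOIs, so use "url".
        return "url = {" + url + "},";
    }

    public static void main(String[] args) {
        // Made-up DOI, for illustration only.
        System.out.println(pidLine("doi", "10.5072/FK2/EXAMPLE",
                "https://doi.org/10.5072/FK2/EXAMPLE"));
        // Handle from the dataset mentioned above; resolver URL form is assumed.
        System.out.println(pidLine("hdl", "11529/10016",
                "https://hdl.handle.net/11529/10016"));
        // Permalink from the original report.
        System.out.println(pidLine("perma", "NCT00080262",
                "https://clinicaltrials.gov/study/NCT00080262"));
    }
}
```

Since the existing output already carries a `url` line for the citation URL, a real change would still need to decide whether the PID replaces that value or the `doi` line is simply dropped for non-DOIs.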