bio-guoda / preston

a biodiversity dataset tracker
MIT License

found malformed Zenodo doi in Plazi mediated DwC-A #99

Closed jhpoelen closed 3 years ago

jhpoelen commented 3 years ago

While building a DOI registry for all of everything, I happened to notice a malformed Zenodo DOI, http://doi.org/10.5281/zenodo./1125301 , hidden in the multimedia.txt table of hash://sha256/1ae87b40107c13e48e68684ed201dfaa2480ac740bd37ed8dced033c75a3e322 . The expected DOI would be http://doi.org/10.5281/zenodo.1125301 (no slash after "zenodo.").

This is a silly example, but it shows the power of these crawling / indexing tools based on a versioned data corpus.
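For illustration, here is a minimal sketch of the kind of check that would flag such a DOI. It assumes that a well-formed Zenodo DOI looks like `10.5281/zenodo.<digits>`. The function names are hypothetical and not part of Preston.

```python
import re

# Assumption: a well-formed Zenodo DOI has the form 10.5281/zenodo.<digits>.
ZENODO_DOI = re.compile(r"^10\.5281/zenodo\.\d+$")
# Resolver prefixes to strip before checking the DOI itself.
DOI_URL_PREFIX = re.compile(r"^https?://(dx\.)?doi\.org/", re.IGNORECASE)
# Anything that looks like a DOI URL in the Zenodo prefix (10.5281).
ZENODO_DOI_URL = re.compile(r"https?://(?:dx\.)?doi\.org/10\.5281/\S+", re.IGNORECASE)


def is_well_formed_zenodo_doi(doi_url):
    """Return True if the URL, minus its resolver prefix, is 10.5281/zenodo.<digits>."""
    doi = DOI_URL_PREFIX.sub("", doi_url.strip())
    return ZENODO_DOI.match(doi) is not None


def find_malformed_zenodo_dois(lines):
    """Yield (line_number, url) for each Zenodo DOI URL that fails the check."""
    for number, line in enumerate(lines, start=1):
        for url in ZENODO_DOI_URL.findall(line):
            if not is_well_formed_zenodo_doi(url):
                yield number, url
```

Feeding the lines of a multimedia.txt table to `find_malformed_zenodo_dois` would report the line number and the offending URL for rows like the ones shown below.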

@mielliott matching / access tools are working great so far . . . still eager to work

$ curl -s 'https://preston.guoda.bio/cat/zip:hash://sha256/1ae87b40107c13e48e68684ed201dfaa2480ac740bd37ed8dced033c75a3e322!/multimedia.txt' | grep 1125301
F54487B2FF85064560F9FC25FAA9295E.taxon  http://purl.org/dc/dcmitype/StillImage  image/png   https://zenodo.org/record/1125301/files/figure.png  http://doi.org/10.5281/zenodo./1125301  FIGURE17. Primary type specimens of New Zealand Scirtidae.A) Holotype, Veronatus versicolor Broun, 1921; B) Holotype, Veronatusvestitus Broun, 1921; C) Paratype, Stenocyphon neozealandicus Ruta, Thorpe & Yoshitomi,2011.Scale bar = 1 mm.    FIGURE17. Primary type specimens of New Zealand Scirtidae.A) Holotype, Veronatus versicolor Broun, 1921; B) Holotype, Veronatusvestitus Broun, 1921; C) Paratype, Stenocyphon neozealandicus Ruta, Thorpe & Yoshitomi,2011.Scale bar = 1 mm.    2017-12-22  Kiałka, Agata;Ruta, Rafał       Zenodo  biologists  Kiałka, Agata;Ruta, Rafał           
F54487B2FF85064560F9FA77FA9E2B69.taxon  http://purl.org/dc/dcmitype/StillImage  image/png   https://zenodo.org/record/1125301/files/figure.png  http://doi.org/10.5281/zenodo./1125301  FIGURE17. Primary type specimens of New Zealand Scirtidae.A) Holotype, Veronatus versicolor Broun, 1921; B) Holotype, Veronatusvestitus Broun, 1921; C) Paratype, Stenocyphon neozealandicus Ruta, Thorpe & Yoshitomi,2011.Scale bar = 1 mm.    FIGURE17. Primary type specimens of New Zealand Scirtidae.A) Holotype, Veronatus versicolor Broun, 1921; B) Holotype, Veronatusvestitus Broun, 1921; C) Paratype, Stenocyphon neozealandicus Ruta, Thorpe & Yoshitomi,2011.Scale bar = 1 mm.    2017-12-22  Kiałka, Agata;Ruta, Rafał       Zenodo  biologists  Kiałka, Agata;Ruta, Rafał           
F54487B2FF84064460F9FE83FBD12F29.taxon  http://purl.org/dc/dcmitype/StillImage  image/png   https://zenodo.org/record/1125301/files/figure.png  http://doi.org/10.5281/zenodo./1125301  FIGURE17. Primary type specimens of New Zealand Scirtidae.A) Holotype, Veronatus versicolor Broun, 1921; B) Holotype, Veronatusvestitus Broun, 1921; C) Paratype, Stenocyphon neozealandicus Ruta, Thorpe & Yoshitomi,2011.Scale bar = 1 mm.    FIGURE17. Primary type specimens of New Zealand Scirtidae.A) Holotype, Veronatus versicolor Broun, 1921; B) Holotype, Veronatusvestitus Broun, 1921; C) Paratype, Stenocyphon neozealandicus Ruta, Thorpe & Yoshitomi,2011.Scale bar = 1 mm.    2017-12-22  Kiałka, Agata;Ruta, Rafał       Zenodo  biologists  Kiałka, Agata;Ruta, Rafał           
jhpoelen commented 3 years ago

here's the related eml via curl 'https://preston.guoda.bio/cat/zip:hash://sha256/1ae87b40107c13e48e68684ed201dfaa2480ac740bd37ed8dced033c75a3e322!/eml.xml'

<?xml version='1.0' encoding='utf-8'?><eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.0.1" xmlns:md="eml://ecoinformatics.org/methods-2.0.1" xmlns:proj="eml://ecoinformatics.org/project-2.0.1" xmlns:d="eml://ecoinformatics.org/dataset-2.0.1" xmlns:res="eml://ecoinformatics.org/resource-2.0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/terms/" system="Plazi" scope="system" packageId="097DFFCAFFCD060D606EFFE7FF8A2C63/eml-1529643479506.xml"><dataset><alternateIdentifier>097DFFCAFFCD060D606EFFE7FF8A2C63</alternateIdentifier><alternateIdentifier>https://doi.org/10.11646/zootaxa.4366.1.1</alternateIdentifier><alternateIdentifier>1175-5326</alternateIdentifier><alternateIdentifier>1125265</alternateIdentifier><alternateIdentifier>FB8D05F7-52A8-4C95-859B-B4933E718AB6</alternateIdentifier><citation>Kiałka, Agata, Ruta, Rafał (2017): An illustrated catalogue of the New Zealand marsh beetles (Coleoptera: Scirtidae). Zootaxa 4366 (1): 1-76, DOI: https://doi.org/10.11646/zootaxa.4366.1.1</citation><title>An illustrated catalogue of the New Zealand marsh beetles (Coleoptera: Scirtidae)</title><creator><individualName><givenName>Agata</givenName><surName>Kiałka</surName></individualName></creator><creator><individualName><givenName>Rafał</givenName><surName>Ruta</surName></individualName></creator><pubDate>2017</pubDate><language>en</language><abstract><para>This dataset contains the digitized treatments in Plazi based on the original journal article Kiałka, Agata, Ruta, Rafał (2017): An illustrated catalogue of the New Zealand marsh beetles (Coleoptera: Scirtidae). Zootaxa 4366 (1): 1-76, DOI: https://doi.org/10.11646/zootaxa.4366.1.1</para></abstract><intellectualRights><para>Public Domain</para><para>No known copyright restrictions apply. See Agosti, D., Egloff, W., 2009. Taxonomic information exchange and copyright: the Plazi approach. 
BMC Research Notes 2009, 2:53 for further explanation.</para></intellectualRights><distribution scope="document"><online><url function="information">http://tb.plazi.org/GgServer/summary/097DFFCAFFCD060D606EFFE7FF8A2C63</url></online></distribution><contact><individualName><givenName>Guido</givenName><surName>Sautter</surName></individualName><electronicMailAddress>gsautter@gmail.com</electronicMailAddress><onlineUrl>http://plazi.org</onlineUrl></contact><associatedParty><organizationName>Magnolia Press</organizationName><address><deliveryPoint>St. Lukes 1346</deliveryPoint><city>Auckland</city><country>New Zealand</country></address><electronicMailAddress>magnolia@mapress.com</electronicMailAddress><onlineUrl>http://www.mapress.com/</onlineUrl><role>publisher</role></associatedParty><associatedParty><organizationName>Plazi</organizationName><address><city>Bern</city><country>Switzerland</country></address><electronicMailAddress>info@plazi.org</electronicMailAddress><onlineUrl>http://plazi.org/</onlineUrl><role>distributor</role></associatedParty><metadataProvider><organizationName>Plazi</organizationName><individualName><surName>plazi</surName></individualName></metadataProvider></dataset><additionalMetadata><metadata><gbif><dateStamp>2018-06-22T04:57:59+0000</dateStamp><citation>Kiałka, Agata, Ruta, Rafał (2017): An illustrated catalogue of the New Zealand marsh beetles (Coleoptera: Scirtidae). Zootaxa 4366 (1): 1-76, DOI: https://doi.org/10.11646/zootaxa.4366.1.1</citation></gbif><plaziMods><mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
<mods:titleInfo>
<mods:title>An illustrated catalogue of the New Zealand marsh beetles (Coleoptera: Scirtidae)</mods:title>
</mods:titleInfo>
<mods:name type="personal">
<mods:role>
<mods:roleTerm>Author</mods:roleTerm>
</mods:role>
<mods:namePart>Kiałka, Agata</mods:namePart>
</mods:name>
<mods:name type="personal">
<mods:role>
<mods:roleTerm>Author</mods:roleTerm>
</mods:role>
<mods:namePart>Ruta, Rafał</mods:namePart>
</mods:name>
<mods:typeOfResource>text</mods:typeOfResource>
<mods:relatedItem type="host">
<mods:titleInfo>
<mods:title>Zootaxa</mods:title>
</mods:titleInfo>
<mods:part>
<mods:date>2017</mods:date>
<mods:detail type="pubDate">
<mods:number>2017-12-22</mods:number>
</mods:detail>
<mods:detail type="volume">
<mods:number>4366</mods:number>
</mods:detail>
<mods:detail type="issue">
<mods:number>1</mods:number>
</mods:detail>
<mods:extent unit="page">
<mods:start>1</mods:start>
<mods:end>76</mods:end>
</mods:extent>
</mods:part>
</mods:relatedItem>
<mods:classification>journal article</mods:classification>
<mods:identifier type="DOI">https://doi.org/10.11646/zootaxa.4366.1.1</mods:identifier>
<mods:identifier type="ISSN">1175-5326</mods:identifier>
<mods:identifier type="Zenodo-Dep">1125265</mods:identifier>
<mods:identifier type="ZooBank">FB8D05F7-52A8-4C95-859B-B4933E718AB6</mods:identifier>
</mods:mods></plaziMods></metadata></additionalMetadata></eml:eml>

This appears to be a Plazi-related publication.
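As a rough sketch, the identifiers that tie this archive to the publication can be pulled out of the EML with the Python standard library. It relies on the `alternateIdentifier` elements carrying no namespace, as in the document above. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def alternate_identifiers(eml_bytes):
    """Return the text of every <alternateIdentifier> element, in document order."""
    root = ET.fromstring(eml_bytes)
    # In the EML above, alternateIdentifier has no namespace prefix, so the plain
    # tag name matches; a namespaced variant would need the {uri}tag form.
    return [el.text.strip() for el in root.iter("alternateIdentifier") if el.text]
```

For the EML above, this would list the Plazi document id, the Zootaxa DOI, the ISSN, the Zenodo deposit number and the ZooBank id.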

@myrmoteras @gsautter - any idea why a malformed Zenodo DOI would appear in the multimedia.txt file of a Plazi mediated DwC-A dataset registered with GBIF?

gsautter commented 3 years ago

Just looked into http://tb.plazi.org/GgServer/dwca/097DFFCAFFCD060D606EFFE7FF8A2C63.zip , and there are definitely no slashes in any of the DOIs ... hard to tell how that slash got there.

jhpoelen commented 3 years ago

Hey @gsautter, thanks for having a look. On closer inspection, the issue appears to have been resolved sometime between Aug 11, 2019 and Oct 20, 2019, when the archive was last updated. The most recent archive has content id hash://sha256/75f5e4bdea3fc232def8f0b970c97c60551754f8f452e67707a67f18e32d7b90 whereas the older one referred to earlier in this issue had content id hash://sha256/1ae87b40107c13e48e68684ed201dfaa2480ac740bd37ed8dced033c75a3e322 (see https://hash-archive.org/history/http://tb.plazi.org/GgServer/dwca/097DFFCAFFCD060D606EFFE7FF8A2C63.zip and attached screenshot).
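To make the comparison concrete, here is a minimal sketch of deriving such a content id. It assumes the id is simply `hash://sha256/` followed by the hex SHA-256 digest of the archive bytes. The helper names are hypothetical.

```python
import hashlib


def content_id(data):
    """Return a hash://sha256/... identifier for the given bytes.

    Assumption: the id is the hex SHA-256 digest of the raw content.
    """
    return "hash://sha256/" + hashlib.sha256(data).hexdigest()


def same_content(a, b):
    """Two downloads of the same URL hold identical content iff their ids match."""
    return content_id(a) == content_id(b)
```

Two snapshots of the same URL with different ids, as here, necessarily differ in at least one byte, which is why the two versions of the archive can be told apart and inspected separately.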

I confirmed that the DOIs are now well formed (e.g. http://doi.org/10.5281/zenodo.1125301) using the command and results below. It appears that I was over a year too late in reporting this issue ; ) - you had already fixed it. Sorry about the false alarm and thanks for looking into it.

PS I'd be curious to hear whether you happened to make changes to your DwC-A in the Aug - Oct 2019 time frame.

$ curl -s 'https://preston.guoda.bio/cat/zip:hash://sha256/75f5e4bdea3fc232def8f0b970c97c60551754f8f452e67707a67f18e32d7b90!/multimedia.txt' | grep 1125301
F54487B2FF85064560F9FC25FAA9295E.taxon  http://purl.org/dc/dcmitype/StillImage  image/png   https://zenodo.org/record/1125301/files/figure.png  http://doi.org/10.5281/zenodo.1125301   FIGURE17. Primary type specimens of New Zealand Scirtidae.A) Holotype, Veronatus versicolor Broun, 1921; B) Holotype, Veronatusvestitus Broun, 1921; C) Paratype, Stenocyphon neozealandicus Ruta, Thorpe & Yoshitomi,2011.Scale bar = 1 mm.    FIGURE17. Primary type specimens of New Zealand Scirtidae.A) Holotype, Veronatus versicolor Broun, 1921; B) Holotype, Veronatusvestitus Broun, 1921; C) Paratype, Stenocyphon neozealandicus Ruta, Thorpe & Yoshitomi,2011.Scale bar = 1 mm.    2017-12-22  Kiałka, Agata;Ruta, Rafał       Zenodo  biologists  Kiałka, Agata;Ruta, Rafał           
F54487B2FF85064560F9FA77FA9E2B69.taxon  http://purl.org/dc/dcmitype/StillImage  image/png   https://zenodo.org/record/1125301/files/figure.png  http://doi.org/10.5281/zenodo.1125301   FIGURE17. Primary type specimens of New Zealand Scirtidae.A) Holotype, Veronatus versicolor Broun, 1921; B) Holotype, Veronatusvestitus Broun, 1921; C) Paratype, Stenocyphon neozealandicus Ruta, Thorpe & Yoshitomi,2011.Scale bar = 1 mm.    FIGURE17. Primary type specimens of New Zealand Scirtidae.A) Holotype, Veronatus versicolor Broun, 1921; B) Holotype, Veronatusvestitus Broun, 1921; C) Paratype, Stenocyphon neozealandicus Ruta, Thorpe & Yoshitomi,2011.Scale bar = 1 mm.    2017-12-22  Kiałka, Agata;Ruta, Rafał       Zenodo  biologists  Kiałka, Agata;Ruta, Rafał           
F54487B2FF84064460F9FE83FBD12F29.taxon  http://purl.org/dc/dcmitype/StillImage  image/png   https://zenodo.org/record/1125301/files/figure.png  http://doi.org/10.5281/zenodo.1125301   FIGURE17. Primary type specimens of New Zealand Scirtidae.A) Holotype, Veronatus versicolor Broun, 1921; B) Holotype, Veronatusvestitus Broun, 1921; C) Paratype, Stenocyphon neozealandicus Ruta, Thorpe & Yoshitomi,2011.Scale bar = 1 mm.    FIGURE17. Primary type specimens of New Zealand Scirtidae.A) Holotype, Veronatus versicolor Broun, 1921; B) Holotype, Veronatusvestitus Broun, 1921; C) Paratype, Stenocyphon neozealandicus Ruta, Thorpe & Yoshitomi,2011.Scale bar = 1 mm.    2017-12-22  Kiałka, Agata;Ruta, Rafał       Zenodo  biologists  Kiałka, Agata;Ruta, Rafał           

[Screenshot from 2020-11-06 15-39-39] [Screenshot from 2020-11-06 15-39-47]

gsautter commented 3 years ago

Was updated Sep 26/27 2019, yes ... usually GBIF gets a notification on updates and re-ingests the DwCA within an hour.

gsautter commented 3 years ago

PS: the change and last ingestion dates are listed on the dataset page on GBIF ... the two usually lie within an hour or so of each other (see above).