Closed rukayaj closed 3 years ago
occurrenceID: I suggest adding the prefix urn:uuid: in front of the UUID, i.e. urn:uuid:[UUID]. It is also possible to use (rely on) the GBIF node's resolver, https://purl.org/gbifnorway/id/[UUID]. A naked UUID is also a good persistent identifier, but it becomes even more machine-readable when prefixed as a URN or a PURL. Which one you choose is a matter of taste, but the PURL works best right now, while the URN may (!!) work best in a future with a universal model for resolving URNs... c31fc61e-ef26-11e9-9c88-891c2040a2f0 --> urn:uuid:c31fc61e-ef26-11e9-9c88-891c2040a2f0 https://purl.org/gbifnorway/id/c31fc61e-ef26-11e9-9c88-891c2040a2f0
recordedBy + recordedByID: I recommend adding a machine-readable ID for the collector. This makes it easier for machines to give you citation credit in the new model for evaluating researchers (which is coming quickly). (Here, where all the species data were collected by you, I have added a hard-coded reference, but in datasets with several collectors it is wise to add it per record.) recordedBy = Jørn Olav Løkken --> recordedByID = https://orcid.org/0000-0003-1024-0406
identifiedBy --> identifiedByID: Here, where several people have identified the records, identifiedByID should preferably be added per record. With several people, multiple ORCIDs can be given pipe-separated. Anders Often = ? Kåre Arnstein Lye = https://orcid.org/0000-0003-0398-890X (?)
verbatimLocality --> locality + locationID + country: I see you use verbatimLocality, which is normally used for the locality description exactly as it appears on the label of a collection object or in the field notebook. When you enter what you consider to be the correct locality, I think I would recommend putting it directly in the "locality" field. For the locality to be machine-readable it is also very useful to give a machine-readable reference, e.g. from GeoNames. At least when no machine-readable locality description is included, it is wise to provide the country (and preferably municipality and county), because many different places share the same name :-) I know this can be derived from the georeferences, but a little more data feeds the data-quality routines, which gives higher confidence in the georeferences (which unfortunately often have some problems).
I did not find all the local places in GeoNames, and have entered the town/city instead. It is possible to add local places to GeoNames yourself. I have added some of the places, and the machine-readable reference does not seem to work properly yet (maybe in a few hours?).
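A minimal sketch of the two prefixing options, assuming the source file holds naked UUID strings. The helper names `as_urn` and `as_purl` are hypothetical; the PURL base is the one quoted in this comment.

```python
import uuid

# Resolver prefix quoted in the comment above.
PURL_BASE = "https://purl.org/gbifnorway/id/"


def as_urn(raw: str) -> str:
    """Return the urn:uuid: form; raises ValueError if raw is not a UUID."""
    return uuid.UUID(raw).urn


def as_purl(raw: str) -> str:
    """Return the PURL form; raises ValueError if raw is not a UUID."""
    return PURL_BASE + str(uuid.UUID(raw))


naked = "c31fc61e-ef26-11e9-9c88-891c2040a2f0"
print(as_urn(naked))
print(as_purl(naked))
```

Parsing through `uuid.UUID` first means a malformed value fails loudly instead of being prefixed and published.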
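A rough sketch of building a pipe-separated identifiedByID value. The function name `join_agent_ids` is hypothetical, and the " | " separator (space, vertical bar, space) follows the usual Darwin Core recommendation for list values. The two ORCID URLs are only reused from this thread as sample strings, not as a claim about who identified which record.

```python
import re

# Shape of an ORCID URL: four blocks of four, last character a digit or X.
ORCID_URL = re.compile(r"^https://orcid\.org/\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def join_agent_ids(orcids):
    """Join ORCID URLs into one pipe-separated identifiedByID value."""
    for orcid in orcids:
        if not ORCID_URL.match(orcid):
            raise ValueError(f"not an ORCID URL: {orcid!r}")
    return " | ".join(orcids)


print(join_agent_ids([
    "https://orcid.org/0000-0003-1024-0406",
    "https://orcid.org/0000-0003-0398-890X",
]))
```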
Ås VGs. = https://sws.geonames.org/3162672/
Bergen rådhus = https://sws.geonames.org/12216987/
Gardermoen alle = https://sws.geonames.org/3150851/ (Jessheim)
Gardermoen næringspark = https://sws.geonames.org/3150851/ (Jessheim)
Hoveodden = https://sws.geonames.org/3151635/
Lysaker Møllefossen = https://sws.geonames.org/12216988/
Politihuset Trondheim = https://sws.geonames.org/12216989/
Verdal VGS = https://sws.geonames.org/12216986/
Vogellund = https://sws.geonames.org/8299525/ (Nesbru)
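The list above could be applied to the data file as a simple lookup, roughly as sketched here. Only a subset of the pairs is copied in, the function name `location_id_for` is hypothetical, and returning an empty string for unknown localities is an assumption about how missing values should look in the source file.

```python
# Locality -> GeoNames URI pairs copied from the list above (subset only).
LOCATION_IDS = {
    "Ås VGs.": "https://sws.geonames.org/3162672/",
    "Bergen rådhus": "https://sws.geonames.org/12216987/",
    "Hoveodden": "https://sws.geonames.org/3151635/",
    "Verdal VGS": "https://sws.geonames.org/12216986/",
}


def location_id_for(locality: str) -> str:
    """Return the GeoNames URI for a locality, or "" when it is not listed."""
    return LOCATION_IDS.get(locality.strip(), "")


print(location_id_for("Hoveodden"))
```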
1) The habitat strings are in Norwegian. If it's easy for him to change them to English that would be great, but if there are many variations then I think it's more important to just publish the data.
2) It looks like it might be better as an event dataset, but maybe you and Dag already spoke to him about this in the meeting last week?
3) It might be nice to have a bit more info in the metadata about the data collection methods, if he knows how collection normally happens. For example, a fuller description and methods. Please describe the background of the project; that makes it easier for others to understand why the data were collected. The text can also mention where in Norway the data are from. You can also mark the geographic area on the map in the metadata. I can do this myself as soon as I see where the data were collected. The more metadata that can be added, the better for the understanding and quality of this dataset.
Should this have been registered as a separate dataset, or does it belong together with the one from 2019?
Great, I think we can close this as it's now published: https://doi.org/10.15468/cuocad
I suggest adding locationID? Will have a look.
I added a translation from verbatimLocality to locationID, BUT now it looks like the IPT also translates the values for the locality/verbatimLocality mapping...???!!
And publishing fails...? Error on the uniqueness of occurrenceIDs
Publishing version #1.4 of resource occurence_small_projects failed: Archive generation for resource occurence_small_projects failed: Can't validate DwC-A for resource occurence_small_projects. Each line must have a occurrenceID, and each occurrenceID must be unique (please note comparisons are case insensitive)
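A quick pre-publish check along the lines of that error message could look like the sketch below. It is not the IPT's own validation, just an approximation of the stated rule (every record has an occurrenceID, and IDs are unique when compared case-insensitively); the function name is hypothetical.

```python
from collections import Counter


def find_occurrence_id_problems(ids):
    """Return (count of missing IDs, sorted lowercased IDs seen more than once)."""
    normalized = [(value or "").strip().lower() for value in ids]
    missing = sum(1 for value in normalized if not value)
    counts = Counter(value for value in normalized if value)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    return missing, duplicates


sample = [
    "c31fc61e-ef26-11e9-9c88-891c2040a2f0",
    "C31FC61E-EF26-11E9-9C88-891C2040A2F0",  # same ID, different case
    "",
]
print(find_occurrence_id_problems(sample))
```

Running this over the occurrenceID column of all mapped source files together, not each file alone, is what would catch a file that is mapped twice.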
Did we archive a copy of the raw data in Zenodo?
I didn't, but maybe Vidar did?
I don't have the raw file. Can we "extract" it from the IPT?
[Uploading od_small_projects_241019.txt…]()
We might map the resources folder to a URL...? And then use this URL to fetch the raw files...? E.g. https://ipt.gbif.no/resources/occurence_small_projects/sources/od_small_projects_241019.txt
Looks like vartax_2021 is the NEW file that Jørn added, and od_small_projects_241019 is an old datafile. And that the vartax_2021 file is mapped identically twice! Thus duplicate occurrenceIDs....
vartax_2021.txt I am hoping the next version of the IPT will include one of the patches which allows the download of files from the admin interface...
Hmm I wonder how it got published successfully then, if both files were mapped.
It looks as if the data records I get when fetching the DwC-A do NOT include recordedByID nor identifiedByID nor locationID when these are only mapped in ONE of the two data files joined vertically...?
I hope it will work decently if I add recordedByID, identifiedByID and locationID to the data records fetched from the DwC-A... :-)
It looks like this DWCA https://ipt.gbif.no/archive.do?r=occurence_small_projects&v=1.5 does contain recordedByID. From what I can see, identifiedByID isn't a mapped column, and I don't see the locationID translation ?
Maybe it was just me thinking there would be more there, and silly me, I believe it was actually me who deleted the locationID mapping just before generating the DwC-A :-) A bit too many parallel tasks.
I added the "new" source data file at https://zenodo.org/record/4564455#.YDjwU11Kjzc and added the URI to the dataset EML metadata in the IPT.
In the "new" source data file I added an alternative "occurrenceID_urn" column with the "urn:uuid:" prefix. However, updating the occurrenceIDs now would likely BREAK the occurrenceKey continuity...? So I kept the mapping of the old naked UUIDs as occurrenceID.
Should we keep these or remap to the urn:uuid: prefixed ones?
I located the GeoNames locationID for the localities in the previous (2019) od_small_projects_241019 file, but did not start doing the same for the localities in the new (2021) vartax_2021 file...
It was published so recently that I wonder whether adding the "urn:uuid:" prefix would matter so much?
I guess the "urn:uuid:" prefix in practice makes no difference today... but in a bright future sometime maybe the world cares enough to be able to resolve urn:uuid: prefixed URNs... :-) In many ways I personally like them much more than the HTTP and HTTPS things. But I am fine with closing this issue now, unless you want to add some other things.
No I can't think of anything else, let's close it for now.
Republished with new data files from Jørn
Updated the datafiles in Zenodo, new link https://zenodo.org/record/4564455#.YECVF11Kjzc
Changes made by Jørn
I also found an error in the Latin name of one species in the VarTax file, which I have corrected: the Latin name for mjødurt, Filipendula ulmaria, had become Filipendula vulgaris (knollmjødurt), which is a somewhat unfortunate error since the latter is red-listed. I am also attaching the original files so you can upload them to Zenodo.
Reopening because Jørn sent another follow up, and it looks like there are still records with Filipendula vulgaris which I think might be wrong?
Jørn asked me to publish this again and he says everything is now ok, apparently the remaining records with Filipendula vulgaris were correct.
https://ipt.gbif.no/manage/resource.do?r=occurence_small_projects