Open DimEvil opened 2 years ago
First task looks good, as it looks like you are assigning each occurrence a GBIF ID
--> I'm assigning each occurrence a new unique ID; this is not the GBIF ID. In the original dataset I could not find GUIDs (globally unique identifiers), which we need for publication on GBIF. What came closest was institutionID (I guess this is the unique record number in the original database). DwC:institutionID is actually an ID for the institution, so I guess the term was not used correctly.
If I download the data from https://doi.org/10.15468/dl.my64ap, I get the whole DwC (including all the empty columns), and the unique ID can be found in occurrenceID.
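A minimal sketch of how a new unique occurrenceID could be assigned and checked, in base R. The ID scheme (a prefix plus a zero-padded row counter), the prefix itself, and the three example rows are assumptions for illustration only; the thread does not say how the IDs were actually generated.

```r
# Hypothetical example rows; the real dataset has many more columns.
occurrence <- data.frame(
  scientificName = rep("Desmodus rotundus", 3),
  stringsAsFactors = FALSE
)

# Assumed ID scheme: a dataset prefix plus a zero-padded row counter.
id_prefix <- "desmodus-occ-"  # hypothetical prefix, not taken from the thread
occurrence$occurrenceID <- sprintf("%s%06d", id_prefix, seq_len(nrow(occurrence)))

# Every record needs an ID, and no ID may repeat.
stopifnot(
  !anyNA(occurrence$occurrenceID),
  anyDuplicated(occurrence$occurrenceID) == 0
)
```

A row counter is only unique within one dataset version; identifiers that must stay stable across re-exports would need a different scheme.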
Scientific Name should be Desmodus rotundus
Ok, fixed
ok
If the Lee paper data here (Desmodus_dataset_Dec_2021.csv) is not already on GBIF, we will keep them in the dataset. :)
More info on the term can be found here: https://github.com/tdwg/dwc/issues/329 It's referring to evidence of an occurrence from literature.
ok
I think there is an error somewhere: family and taxonRank have changed places.
Something is also not correct here: as we have 100% Desmodus rotundus, family should be the same in the whole dataset.
fixed
I also changed M & F in DwC:sex to male and female, as these are the controlled vocabulary terms. :)
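The recoding of the sex codes could look roughly like this in base R. The example vector is made up, and the sketch assumes that any code other than M or F should become NA rather than be kept as-is.

```r
# Hypothetical raw values; NA stands for a record with no sex recorded.
sex_raw <- c("M", "F", "F", NA, "M")

# Lookup from the abbreviations to the controlled vocabulary terms.
sex_lookup <- c(M = "male", F = "female")

# Codes that are missing from the lookup (or NA) come out as NA.
sex_dwc <- unname(sex_lookup[sex_raw])
```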
Sounds good. Looks like two of the columns (family and taxonRank) might have been switched at some point. Perhaps try downloading the most recent version (i.e., Desmodus_dataset_Apr_2022) from Figshare (https://figshare.com/articles/dataset/Desmodus_rotundus_Occurrence_Record_Database/15025296).
Some more questions and remarks on the April 2022 version:
I have these values in basisOfRecord: MachineHUMAN_OBSERVATIONbPRESERVED_SPECIMENervatiHUMAN_OBSERVATIONn, HumanHUMAN_OBSERVATIONbPRESERVED_SPECIMENervatiHUMAN_OBSERVATIONn
Still some errors in family <-> taxonRank (I created fixed terms here now)
All taxonRank values are now species (as all scientificName = Desmodus rotundis, no subspecies)
Okay, gotcha. Just went through and fixed those columns, so they should be okay now. For taxonRank species is fine, just make sure that Desmodus rotundus is spelled correctly and that should be good to go!
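A quick check along these lines can flag misspelled names and swapped family/taxonRank values before publishing. This is a sketch on made-up rows; the family name Phyllostomidae is not mentioned in the thread and is supplied here as the expected family for Desmodus rotundus.

```r
# Hypothetical rows: row 2 has a misspelled name, row 3 has swapped columns.
occurrence <- data.frame(
  scientificName = c("Desmodus rotundus", "Desmodus rotundis", "Desmodus rotundus"),
  family         = c("Phyllostomidae", "Phyllostomidae", "species"),
  taxonRank      = c("species", "species", "Phyllostomidae"),
  stringsAsFactors = FALSE
)

# Rows whose scientificName deviates from the expected spelling.
bad_name <- which(occurrence$scientificName != "Desmodus rotundus")

# Rows where taxonRank or family does not hold the single expected value.
bad_rank_family <- which(
  occurrence$taxonRank != "species" | occurrence$family != "Phyllostomidae"
)
```

Both index vectors should be empty for a clean dataset.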
For the basis of record it looks like some things have been collated. Should be HUMAN_OBSERVATION, PRESERVED_SPECIMEN, or LIVE_SPECIMEN. That specific record is unknown, so it should be left blank or listed as UNKNOWN. Does that make sense?
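To find the mangled basisOfRecord entries, each value can be compared against the set named above. A minimal base R sketch, assuming the allowed set is exactly the values listed in this comment (plus blank) and using made-up rows:

```r
# Allowed values as listed in this thread; blank is also accepted.
allowed <- c("HUMAN_OBSERVATION", "PRESERVED_SPECIMEN", "LIVE_SPECIMEN", "UNKNOWN")

# Hypothetical column with one mangled value and one blank.
basis <- c(
  "HUMAN_OBSERVATION",
  "HumanHUMAN_OBSERVATIONbPRESERVED_SPECIMENervatiHUMAN_OBSERVATIONn",
  "",
  "PRESERVED_SPECIMEN"
)

# A value is invalid if it is neither blank nor in the allowed set.
is_blank <- is.na(basis) | basis == ""
invalid  <- !is_blank & !(basis %in% allowed)

# Positions of the records that still need fixing.
invalid_rows <- which(invalid)
```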
library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

# Map each source dataset to a Darwin Core basisOfRecord value.
# case_when() uses the first condition that matches, so the order matters.
occurrence %<>% mutate(dwc_basisOfRecord = case_when(
  datasetName == 'Lee' ~ "MaterialCitation",
  datasetName == 'Zarza' ~ "MaterialCitation",
  basisOfRecord == 'UNKNOWN' ~ "occurrence",
  datasetName == 'literature' ~ "MaterialCitation",
  datasetName == 'Piaggio' ~ "MaterialCitation",
  datasetName == 'Juan Luis_Personal_Database' ~ "MaterialCitation",
  datasetName == 'Literature' ~ "MaterialCitation",
  datasetName == 'Bio_Diversi_Data_UY' ~ "MaterialCitation",
  datasetName == 'PCMS' ~ "MaterialCitation",
  datasetName == 'Streicker_Peru' ~ "MaterialCitation",
  datasetName == 'SAG' ~ "MaterialCitation",
  TRUE ~ basisOfRecord  # keep the existing value for all other datasets
))