Dataset from geco group (Eva) - Presence-absence of plant habitat specialists in 15 patches

rukayaj commented 2 years ago

Eva is making a data paper 🥳 She wants to get started on the metadata, so I have made her an account on the IPT and she has started filling it in: https://ipt.gbif.no/resource/preview?r=geco-plant-habitat-specialists-15-patches

Thinking ahead towards publishing the data: one thing I noticed is that it is in cross-tab format. We will need to upload it in a normal list (long) format, with the sampling events in a separate sheet. So there needs to be one row for each occurrence, like this:

Occurrence file:
occurrenceID	eventID	scientificName	occurrenceStatus
1	p1-2012	Acinos arvensis	present
2	p1-2012	Androsace septentrionalis	present
...	...	...	...
n - 1	p15-2020	Veronica spicata	absent
n	p15-2020	Woodsia alpina	absent
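A minimal sketch of the cross-tab to list conversion described above, in plain Python. The layout of the cross-tab (species as rows, patch-year events as columns, 1/0 cells) and the cell values are assumptions made for illustration, not the real data.

```python
# Hypothetical cross-tab: one entry per species, one column per patch-year event.
# 1 = present, 0 = absent. The values are made up for illustration only.
crosstab = {
    "Acinos arvensis": {"p1-2012": 1, "p15-2020": 0},
    "Woodsia alpina": {"p1-2012": 0, "p15-2020": 0},
}


def crosstab_to_long(table):
    """Return one occurrence row per (event, species) cell."""
    # Sort by event, then species, so the running occurrenceID is stable.
    cells = sorted(
        (event_id, species, value)
        for species, by_event in table.items()
        for event_id, value in by_event.items()
    )
    rows = []
    for i, (event_id, species, value) in enumerate(cells, start=1):
        rows.append({
            "occurrenceID": str(i),
            "eventID": event_id,
            "scientificName": species,
            "occurrenceStatus": "present" if value else "absent",
        })
    return rows


occurrences = crosstab_to_long(crosstab)
for row in occurrences:
    print("\t".join(row.values()))
```

Every cell becomes its own row, so absences are kept explicitly rather than dropped.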

We might also add individualCount as 0 for the absence records. Actually, we usually just publish the presence occurrences and put a list of species in the metadata so that the absences can be inferred. But I've been wondering lately whether that's really the best call now that more people have started publishing absence data on GBIF. I think I saw something about this in one of the GBIF GitHub issues. Thoughts @dagendresen @vidarbakken?
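To make the "absences can be inferred" option concrete, here is a rough sketch of what a data user would have to do with a presence-only file plus the species list from the metadata. The function name and the sample values are hypothetical.

```python
def infer_absences(presences, species_list, event_ids):
    """Return (eventID, scientificName) pairs that were surveyed but not recorded as present."""
    present = set(presences)
    return [
        (event_id, species)
        for event_id in event_ids
        for species in species_list
        if (event_id, species) not in present
    ]


# Made-up example values, not the real data set.
species_list = ["Acinos arvensis", "Veronica spicata"]
event_ids = ["p1-2012", "p15-2020"]
presences = [("p1-2012", "Acinos arvensis")]

absences = infer_absences(presences, species_list, event_ids)
print(absences)
```

This only works if every species on the list was actually looked for in every event, which is the assumption a reader has to take on trust when absences are not published explicitly.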

Anyway, each of these occurrences would need to be related to an event via the eventID. So we would have a separate event file, looking something like this, with each collection (at a certain patch, in a certain year) as a separate event:

Event file:
eventID	year	decimalLatitude	decimalLongitude
p1-2012	2012	1.111	2.222	100
...	...	...	...	...
p15-2020	2020	1.112	2.223	100
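As a sketch of how the two files relate, the check below lists any eventID used in the occurrence file that has no matching row in the event file. The helper name and the sample rows (including the deliberately unmatched `p2-2012`) are made up for illustration.

```python
def missing_event_ids(occurrence_rows, event_rows):
    """Return eventIDs referenced by occurrences but absent from the event file."""
    known = {row["eventID"] for row in event_rows}
    used = {row["eventID"] for row in occurrence_rows}
    return sorted(used - known)


# Hypothetical rows mirroring the example tables above.
event_rows = [
    {"eventID": "p1-2012", "year": "2012"},
    {"eventID": "p15-2020", "year": "2020"},
]
occurrence_rows = [
    {"occurrenceID": "1", "eventID": "p1-2012"},
    {"occurrenceID": "2", "eventID": "p2-2012"},  # no such event on purpose
]

print(missing_event_ids(occurrence_rows, event_rows))
```

An empty list means every occurrence can be joined to its event.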

@evalieungh: I can do this data conversion for you, but maybe you have it in list format already for the data analysis? We usually use uuids for the ID columns but for this example I've kept it simple so it's easier to see how they relate to each other.
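For the UUIDs, one possible approach (an assumption, not necessarily how we do it on the IPT) is to derive them from the simple keys with Python's `uuid.uuid5`, so that re-running the conversion gives the same IDs. The namespace below is a hypothetical placeholder; random `uuid.uuid4()` values would work too if the IDs are stored after the first run.

```python
import uuid

# Hypothetical namespace for this data set; any fixed UUID would do.
DATASET_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")


def stable_id(kind, key):
    """Derive a repeatable UUID from a record type and its simple key."""
    return str(uuid.uuid5(DATASET_NAMESPACE, f"{kind}:{key}"))


event_uuid = stable_id("event", "p1-2012")
occurrence_uuid = stable_id("occurrence", "p1-2012/Acinos arvensis")
print(event_uuid, occurrence_uuid)
```

The occurrence rows would then carry the event's UUID in their eventID column, exactly as the simple keys do in the example tables.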

Do you know what day + month the observations were recorded as well? And who was doing the fieldwork each year?

evalieungh commented 1 year ago

About 2. species names, I think it will be impossible to match the names exactly while following Artsnavnebasen. I propose to do this instead:

1. Check each species to find the most up-to-date name in GBIF that corresponds best to Artsnavnebasen.
2. Update the IPT data with these names.
3. Update the manuscript table to match the IPT.
4. Upload the (old) table with names following the national database to GitHub, with the common name as a link back to the other data files.

It is weird that GBIF and Artsnavnebasen names don't always match. For instance, Cotoneaster niger is a problem. In Artsnavnebasen it is Cotoneaster niger (Wahlb.) Fr., which doesn't exist in GBIF as far as I can see. GBIF does have Cotoneaster nigra (Wahlb.) Fr., but I'm not sure it's the same, and it has very few occurrences (only old herbarium records). On GBIF the best match seems to be Cotoneaster niger (Fr.) Fr., but that may be based on a different type specimen?

evalieungh commented 1 year ago

Here are the best matching GBIF names: https://github.com/evalieungh/gressholmen_data/blob/main/taxa_GBIF.csv

rukayaj commented 1 year ago

I will take a look tomorrow :)

evalieungh commented 1 year ago

https://github.com/evalieungh/gressholmen_data/blob/main/taxa_GBIF.csv now contains old names (OldName column) and new, correct ones. Renamed common name column to VernacularName.
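A rough sketch of how a lookup file like this could be applied to the occurrence rows. `OldName` and `VernacularName` are the column names mentioned above; `NewName` is an assumed name for the corrected-name column, and the sample rows are placeholders rather than real taxa.

```python
import csv
import io

# Placeholder content standing in for taxa_GBIF.csv ("NewName" is an assumption).
TAXA_CSV = """OldName,NewName,VernacularName
Old name A,New name A,Common name A
Old name B,New name B,Common name B
"""


def load_name_map(handle):
    """Map each old name to its (new name, vernacular name) pair."""
    return {
        row["OldName"]: (row["NewName"], row["VernacularName"])
        for row in csv.DictReader(handle)
    }


def rename_occurrences(rows, name_map):
    """Return updated rows plus the names that had no match in the lookup."""
    updated, unmatched = [], []
    for row in rows:
        old = row["scientificName"]
        if old not in name_map:
            unmatched.append(old)
            updated.append(dict(row))  # leave unmatched rows unchanged
            continue
        new_name, vernacular = name_map[old]
        updated.append({**row, "scientificName": new_name, "vernacularName": vernacular})
    return updated, unmatched


name_map = load_name_map(io.StringIO(TAXA_CSV))
rows = [{"scientificName": "Old name A"}, {"scientificName": "Old name C"}]
updated, unmatched = rename_occurrences(rows, name_map)
print(updated, unmatched)
```

Returning the unmatched names separately makes it easy to spot species that still need a manual decision.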

@rukayaj can you fix the names and add absence records like we talked about? I don't understand the IPT editing. The file above has one row per species, with the old name, the new (correct) name, and the vernacular name. We talked about adding the vernacular name as a new column and adding absence records under 'occurrenceStatus'. You also said we should add individualCount=0. One problem might be that the presences won't have any information in that column, unless we 'cheat' and set them all to 1. I didn't count individuals, so maybe it's possible to set it to NA for the species that are present?

rukayaj commented 1 year ago

Sure, I'm gonna do it now.

See https://ipt.gbif.org/manual/en/ipt/2.5/sampling-event-data#q-how-do-i-publish-absence-data

Here it says we can use organismQuantity, and I see one of the examples it gives in https://dwc.tdwg.org/terms/#dwc:organismQuantity has "many" as an available option.

So for the presence records, I will put:
organismQuantity="at least one" (because "many" is not quite accurate, right?)
organismQuantityType="individuals"
occurrenceStatus=present

For the absence records:
organismQuantity="0"
organismQuantityType="individuals"
occurrenceStatus=present <- EDIT - I meant absent!!
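In code, the rule above (with the corrected "absent" status) could look roughly like this. The helper name is hypothetical and the sample rows are made up; the field values are the ones proposed in this comment.

```python
def add_quantity_fields(row):
    """Fill organismQuantity and organismQuantityType from occurrenceStatus."""
    is_present = row["occurrenceStatus"] == "present"
    return {
        **row,
        # No counts were made, so presences get a text value rather than a number.
        "organismQuantity": "at least one" if is_present else "0",
        "organismQuantityType": "individuals",
    }


# Made-up rows for illustration.
sample = [
    {"scientificName": "Acinos arvensis", "occurrenceStatus": "present"},
    {"scientificName": "Woodsia alpina", "occurrenceStatus": "absent"},
]
with_quantities = [add_quantity_fields(row) for row in sample]
for row in with_quantities:
    print(row)
```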

rukayaj commented 1 year ago

I just updated it, have a look tomorrow and see what you think.

evalieungh commented 1 year ago

So for the presence records, I will put: organismQuantity="at least one" (because "many" is not quite accurate, right?) organismQuantityType="individuals" occurrenceStatus=present

For the absence records: organismQuantity="0" organismQuantityType="individuals" occurrenceStatus=present

Thanks! You are correct, "at least one" is better than "many" for this data set. But shouldn't the occurrenceStatus for the absences be "absent"?

evalieungh commented 1 year ago

Oh, I see there are "absent" records in the data so it should be fine 👍

rukayaj commented 1 year ago

Whoops, copy-and-paste error. Yes, they should be absent, and I did set them to absent in the data.

evalieungh commented 1 year ago

Now I've resubmitted!

rukayaj commented 1 year ago

That's great! Hopefully this is it now then...

rukayaj commented 1 year ago

The paper has now been published: doi.org/10.3897/BDJ.10.e94057! Congratulations, @evalieungh :)
