gbif / portal-feedback

User feedback for the GBIF API, website and published data. You can ask questions here. 🗨❓

Abundance data not included in dataset (Swedish Bird Survey: Fixed routes (Standardrutterna)) #4020

Open gbif-portal opened 2 years ago

gbif-portal commented 2 years ago


Hello,

I am writing with a question about the dataset: Swedish Bird Survey: Fixed routes (Standardrutterna) (https://www.gbif.org/dataset/91fa1a0d-a208-40aa-8a6e-f2c0beb9b253).

The Darwin Core Archive version of this dataset includes counts of individuals per species in a column called "organismQuantity" but contains all NAs for the column "individualCount". The Simple_CSV download includes the "individualCount" column (with all NAs) but it does not include the "organismQuantity" column (with the abundance data). So, essentially, the Simple_CSV version of the data is missing a major component of information.

We have reached out to the data suppliers and they mentioned that they recently changed the placement of this information from the "individualCount" to the "organismQuantity" column. Perhaps this led to some problem with the production of the Simple_CSV data?

I don't know what the solution is, but it is really important that the Simple_CSV version of the data also includes the individual count data. How do you think this can be resolved?

Please let me know if I can clarify anything! And thanks for all your work, as well as your attention to help resolve this!

Bob Muscarella robert.muscarella@ebc.uu.se


Bob Muscarella Associate professor Department of Ecology and Genetics Uppsala University


User: See in registry
System: Chrome 100.0.4896 / Mac OS X 10.15.7
Referer: https://www.gbif.org/dataset/91fa1a0d-a208-40aa-8a6e-f2c0beb9b253
Window size: width 1711 - height 971
System health at time of feedback: OPERATIONAL

MattBlissett commented 2 years ago

CC @CecSve via https://github.com/gbif/pipelines/issues/666#issuecomment-1097860441

Thanks Bob,

Relevant DWC terms: https://dwc.tdwg.org/terms/#dwc:individualCount and the following two.

We calculate the fields individualCount and occurrenceStatus from each other (e.g. if individualCount is zero, we fill in occurrenceStatus=ABSENT if it's blank). The combinations are given here: https://github.com/gbif/pipelines/issues/268#issuecomment-624755278

I think, in the case that organismQuantityType=individual, we should include the organismQuantity field in this process, equivalent to individualCount. There will be additional error cases where the value doesn't match the individualCount etc.
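As a rough illustration of the proposal above, here is a minimal Python sketch. It is not the GBIF pipelines code: the function names and the `COUNT_QUANTITY_MISMATCH` flag are hypothetical, accepting both "individual" and "individuals" as the quantity type is an assumption, and only the one combination mentioned in this comment (a zero count with a blank occurrenceStatus becomes ABSENT) is covered. The full set of combinations is in the linked pipelines issue.

```python
def _parse_int(value):
    """Return value as an int, or None if it is missing or not a whole number."""
    try:
        return int(str(value).strip())
    except (TypeError, ValueError):
        return None


def derive_count(record):
    """Return (individual_count, occurrence_status, issues) for one record dict.

    Hypothetical helper: falls back to organismQuantity when individualCount
    is blank and organismQuantityType says the quantity counts individuals.
    """
    issues = []
    count = _parse_int(record.get("individualCount"))

    qty_type = (record.get("organismQuantityType") or "").strip().lower()
    if qty_type in {"individual", "individuals"}:  # assumed spellings
        quantity = _parse_int(record.get("organismQuantity"))
        if count is None:
            count = quantity
        elif quantity is not None and quantity != count:
            # The "additional error case": both fields present but different.
            issues.append("COUNT_QUANTITY_MISMATCH")  # hypothetical flag name

    status = (record.get("occurrenceStatus") or "").strip().upper() or None
    if status is None and count == 0:
        status = "ABSENT"
    return count, status, issues
```

With this sketch, a record that has only `organismQuantity` and `organismQuantityType` filled in ends up with a count, while a record where the two fields disagree keeps its `individualCount` and is flagged.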

anneliejonsson commented 2 years ago

Hello, I'm part of the group that is responsible for publishing this dataset, and Bob has been in contact with me about this issue. As a consequence I started comparing the current SIMPLE-download to our latest upload (the Source Archive). And I realised that there is no field in the current SIMPLE-download that gives information about the location, apart from the coordinates (decimalLatitude/Longitude). The locality-field is also present but empty as we never enter anything there for this dataset.

Bob and colleague did a SIMPLE-download of this dataset in 2020 and I wanted to compare that to the current version. Unfortunately they didn't save the original downloaded file, hence we are unsure whether that contained any other fields than the current version. However, in their manipulated version of the downloaded file (reduced in terms of fields and records for educational purposes) there is a locationID-field. Bob and colleague are not entirely sure whether that was there from the start or whether they created it, but the contents of this field do match the contents of our then uploaded SourceArchive. Since 2020 we have, in addition to changing from individualCount to organismQuantity etc, also changed the format of the actual locationIDs, but they are still entered in the locationID-field.

If the locationID field really was present in the 2020-version of the SIMPLE-download, what could be the reason for it not being present in the current version? And could it be put back?

Many thanks in advance! /Annelie

MattBlissett commented 2 years ago

Hi Annelie,

Bob or the colleague might have the original download -- they can see all their downloads at https://www.gbif.org/user/download

I'm almost certain no fields have been removed from the SIMPLE download format in the last few years. https://doi.org/10.15468/dl.468zh3 is a download of this dataset from May 2020, and the locality field is blank.

The DWCA format contains all fields, and contains three tab-separated files. If you are choosing columns from the SIMPLE format, you could also choose the columns from occurrence.txt in a DWCA format download and ignore the rest.
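For anyone wanting to do this in a script, the following is a minimal Python sketch that reads only the chosen columns from occurrence.txt inside a downloaded DWCA zip. The function name is hypothetical, and the UTF-8 encoding and unquoted tab-separated layout are assumptions about the download file.

```python
import csv
import io
import zipfile


def read_occurrence_columns(archive_path, columns):
    """Return a list of dicts holding only `columns` from occurrence.txt.

    Columns that are not present in the file come back as None.
    """
    with zipfile.ZipFile(archive_path) as archive:
        with archive.open("occurrence.txt") as raw:
            # Assumption: the file is UTF-8 and tab-separated without quoting.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            return [{name: row.get(name) for name in columns} for row in reader]
```

For example, `read_occurrence_columns("download.zip", ["locationID", "organismQuantity", "organismQuantityType"])` would return the fields discussed in this thread, where "download.zip" stands in for the path of your own download.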

CecSve commented 2 years ago

> I think, in the case that organismQuantityType=individual, we should include the organismQuantity field in this process, equivalent to individualCount. There will be additional error cases where the value doesn't match the individualCount etc.

@MattBlissett could this solution be included in the SIMPLE download then?

bobmuscarella commented 2 years ago

Hi all - I'm just seeing this conversation - thanks to everyone for trying to figure it out / find a solution!

Indeed, as @MattBlissett suggested, it seems my previous download is still saved (DOI 10.15468/dl.yvbvb4).

In that dataset, the column "IndividualCount" has the abundance data. There is no field for "LocationID" but the "locality" field has all NAs. So I guess I created my own locationID column based on unique coordinates.
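For illustration only, a locationID column built from unique coordinates could be sketched in Python roughly as below. This is not a record of what was actually done in 2020: the function name and the "loc1", "loc2" label format are hypothetical, and coordinates are compared as exact values without rounding.

```python
def assign_location_ids(rows):
    """Return copies of `rows` with a locationID per unique coordinate pair."""
    ids = {}
    labelled = []
    for row in rows:
        key = (row["decimalLatitude"], row["decimalLongitude"])
        if key not in ids:
            # Hypothetical label format: loc1, loc2, ... in order of first appearance.
            ids[key] = f"loc{len(ids) + 1}"
        labelled.append({**row, "locationID": ids[key]})
    return labelled
```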

Hope that helps! Bob

anneliejonsson commented 2 years ago

Thanks a lot everybody!

/Annelie

CecSve commented 1 year ago

Just to be sure @bobmuscarella and @anneliejonsson - did you find a workable solution to this issue? I can see the simple download still does not contain any counts in the individualCount field.

anneliejonsson commented 1 year ago

Bob and his colleagues managed to find the data they needed back then so all's good in that respect.

With regard to which fields are included in the SIMPLE download, I interpreted the conversation here such that you at GBIF would make changes to ensure the numbers from organismQuantity would be "copied across" to individualCount, at least when organismQuantityType = individuals. But that doesn't seem to have happened (yet) then.

However, for simplicity I have started to include both fields (org.Quant + ind.Count) in our DwC-archives, but we just haven't updated all of them yet. It will happen at some point in March this year I hope.

CecSve commented 1 year ago

Good to hear Bob and colleagues sorted out the data.

@MattBlissett if we are to copy the numbers from organismQuantity to individualCount - which pipeline does it relate to?

MattBlissett commented 1 year ago

An issue in https://github.com/gbif/pipelines would be best, so it isn't lost in this longer discussion.

Thanks.