gbif-norway / helpdesk

Please submit your helpdesk request here (or send an email to helpdesk@gbif.no). We will also use this repo for documentation of node helpdesk cases.

filter on dataset name does not work #190

Open andersfi opened 4 days ago

andersfi commented 4 days ago

We use the term dataset name to register project data from NBIC's "artsprosjekt" so that NBIC can verify that the data have been published according to the agreement. It is therefore important that this workflow actually works. At NTNU-VM we have run into a bug here.

The filtering on dataset name in the GBIF portal does not seem to be working. For example, the occurrence https://www.gbif.org/occurrence/4948367391 has the dataset name " Artsdatabanken Artsprosjekt_7-20_Rotifers - small coastal pounds from Agder, Trøms and Finnmark". Filtering for this on https://www.gbif.org/occurrence/search?publishing_org=a8144f37-5ff7-4137-9400-94b5b2ea4ec4&advanced=1&dataset_name=Artsdatabanken%20Artsprosjekt_7-20_Rotifers%20-%20small%20 does not yield any hits on the portal.

This error occurred after renaming and republishing the dataset.

dagendresen commented 4 days ago

When I search, I get 528 hits:

https://www.gbif.org/occurrence/search?publishing_org=a8144f37-5ff7-4137-9400-94b5b2ea4ec4&advanced=1&dataset_name=Artsdatabanken%20Artsprosjekt_7-20_Rotifers%20-%20small%20coastal%20pounds%20from%20Agder,%20Tr%C3%B8ms%20and%20Finnmark

[Screenshot, 2024-10-14 at 09:14:40]
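
For anyone reproducing this, here is a minimal sketch of building the portal filter link with the full dataset name percent-encoded. The parameter names (`publishing_org`, `advanced`, `dataset_name`) are copied from the links in this thread; the helper name `portal_search_url` is made up for illustration. The code only builds the link and does not query GBIF.

```python
from urllib.parse import quote, urlencode

PORTAL_SEARCH = "https://www.gbif.org/occurrence/search"


def portal_search_url(publishing_org: str, dataset_name: str) -> str:
    """Build an occurrence-search link filtered on publisher and dataset name.

    Hypothetical helper: the parameter names are taken from the URLs
    posted in this issue, not from any official client library.
    """
    params = {
        "publishing_org": publishing_org,
        "advanced": 1,
        "dataset_name": dataset_name,
    }
    # quote_via=quote encodes spaces as %20 (not "+");
    # safe="," leaves the comma unescaped, as in the link above.
    return PORTAL_SEARCH + "?" + urlencode(params, quote_via=quote, safe=",")


url = portal_search_url(
    "a8144f37-5ff7-4137-9400-94b5b2ea4ec4",
    "Artsdatabanken Artsprosjekt_7-20_Rotifers - small coastal pounds "
    "from Agder, Trøms and Finnmark",
)
print(url)
```

The link in the first comment ends after `small%20`, so it is a prefix of the one built here. It may be worth checking whether the complete dataset name was entered in the filter when the search returned no hits.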

aaltenburger2 commented 4 days ago

Related to the original question, I think the Provenance section, and therein the term project, would be a more natural place to include this information.

[image]

I can't find either the section or the term at https://dwc.tdwg.org/terms/. Do we have a field in MusIT that gets exported to the project field?

dagendresen commented 4 days ago

I believe that these terms are from EML (Ecological Metadata Language), mixed with some terms created by GBIF at the dataset level only, and are thus not available at the record level. There has been a well-documented request for projectID, projectName, project funder, etc. in Darwin Core (and for record-level documentation of project data), but somebody needs to make the effort to submit the term request and perhaps create a TDWG Task Group to develop such terms.

aaltenburger2 commented 4 days ago

I am happy to contribute to requesting those terms. Can you share what has been documented already?

dagendresen commented 4 days ago

We can try to collect some relevant references and GitHub issues together? I seem to recall that at least Sharon Grant (Field Museum) and Ming (AntaBIF) have been posting requests and suggestions. We can search for these on the GBIF and TDWG GitHub. I suggest searching the GitHub repositories under GBIF and TDWG for the keywords "datasetID" and "datasetName".
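
The suggested search could be sketched as a handful of GitHub issue-search links, one per organization and keyword. This assumes the organization handles are `gbif` and `tdwg` (as in the repository links elsewhere in this thread) and uses GitHub's `org:` search qualifier; the helper name `github_issue_search_url` is made up for illustration.

```python
from urllib.parse import urlencode


def github_issue_search_url(org: str, keyword: str) -> str:
    """Build a GitHub search link for issues in one organization.

    Hypothetical helper: it only formats a URL for the browser
    and does not call the GitHub API.
    """
    query = f"org:{org} {keyword}"
    return "https://github.com/search?" + urlencode({"q": query, "type": "issues"})


# One link per (organization, keyword) pair mentioned above.
search_urls = [
    github_issue_search_url(org, keyword)
    for org in ("gbif", "tdwg")
    for keyword in ("datasetID", "datasetName")
]

for link in search_urls:
    print(link)
```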

aaltenburger2 commented 3 days ago

Thank you Dag! I see, it has been discussed previously, and a workaround was implemented by allowing the projectID metadata field to accept multiple values (https://github.com/gbif/pipelines/issues/836). However, this solution still does not support projectIDs on individual records. Instead of creating a new issue, should we reopen the above GitHub issue #836 and request that projectID be added to DwC at the record level? I anticipate this might create issues with the projectID field in the GBIF metadata. What I want/need/suggest are "project name," "projectID," "funder name," and "funder ID" as DwC terms at the record level.

dagendresen commented 3 days ago

Hi, I think that the appropriate chain of actions would be to (1) introduce project terms to Darwin Core (or another data standard) and then (2) introduce the TDWG terms to the GBIF application profiles we use with the IPT etc. Sort of top-down, letting the data standards rule applications.

Jumping in at the GBIF issue 836 would mean minting temporary (?) project terms in the GBIF namespace that have not passed TDWG standardization. Sort of bottom-up letting practice rule the data standards... (if somebody makes the effort to integrate such practice into the data standards...)

Both paths are of course possible :-)

Then there is of course also the possibility to FIND the project terms in other non-TDWG data standards - and to promote using such standardized terms in the GBIF application profiles... Developing a new TDWG standard or addition to DwC would of course also involve exploring how other data standards describe projects!!!

aaltenburger2 commented 3 days ago

I agree with your top-down approach. I couldn't find a discussion about it on the TDWG GitHub. Should I start one?

dagendresen commented 3 days ago

I suspect that there might be plenty already on various TDWG repositories... :-) There are always the DwC-QnA (FAQ) threads to explore more and to post follow-up questions to...?

https://github.com/tdwg/dwc-qa/issues?q=is%3Aissue+is%3Aopen+project
https://github.com/tdwg/dwc-qa/issues/37
https://github.com/tdwg/dwc-qa/issues/83
https://github.com/tdwg/dwc-qa/issues/100
https://github.com/tdwg/dwc-qa/issues/199