gbif / portal-feedback

User feedback for the GBIF API, website and published data. You can ask questions here. 🗨❓

Occurrence search - datasetID/datasetName #3026

Closed: gbif-portal closed this issue 3 years ago.

gbif-portal commented 4 years ago


Hi

Is it possible to search occurrences by datasetID or datasetName, such as the fields found in this record: https://www.gbif.org/occurrence/2402407820? If not, I would very much be in favor of developing/implementing that possibility.

Thomas Sæther


System: Chrome 85.0.4183 / Windows 10.0.0
Referer: https://www.gbif.org/occurrence/2402407820
Window size: width 2048, height 1010
System health at time of feedback: CRITICAL
datasetKey: b124e1e0-4755-430f-9eab-894f25a9b59c
publishingOrgKey: d3978a37-635a-4ae3-bb85-7b4d41bc0b88

dagendresen commented 3 years ago

A thrilling thought would be if data records from across "DwC-A datasets" could be grouped and displayed as members of "datasets" identified by datasetID (and maybe datasetName), and likewise for collectionID and some kind of "projectID". (The datasetIDs would, of course, preferably be globally unique persistent identifiers rather than plain integers/literals as in the example.)

Maybe even completely replacing the current "DwC-A = the dataset" concept. In other words, separate the DwC-A as the data-package container from the "dataset" concept, and enable the (EML) dataset metadata to be authored entirely outside the IPT and simply referenced from the data package. However, I realise this would need deep thinking so as not to break current implementations.

MortenHofft commented 3 years ago

Hi Thomas, just to confirm, since I see your question was never answered: no, you cannot search for datasetID or datasetName.
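Without a server-side filter, one possible workaround is to fetch the records of the containing GBIF dataset and filter them on `datasetID` client-side. The sketch below is a minimal illustration of that filtering step only, assuming each record is a dict shaped like a GBIF occurrence JSON object in which `datasetID` may be absent, a single string, or a list of strings. The sample records and the ID "42" are hypothetical.

```python
from typing import Iterable, Iterator


def filter_by_dataset_id(records: Iterable[dict], dataset_id: str) -> Iterator[dict]:
    """Yield only the records whose datasetID matches dataset_id.

    Assumption: "datasetID" may be missing, a single string, or a list of strings.
    """
    for record in records:
        value = record.get("datasetID")
        if value is None:
            continue  # record carries no datasetID at all
        ids = value if isinstance(value, list) else [value]
        if dataset_id in ids:
            yield record


# Hypothetical sample records; in practice these would come from paging
# through the occurrence search results of the containing GBIF dataset.
sample = [
    {"key": 1, "datasetID": "42", "datasetName": "Waterfowl counts"},
    {"key": 2, "datasetID": ["7", "42"]},
    {"key": 3},
]

matches = [r["key"] for r in filter_by_dataset_id(sample, "42")]
print(matches)  # [1, 2]
```

Filtering client-side means downloading every record of the larger dataset first, which is exactly the cost a server-side filter would avoid.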

It is always nice to understand the motivation behind feature requests. Could you describe the use case a bit more, please?

Would it make sense to split your dataset into multiple datasets if the datasetName/datasetID changes within the "GBIF dataset"? That would create a 1:1 relation, so you could use the regular dataset filter.
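For illustration, the regular dataset filter corresponds to the `datasetKey` parameter of GBIF's occurrence search API. The sketch below only builds the request URL (no network call), assuming the public endpoint `https://api.gbif.org/v1/occurrence/search` and its `limit`/`offset` paging parameters; the key used is the datasetKey from the feedback metadata in this issue.

```python
from urllib.parse import urlencode

OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def dataset_occurrence_url(dataset_key: str, limit: int = 20, offset: int = 0) -> str:
    """Build an occurrence search URL restricted to one registered GBIF dataset."""
    params = {"datasetKey": dataset_key, "limit": limit, "offset": offset}
    return f"{OCCURRENCE_SEARCH}?{urlencode(params)}"


# datasetKey taken from the feedback metadata of this issue.
url = dataset_occurrence_url("b124e1e0-4755-430f-9eab-894f25a9b59c")
print(url)
```

If each sub-dataset were registered separately, this single `datasetKey` filter would select exactly one observatory's records.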

user notified by email

MortenHofft commented 3 years ago

@dagendresen it sounds like what you are asking is something else and might deserve a separate issue? Also note that there are already ways to group occurrences across datasets:

- search datasets by projectId
- search occurrences by projectID
- search occurrences by matched GrSciColl collection
- search occurrences by collection code
- search occurrences by publisher
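As a rough sketch of what some of these groupings look like as API requests, the helper below builds search URLs without calling the API. It assumes the parameter names `projectId` (dataset and occurrence search), `collectionCode` and `publishingOrg` (occurrence search), which should be checked against the GBIF API reference. The project ID and collection code are hypothetical placeholders; the publisher key is the publishingOrgKey from the feedback metadata in this issue.

```python
from urllib.parse import urlencode

API = "https://api.gbif.org/v1"


def search_url(resource: str, **filters: str) -> str:
    """Build a GBIF search URL such as /dataset/search or /occurrence/search."""
    return f"{API}/{resource}/search?{urlencode(filters)}"


examples = {
    # Hypothetical project ID and collection code, for illustration only.
    "datasets by project": search_url("dataset", projectId="HYPOTHETICAL-PROJECT"),
    "occurrences by project": search_url("occurrence", projectId="HYPOTHETICAL-PROJECT"),
    "occurrences by collection code": search_url("occurrence", collectionCode="HYPOTHETICAL-CODE"),
    # publishingOrgKey from the feedback metadata of this issue.
    "occurrences by publisher": search_url(
        "occurrence", publishingOrg="d3978a37-635a-4ae3-bb85-7b4d41bc0b88"
    ),
}

for label, link in examples.items():
    print(f"{label}: {link}")
```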

ThomasSather commented 3 years ago

@MortenHofft this was a novice attempt to suggest a workaround for what might be a local (Norwegian) problem. As you probably know, Norway has a large and well-functioning national species data infrastructure (Artsdatabanken/Artsobservasjoner), run by the Norwegian Biodiversity Information Centre (NBIC; www.biodiversity.no). GBIF harvests data from this database every night.

Nordre Øyeren Bird Observatory (https://www.gbif.org/publisher/74331676-6384-4da7-8e92-11001735db6a) has chosen to use NBIC’s digital infrastructure to publish data from our systematic weekly waterfowl and wader counts, which date back more than 45 years. This is easy for us, but it also poses a couple of problems: NBIC is automatically listed as the owner/publisher of all our data, and we are looking for a way to stratify the NBIC data so that we can generate DOIs for our own data. However, based on discussions with @dagendresen, we understand that this is better solved locally, by NBIC splitting up/re-annotating their data.

MortenHofft commented 3 years ago

If the NBIC dataset is made up of many smaller datasets and it is useful for the data owners of those dataset to delimit their own data, then splitting it sounds natural to me.

That also gives you the added benefit of citation tracking for your individual dataset, which might be of interest, and the option to describe your specific data in more detail than the NBIC dataset can.

Thank you for the explanation.

timrobertson100 commented 2 years ago

Please see https://github.com/gbif/pipelines/issues/662 where we intend to implement multivalue dataset ID and name search capabilities shortly.
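Multivalue search of this kind is usually expressed by repeating a query parameter once per value. The sketch below builds such a URL, assuming the occurrence search API exposes the filters as `datasetId` and `datasetName` and treats repeated values of the same parameter as alternatives (OR); both are assumptions to verify against the GBIF API reference once the change lands. The IDs and name are hypothetical.

```python
from typing import Sequence
from urllib.parse import urlencode

OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def multi_dataset_url(dataset_ids: Sequence[str] = (), dataset_names: Sequence[str] = ()) -> str:
    """Build an occurrence search URL repeating datasetId/datasetName once per value.

    Assumption: the API accepts "datasetId" and "datasetName" as repeatable parameters.
    """
    params = {"datasetId": list(dataset_ids), "datasetName": list(dataset_names)}
    # doseq=True expands each list into repeated key=value pairs; empty lists add nothing.
    return f"{OCCURRENCE_SEARCH}?{urlencode(params, doseq=True)}"


# Hypothetical dataset IDs and name, for illustration only.
url = multi_dataset_url(dataset_ids=["42", "43"], dataset_names=["Waterfowl counts"])
print(url)
```

Note that `urlencode` encodes spaces as `+`, which standard query-string parsers decode back to spaces.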

ThomasSather commented 2 years ago

This is great news! Looking forward to testing it out!