gbif / occurrence

Occurrence store, download, search

Missing datasetName in download #270

Closed by niconoe 10 months ago

niconoe commented 3 years ago

Hello,

I've recently encountered cases where a DwC-A download from the GBIF portal has an empty datasetName field.

Example: in the following download, the record with gbifId=297835694 has an empty datasetName in occurrences.txt, while the occurrence page gives a proper dataset name (Pl@ntNet automatically identified occurrences).

In that case, it looks like it might be caused by the special "@" character in the datasetName, but I've also encountered it with seemingly less exotic dataset names (for example, for this occurrence).

niconoe commented 2 years ago

Hello GBIF team, do you already know if this is an issue you plan to tackle in the next few weeks/months?

Otherwise I'll work on my (data consumer) side to avoid the issue, for example by making API calls to retrieve the missing dataset names based on the datasetKey.
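A minimal sketch of that consumer-side workaround in Python, for illustration only. It assumes the rows come from the download's tab-delimited occurrence file and carry `datasetKey` and `datasetName` columns, and that the public GBIF registry endpoint `https://api.gbif.org/v1/dataset/{key}` returns JSON with a `title` field. The function names `fetch_dataset_title` and `fill_missing_dataset_names` are hypothetical helpers, not part of any GBIF library.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable, Iterator

# Assumed public registry endpoint; the response is expected to carry a "title".
GBIF_DATASET_URL = "https://api.gbif.org/v1/dataset/{key}"


def fetch_dataset_title(dataset_key: str) -> str:
    """Look up a dataset's registry title from its datasetKey (network call)."""
    url = GBIF_DATASET_URL.format(key=dataset_key)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp).get("title", "")


def fill_missing_dataset_names(
    rows: Iterable[Dict[str, str]],
    fetch_title: Callable[[str], str] = fetch_dataset_title,
) -> Iterator[Dict[str, str]]:
    """Yield rows, replacing a blank datasetName with the registry title.

    Titles are cached per datasetKey so each dataset is fetched only once.
    """
    cache: Dict[str, str] = {}
    for row in rows:
        name = (row.get("datasetName") or "").strip()
        key = row.get("datasetKey") or ""
        if not name and key:
            if key not in cache:
                cache[key] = fetch_title(key)
            # Copy the row rather than mutating the caller's dict.
            row = {**row, "datasetName": cache[key]}
        yield row
```

The rows could come from something like `csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)` over the extracted occurrence file. Passing the lookup in as `fetch_title` keeps the network call replaceable, which makes the logic easy to check offline.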

Thanks a lot!

MattBlissett commented 2 years ago

@fmendezh, there aren't any empty datasetName values in Hive. Could this be an ES issue? It's a small download.

ManonGros commented 2 years ago

Could this issue be related: https://github.com/gbif/portal-feedback/issues/3814?

marcos-lg commented 2 years ago

Fixed in PROD.

niconoe commented 1 year ago

Unfortunately, the issue doesn't seem fully solved (I recently removed my workaround in GBIF Alert and the issue came back).

This download for example has 27141 blank dataset names.

marcos-lg commented 10 months ago

I took another look at this issue and realized that the datasetName field is the Darwin Core term (https://dwc.tdwg.org/terms/#dwc:datasetName), not the title of the dataset in our registry, which is what the occurrence page displays.