gbif-portal closed this issue 6 months ago.
Could it perhaps be a simple misspelling of the extension?
DwC-A data file »measurementOrFact.csv« does not exist at
It should be measurementOrFacts.csv
Hm... wouldn't it need to be "measurementsOrFacts" if that were the case?
Looks like Darwin Core kind of has it both ways 😅
Jeez. Well, it looks like you had it right to begin with: https://github.com/gbif/rs.gbif.org/blob/master/extension/measurements_or_facts_2024-02-19.xml and https://dwc.tdwg.org/terms/#measurementorfact.
Ok, back to the drawing board then. I will investigate.
I downloaded the most recent endpoint, and I cannot see that the file exists in the archive. However, you have added the extendedMeasurementOrFact extension to your meta.xml file. We use the meta.xml file to validate the content of the archive (see the message "Exception caught during metasyncing DwC-A [b0515413-6d32-490a-83a0-f8c08f002c70]" from the service crawler-dwca-metasync), which then throws this error in stack_trace: org.gbif.dwc.UnsupportedArchiveException: DwC-A data file »measurementOrFact.csv« does not exist.
If the meta.xml of the different archives keeps referring to such a file, the file should be included in the archive, whether or not it has any content. Does that make sense?
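A publisher could catch this mismatch before uploading with a quick check like the one below. It is a minimal sketch, not GBIF's validator (crawler-dwca-metasync): it assumes the archive is a zip with meta.xml at its root and that meta.xml uses the Darwin Core Text namespace http://rs.tdwg.org/dwc/text/. The function name missing_data_files is hypothetical. It lists every file named in a core or extension `<location>` element that is not actually present in the zip.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by Darwin Core Text meta.xml descriptors (assumed here).
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def missing_data_files(archive):
    """Return data files that meta.xml references but the archive lacks.

    `archive` may be a path or a binary file-like object holding a DwC-A zip.
    Hypothetical pre-publication check; assumes meta.xml sits at the zip root.
    """
    with zipfile.ZipFile(archive) as zf:
        names = set(zf.namelist())
        meta = ET.fromstring(zf.read("meta.xml"))

    missing = []
    # The core and each extension declare their data files under <files><location>.
    for section in meta:
        if section.tag not in (DWC_TEXT_NS + "core", DWC_TEXT_NS + "extension"):
            continue
        for loc in section.iter(DWC_TEXT_NS + "location"):
            path = (loc.text or "").strip()
            if path and path not in names:
                missing.append(path)
    return missing
```

For an archive like the one in this thread, where meta.xml declares the extension but the batch process omits its data file, the check would return that file's name instead of an empty list, so the publisher can either add the (possibly empty) file or drop the extension from meta.xml.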
Hm, so we have "extendedMeasurementOrFact" in the meta, but only "measurementOrFact" in the actual archive?
I do not see either extension file in the archive, only the extendedMeasurementOrFact information in meta.
Aha, thanks for the clarification! It does appear that our batch process is not including the measurementOrFact file, but our single-publishing process is. I'll look into this! Feel free to close this issue.
Can't handle DwC-As with measurementOrFact files
We've noticed that when we attempt to publish a Darwin Core Archive that contains a measurementOrFact extension, the dataset is rejected by your system (e.g., https://logs.gbif.org/app/discover#/?_g=(filters:!(),refreshInterval:(display:On,pause:!f,value:0),time:(from:now-1y,to:now))&_a=(columns:!(_source),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'439da4d0-290a-11ed-8155-a37cb1ead50e',key:level,negate:!f,params:(query:ERROR),type:phrase),query:(match_phrase:(level:ERROR)))),index:'439da4d0-290a-11ed-8155-a37cb1ead50e',interval:auto,query:(language:lucene,query:'datasetKey.keyword:%22b0515413-6d32-490a-83a0-f8c08f002c70%22%20AND%20attempt:%22167%22'),sort:!('@timestamp',desc))).
Could the system just ignore this file instead of rejecting the whole archive? Several of our portals now have this extension.
GitHub user: @themerekat
System: Chrome 124.0.0 / Windows 10.0.0
Referer: https://www.gbif.org/dataset/b0515413-6d32-490a-83a0-f8c08f002c70
Window size: width 1536 - height 703
System health at time of feedback: INFO