gbif / portal-feedback

User feedback for the GBIF API, website and published data. You can ask questions here. 🗨❓

Data validator hits an error while the test IPT gives useful user feedback #4426

Open gbif-portal opened 1 year ago

gbif-portal commented 1 year ago


When validating an archive, the data validator fails immediately with an error, whereas running the same archive through the test IPT gives useful feedback:

Publishing version #1.6 of resource test_cab2022 failed: Archive generation for resource test_cab2022 failed: Can't validate DwC-A for resource test_cab2022. Each line must have a taxonID, and each taxonID must be unique (please note comparisons are case insensitive)

14:12:33 Validating the core file: taxon.txt. Depending on the number of records, this can take a while.
14:12:33 ? Validating the core ID field taxonID is always present and unique. Note: the core ID field is required to link core records and extension records together.
14:12:33 taxon.txt does not have the core ID field taxonID. The data cannot be indexed on GBIF.
14:12:33 17 line(s) having a duplicate taxonID (please note comparisons are case insensitive)
14:12:33 Archive validation failed, because not every line has a unique taxonID (please note comparisons are case insensitive)

Ideally, the data validator should report the same kind of information as the IPT, so that users can correct their data without also having to run it through the test IPT.
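For anyone hitting the same message, the rule quoted in the IPT log (every line has a taxonID, and taxonIDs are unique when compared case-insensitively) can be approximated locally. The sketch below is not the IPT's or the validator's actual code; the function name `check_core_ids` is hypothetical, and it assumes a tab-delimited core file with a single header row.

```python
import csv
import io
from collections import Counter


def check_core_ids(lines, id_column="taxonID", delimiter="\t"):
    """Return (line numbers with an empty ID, IDs that occur more than once).

    Comparisons are case-insensitive, mirroring the rule quoted in the IPT log.
    Assumes one header row; this is an illustration, not the IPT implementation.
    """
    reader = csv.DictReader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    if id_column not in (reader.fieldnames or []):
        raise ValueError(f"core file has no {id_column} column")
    missing = []
    counts = Counter()
    for row in reader:
        value = (row[id_column] or "").strip()
        if not value:
            missing.append(reader.line_num)  # physical line number in the file
        else:
            counts[value.lower()] += 1
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    return missing, duplicates


# Made-up sample data: "T1" and "t1" collide, and the last line has no ID.
sample = io.StringIO(
    "taxonID\tscientificName\n"
    "T1\tAbies alba\n"
    "t1\tAbies alba Mill.\n"
    "\tPicea abies\n"
)
missing, duplicates = check_core_ids(sample)
print("lines without an ID:", missing)
print("duplicated IDs:", duplicates)
```

To run it against a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open("taxon.txt", encoding="utf-8", newline="")`.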

This issue relates to https://github.com/gbif/portal-feedback/issues/4416.


Github user: @CecSve
System: Firefox 107.0.0 / Windows 10.0.0
Referer: https://www.gbif.org/tools/data-validator
Window size: width 2560 - height 1287
System health at time of feedback: OPERATIONAL

CecSve commented 1 year ago

Update: the validator still fails to verify the archive format, but the IPT now manages to publish the archive with no issues. The validator seems to validate other DwC-As, so I am not sure how to troubleshoot the issue.

CecSve commented 1 year ago

Update: two problems had to be fixed before the validator worked:

  1. the meta.xml file did not correspond to the files' column headers
  2. the core file and the extensions each contained two header rows (1 = verbatim value, 2 = IPT mapping value)
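Both problems can be spotted before uploading with a quick local check. The sketch below is only an assumption of how such a check might look, not what the validator does: the function names are hypothetical, it only looks at the core mapping, and it compares header names with the local part of each mapped term. Header names are not strictly required to equal the term names, so treat a mismatch as a hint to look closer rather than as an error.

```python
import xml.etree.ElementTree as ET

# Namespace used by Darwin Core Archive meta.xml descriptors.
NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def mapped_terms(meta_xml):
    """Map column index -> local term name declared for the core in meta.xml."""
    core = ET.fromstring(meta_xml).find("dwc:core", NS)
    terms = {}
    for field in core.findall("dwc:field", NS):
        if field.get("index") is not None:  # default-only fields have no index
            terms[int(field.get("index"))] = field.get("term").rsplit("/", 1)[-1]
    return terms


def header_mismatches(meta_xml, header):
    """List (index, mapped_term, found_header) where the two disagree."""
    problems = []
    for index, term in sorted(mapped_terms(meta_xml).items()):
        found = header[index] if index < len(header) else None
        if found is None or found.lower() != term.lower():
            problems.append((index, term, found))
    return problems


def looks_like_header(row, meta_xml):
    """Heuristic: a data row made up only of mapped term names is
    probably a leftover second header row."""
    names = {term.lower() for term in mapped_terms(meta_xml).values()}
    cells = [cell.strip().lower() for cell in row if cell.strip()]
    return bool(cells) and all(cell in names for cell in cells)


# Made-up descriptor for illustration only.
META = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Taxon">
    <files><location>taxon.txt</location></files>
    <id index="0"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/taxonID"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
</archive>"""

print(header_mismatches(META, ["id", "scientificName"]))
print(looks_like_header(["taxonID", "scientificName"], META))
```

In my case the second check would have flagged the first data row of each file, since it repeated the IPT mapping names below the verbatim header.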