The GBIF data validator CSV parser does not properly respect quoting rules and produces many incorrect reports of column mismatch, missing fields, etc.
For example, here is a row that the validator marks as having incorrect structure:
196359,,,826,,,,Jackson Chu,,1,,,,,,PRESENT,,,,Iophon,,"Chu JWF, Leys SP (2010) High resolution mapping of community structure in three glass sponge reefs (Porifera, Hexactinellida). Marine Ecology Progress Series 417: 97‑113. https://doi.org/10.7939/r36k3q",,,iNaturalist:196359,,,,,,,,Iophon sp.,,,,,,,Animalia,Porifera,Demospongiae,Poecilosclerida,Acarnidae,Iophon,,,,,,,,,,,,,,,,,,Galiano Island,Canada,CA,British Columbia,,,,,,,,93.58899689,,,,,,,,,,,,48.91363673,-123.3305997,,,,,,,,Jackson Chu,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,ROV,,2008-05-21,,,,2008,5,21,,,Waypoint.or.Transect: 31,,,,,,,,,"Chu JWF, Leys SP (2010) High resolution mapping of community structure in three glass sponge reefs (Porifera, Hexactinellida). Marine Ecology Progress Series 417: 97‑113. https://doi.org/10.7939/r36k3q",,,,,,Chu & Leys (2010),,HumanObservation,,,,
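To illustrate the kind of mismatch described above, here is a minimal sketch comparing a naive comma split with a quote-aware parse (Python's standard `csv` module). The `line` value is a shortened excerpt of the row above, not the full record, and nothing here is taken from the validator's actual code; it only shows how ignoring quotes inflates the column count.

```python
import csv
import io

# Shortened excerpt of the reported row: four fields, one of which is
# quoted because the citation text itself contains commas.
line = (
    '196359,Iophon,'
    '"Chu JWF, Leys SP (2010) High resolution mapping of community structure '
    'in three glass sponge reefs (Porifera, Hexactinellida).",'
    'iNaturalist:196359'
)

# Naive split: every comma is treated as a separator, quoted or not.
naive_fields = line.split(",")

# Quote-aware parse: commas inside double quotes stay inside the field.
quoted_aware_fields = next(csv.reader(io.StringIO(line)))

print(len(naive_fields))         # 6
print(len(quoted_aware_fields))  # 4
```

A parser that behaves like the naive split would see two extra columns in this excerpt, which would be consistent with a "column mismatch" report.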
Github user: @amb26
User: See in registry
System: Chrome 90.0.4430 / Windows 7.0.0
Referer: https://www.gbif.org/tools/data-validator/1622015285813
Window size: width 1843 - height 1437
System health at time of feedback: OPERATIONAL
If the validator cannot properly parse files whose text fields are enclosed in double quotation marks, then that is clearly a bug.
@gbif/informatics
GBIF data validator CSV parser is faulty
The CSV format, including its quoting rules, is defined in RFC 4180:
https://datatracker.ietf.org/doc/html/rfc4180
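As a rough illustration of the quoting rules in RFC 4180 (fields containing commas, double quotes, or line breaks are enclosed in double quotes, and an embedded double quote is escaped by doubling it), the sketch below round-trips one record through Python's standard `csv` module. The sample values are made up for illustration and are not taken from the dataset in this report.

```python
import csv
import io

# Hypothetical sample record: one field with a comma, one with embedded
# double quotes, one with a line break.
record = ["a,b", 'say "hi"', "line1\nline2"]

# Write the record using the csv module's default dialect.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(record)
text = buffer.getvalue()

# Each special field is wrapped in double quotes; the embedded quotes
# are doubled, and the record ends with CRLF.
print(repr(text))

# A quote-aware reader recovers exactly the original three fields.
parsed = list(csv.reader(io.StringIO(text, newline="")))
print(parsed == [record])
```

A validator that follows the same rules should report three columns for this record, regardless of the commas and line break inside the quoted fields.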