cschloer closed this issue 3 years ago
We talked about having a test that checks values in a sci_name column for abbreviated taxon names: the first letter of the genus, then a period, then the species name, like G. morhua. We would suggest using the full genus instead, like Gadus morhua.
We would want to flag any sci_names that match ^\w\.
Note that these are valid and we don't want to flag them: Gadus sp. and Gadus spp. See https://regex101.com/r/8eFGXw/1/
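For reference, here is a minimal sketch of that check in plain Python, using only the standard `re` module. The function name `find_abbreviated_names` and the sample list are made up for illustration; this is not the dataflows or goodtables implementation, just the regex from above applied to a list of values.

```python
import re

# Pattern from this issue: a single word character followed by a period
# at the start of the value (e.g. "G. morhua").
ABBREVIATED_GENUS = re.compile(r"^\w\.")


def find_abbreviated_names(names):
    """Return the values whose genus looks abbreviated.

    Hypothetical helper for illustration only.
    """
    return [name for name in names if ABBREVIATED_GENUS.match(name)]


# Sample values taken from the examples in this issue.
sample = ["G. morhua", "Gadus morhua", "Gadus sp.", "Gadus spp."]
print(find_abbreviated_names(sample))  # → ['G. morhua']
```

Because the pattern is anchored at the start of the value, "Gadus sp." and "Gadus spp." pass: the period there follows more than one character. Note that `\w` also matches digits and underscores, so something like `[A-Za-z]` may be a tighter choice if that matters.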
Closing this for now, as we decided to keep the dataflows and goodtables logic separate.
I had started a Python notebook in Colab that installs the dataflows commit we wanted to test (from https://github.com/datahq/dataflows/pull/146) and loads some test data for this issue from our frictionless-usecases repo, which has "bad" names we want to check. I didn't get to testing validate_metadata, which is probably fine because it isn't being developed further.
I'm linking it here in case we want to modify it for goodtables testing or whatever implementation we end up with. The link to the data and the basic flow are there: https://gist.github.com/adyork/9ae791ebee7b0b651be034ec1b033c18#file-test-field-name-validation-ipynb
load('https://github.com/BCODMO/frictionless-usecases/raw/master/usecases/818993_seabirdCTD/orig/head/FK190211_CTD004_01032019.csv', format='csv', ),
https://github.com/datahq/dataflows/issues/142
python --version
)