dictionaria / pydictionaria

Apache License 2.0

CLDF conversion does not filter rows with missing required columns #2

Closed xrotwang closed 5 years ago

xrotwang commented 5 years ago

The CLDF conversion should only create CLDF that validates without warnings. Otherwise, pycldf cannot be used to load the data.

$ cldf validate submissions-internal/tseltal/processed/cldf-md.json 
WARNING submissions-internal/tseltal/processed/entries.csv:2:3 Headword: required column value is missing
WARNING submissions-internal/tseltal/processed/entries.csv:2:3 Headword: required column value is missing
WARNING submissions-internal/tseltal/processed/senses.csv:1111:2 Description: required column value is missing
WARNING submissions-internal/tseltal/processed/senses.csv:1111:2 Description: required column value is missing
WARNING submissions-internal/tseltal/processed/entries.csv:2:3 Headword: required column value is missing
WARNING submissions-internal/tseltal/processed/senses.csv:1111:2 Description: required column value is missing
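
A minimal sketch of the kind of filter the conversion could apply before writing each table. It assumes rows are available as plain dicts keyed by column name; the helper `drop_incomplete_rows` and the sample rows are hypothetical and are not part of pydictionaria or pycldf.

```python
def drop_incomplete_rows(rows, required):
    """Split rows into (kept, dropped) by presence of all required values.

    Hypothetical helper for illustration only.
    """
    kept, dropped = [], []
    for row in rows:
        # Treat None, empty strings and whitespace-only strings as missing.
        if all(str(row.get(col) or "").strip() for col in required):
            kept.append(row)
        else:
            dropped.append(row)
    return kept, dropped


# Made-up sample rows; real column sets come from the CLDF metadata.
entries = [
    {"ID": "e1", "Headword": "", "Part_Of_Speech": "n"},
    {"ID": "e2", "Headword": "example", "Part_Of_Speech": "n"},
]
kept, dropped = drop_incomplete_rows(entries, required=["ID", "Headword"])
print(len(kept), "kept;", len(dropped), "dropped")
```

Returning the dropped rows as well lets the conversion report what was filtered. Senses that reference a dropped entry would presumably need to be dropped too, so that no dangling references remain.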

xrotwang commented 5 years ago

This is also a problem for Nen:

$ cldf validate nen/processed/cldf-md.json 2>&1 | sort | uniq | grep Headword | wc -l
67

xrotwang commented 5 years ago

For some dictionaries, e.g. Kamang, it might suffice to fall back to \ge as the sense Description when there is no \de.
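
The fallback could look roughly like the sketch below. It assumes the markers of one sense have already been collected into a dict keyed by marker name without the backslash; the function `sense_description` and that data layout are hypothetical, not pydictionaria's actual API.

```python
def sense_description(sense_markers):
    """Return the \\de value if present, else the \\ge value, else None.

    Hypothetical helper; `sense_markers` maps marker names to strings.
    """
    for marker in ("de", "ge"):
        value = (sense_markers.get(marker) or "").strip()
        if value:
            return value
    # Neither marker has content; the caller decides whether to skip the sense.
    return None


print(sense_description({"ge": "tree"}))
```

A sense for which this returns `None` would still lack a Description and would need to be filtered out for the output to validate.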