cldf / pycldf

Python package to read and write CLDF datasets
https://cldf.clld.org
Apache License 2.0

cldf stats output may be misleading #98

Closed: xrotwang closed this issue 5 years ago

xrotwang commented 5 years ago

Dataset.stats derives the number of rows in a table in one of two ways: either by reading the table's dc:extent property, if it is specified, or by actually counting the rows (see https://github.com/cldf/pycldf/blob/874275ad6895c17a2c1375450edbb915ff001f28/src/pycldf/dataset.py#L520). This can be confusing, e.g. when a StructureDataset and a Wordlist are created via pylexibank and share the LanguageTable. In that case the Wordlist's metadata will have dc:extent pruned to the languages actually referenced by forms, while the StructureDataset's will not, so the two datasets report different language counts for the same table.
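A minimal sketch of the two counting strategies and of how an explicit switch removes the ambiguity. The helper row_count and its exact parameter are hypothetical and are not pycldf's actual API. Table metadata is modelled as a plain dict, and rows as any iterable.

```python
from typing import Any, Iterable, Mapping, Optional


def row_count(props: Mapping[str, Any], rows: Iterable[Any], exact: bool = False) -> int:
    """Hypothetical helper: return the number of rows in a table.

    If exact is False and the table metadata declares dc:extent,
    trust that value; otherwise count the rows explicitly.
    """
    extent: Optional[Any] = props.get("dc:extent")
    if not exact and extent is not None:
        return int(extent)
    return sum(1 for _ in rows)


# Shared LanguageTable with three languages (illustrative data only).
languages = ["lang1", "lang2", "lang3"]
wordlist_props = {"dc:extent": 2}  # pruned to languages referenced by forms
structure_props = {}               # no dc:extent, so rows are counted

print(row_count(wordlist_props, languages))              # 2
print(row_count(structure_props, languages))             # 3
print(row_count(wordlist_props, languages, exact=True))  # 3
```

With exact=True both datasets report the same count for the shared table, because the declared dc:extent is ignored in favour of the actual rows.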

Thus, the counting behaviour should be chosen explicitly by the caller, and it should be available as an option in the cldf stats command.
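One way such an option could be exposed on the command line, sketched with argparse. The --exact flag name and this parser setup are hypothetical and do not describe pycldf's actual cldf stats command.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for a 'cldf stats' command with an explicit counting switch."""
    parser = argparse.ArgumentParser(prog="cldf stats")
    parser.add_argument("dataset", help="path to the dataset's metadata file")
    parser.add_argument(
        "--exact",
        action="store_true",
        default=False,
        help="count rows instead of trusting dc:extent from the metadata",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["Wordlist-metadata.json", "--exact"])
    print(args.dataset, args.exact)  # Wordlist-metadata.json True
```

Defaulting to False keeps the cheap dc:extent lookup as the standard behaviour, while users who need reliable numbers can opt in to counting.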