`Dataset.stats` derives the number of rows in a table in one of two ways: it either reads the table's `dc:extent` property, if specified, or actually counts the rows (see https://github.com/cldf/pycldf/blob/874275ad6895c17a2c1375450edbb915ff001f28/src/pycldf/dataset.py#L520). This can be confusing, e.g. when a `StructureDataset` and a `Wordlist` are created via `pylexibank` and share the `LanguageTable`. In that case the `Wordlist` will have `dc:extent` pruned to the languages actually referenced in forms, while the `StructureDataset` will not, so the two datasets report different row counts for the same table.
Thus, the counting behaviour should be specified explicitly by the caller, and be available as an option in the `cldf stats` command.
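
To illustrate the two strategies and what an explicit switch could look like, here is a minimal sketch in plain Python. It does not use pycldf: `row_count` and its `exact` parameter are hypothetical names, and the metadata dicts only mimic a table description with an optional `dc:extent`.

```python
from typing import Iterable, Mapping


def row_count(metadata: Mapping[str, object],
              rows: Iterable[object],
              exact: bool = False) -> int:
    """Return the number of rows of a table (hypothetical helper, not pycldf API).

    With exact=False, the declared dc:extent is trusted when present.
    With exact=True, or when no dc:extent is declared, the rows are counted.
    """
    extent = metadata.get("dc:extent")
    if not exact and extent is not None:
        return int(extent)
    return sum(1 for _ in rows)


# A LanguageTable with three rows on disk, described by two datasets.
languages = [{"ID": "lang1"}, {"ID": "lang2"}, {"ID": "lang3"}]
wordlist_meta = {"url": "languages.csv", "dc:extent": 2}  # pruned extent
structure_meta = {"url": "languages.csv"}                 # no extent declared

declared = row_count(wordlist_meta, languages)             # reads dc:extent
counted = row_count(wordlist_meta, languages, exact=True)  # counts rows
fallback = row_count(structure_meta, languages)            # counts rows
```

With a switch like this the caller decides which number is reported, and a corresponding command-line option for `cldf stats` could simply pass it through.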