cldf / pycldf

python package to read and write CLDF datasets
https://cldf.clld.org
Apache License 2.0

dataset.objects relations laziness should be documented #139

Closed (Anaphory closed 3 years ago)

Anaphory commented 3 years ago

I tried the new ORM API. It's nice not to have to juggle explicit column names to access languages etc., but I'm still learning how to do that more efficiently, so I might stick to my own purpose-built caches, which I can control locally and discard when data changes. I may show you my use cases when I have enough of them to warrant some generalization, so we can see whether pycldf can learn from them.

For now, I encountered this:

$ python
Python 3.8.7 (default, Jan 12 2021, 09:39:22) 
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pycldf
>>> ds = pycldf.Wordlist.from_metadata("Wordlist-metadata.json")
>>> forms = ds.objects("FormTable")
>>> forms[0].language
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gereon/.local/etc/lexedata/lib/python3.8/site-packages/pycldf/orm.py", line 155, in language
    return self.related('languageReference')
  File "/home/gereon/.local/etc/lexedata/lib/python3.8/site-packages/pycldf/orm.py", line 137, in related
    return self.dataset.get_object(TERMS[relation].references, fk)
  File "/home/gereon/.local/etc/lexedata/lib/python3.8/site-packages/pycldf/dataset.py", line 782, in get_object
    self.objects(table, cls=cls)
  File "/home/gereon/.local/etc/lexedata/lib/python3.8/site-packages/pycldf/dataset.py", line 774, in objects
    for item in self[table]:
  File "/home/gereon/.local/etc/lexedata/lib/python3.8/site-packages/pycldf/dataset.py", line 636, in __getitem__
    raise KeyError(table)
KeyError: 'LanguageTable'
>>> 

It is very clear to me why this would happen (and it speaks to your foresight that I could guess this interface without actually reading the ORM documentation first, shame on me). Still, I think the limitation that related tables must be cached before attribute access should either be removed or be mentioned in

https://github.com/cldf/pycldf/blob/afaca42c4ae31b0618ccde3a509d74ad11e2b062/src/pycldf/orm.py#L25-L36
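The traceback above shows the lookup chain: `related` calls `get_object`, which calls `objects` to fill a cache on first use, which in turn indexes the dataset by table name. A toy model of that chain may make the behaviour easier to see. It uses the hypothetical classes `ToyDataset` and `ToyForm` and made-up rows; it is a sketch of the general pattern, not pycldf's actual implementation.

```python
# Toy model of lazy related-object lookup; NOT pycldf's real code.
# ToyDataset, ToyForm and the example rows are hypothetical.


class ToyDataset:
    def __init__(self, tables):
        # tables: dict mapping table name -> list of row dicts
        self._tables = tables
        self._cache = {}  # table name -> {row ID: row}, filled lazily

    def __getitem__(self, table):
        if table not in self._tables:
            raise KeyError(table)
        return self._tables[table]

    def objects(self, table):
        # Populate the per-table cache the first time the table is requested.
        if table not in self._cache:
            self._cache[table] = {row["ID"]: row for row in self[table]}
        return list(self._cache[table].values())

    def get_object(self, table, id_):
        # Loading is triggered here; raises KeyError if the table is absent.
        self.objects(table)
        return self._cache[table][id_]


class ToyForm:
    def __init__(self, dataset, row):
        self.dataset = dataset
        self.row = row

    @property
    def language(self):
        # Resolved only on access, so a missing table goes unnoticed until now.
        return self.dataset.get_object("LanguageTable", self.row["Language_ID"])


complete = ToyDataset({
    "FormTable": [{"ID": "f1", "Language_ID": "aab"}],
    "LanguageTable": [{"ID": "aab", "Name": "Example language"}],
})
forms_only = ToyDataset({"FormTable": [{"ID": "f1", "Language_ID": "aab"}]})


def first_form_language(ds):
    """Wrap the first form row and follow its language reference."""
    return ToyForm(ds, ds.objects("FormTable")[0]).language
```

In this toy model, `first_form_language(complete)` returns the language row, while `first_form_language(forms_only)` raises `KeyError('LanguageTable')` at attribute access rather than when the forms are loaded. A reference to an ID that is missing from an existing table would also raise a `KeyError` here, differing only in the key reported.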

xrotwang commented 3 years ago

Hm. It seems something else is going on here. This works for me:

>>> from pycldf import Dataset
>>> wals2020 = Dataset.from_metadata('https://raw.githubusercontent.com/cldf-datasets/wals/v2020/cldf/StructureDataset-metadata.json')
>>> values = wals2020.objects('ValueTable')
>>> values[0].language
<pycldf.orm.Language id="aab">

xrotwang commented 3 years ago

From the error, it looks like your dataset does not have a LanguageTable.
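One way to check this without loading the dataset is to inspect the metadata file directly: in CLDF metadata, each component table declares a `dc:conformsTo` URI ending in the component name. The following is a minimal sketch using only the standard library; the helper name `declared_components` and the example metadata are assumptions made for illustration, not part of pycldf.

```python
import json


def declared_components(metadata):
    """Return the CLDF component names declared in a metadata dict."""
    components = set()
    for table in metadata.get("tables", []):
        conforms_to = table.get("dc:conformsTo") or ""
        # Component tables point at a term URI such as
        # http://cldf.clld.org/v1.0/terms.rdf#LanguageTable
        if "#" in conforms_to:
            components.add(conforms_to.rsplit("#", 1)[1])
    return components


# Hypothetical metadata for a wordlist that only ships a forms table.
example = json.loads("""
{
  "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#Wordlist",
  "tables": [
    {"url": "forms.csv",
     "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#FormTable"}
  ]
}
""")

has_language_table = "LanguageTable" in declared_components(example)
```

For a real dataset, the dict would come from `json.load` on the metadata file, for example the `Wordlist-metadata.json` used in the session above.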

xrotwang commented 3 years ago

Btw, I'm in the process of creating API docs for pycldf with Sphinx on Read the Docs. So, hopefully things will become simpler in the future.

Anaphory commented 3 years ago

Argh, yes. Dumb error. I copied the forms.csv to do some experiments with free metadata – including automatically inventing metadata, that's why this worked at all – and forgot to change back to the main directory! :man_facepalming:

And yes, that error message could vaguely have come from a cache miss, but in principle it is clear enough for the point it refers to in the traceback.

xrotwang commented 3 years ago

@Anaphory what do you think of this: https://pycldf.readthedocs.io/en/latest/ Is it helpful?

xrotwang commented 3 years ago

oops, wait, the autodoc stuff didn't work

xrotwang commented 3 years ago

now it's up