cldf-clts / soundvectors

MIT License

Resources copies CLTS, but CLTS is part of the Code? #6

Closed. LinguList closed this issue 4 months ago

LinguList commented 7 months ago

I do not understand why the folder resources copies all CLTS sounds, given that they are part of the CLTS repo anyway, and that repo is used by clts2vec.

LinguList commented 7 months ago

In any case, I suggest adding a folder examples and putting just eval.py there. Please add a README as well, so I can see what it does and what it is supposed to do.

LinguList commented 7 months ago

Please check the following for accessing data in CLTS:

from pyclts import CLTS
clts = CLTS()
print(clts.repos / "?")

LinguList commented 7 months ago

clts.repos is a pathlib path (a PosixPath on Unix-like systems) pointing to the CLTS folder, so you have access to the data in that folder, including the file that was copied here as sounds.tsv from the CLTS repo.
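A minimal sketch of what this path access could look like. The helper name `sounds_path` is hypothetical and not part of pyclts, and the `data/sounds.tsv` location is taken from the loading snippet further down in this thread. The function accepts any directory, so passing `clts.repos` to it is an assumption about how it would be called.

```python
from pathlib import Path


def sounds_path(repos):
    """Return the path to data/sounds.tsv inside a CLTS checkout.

    `repos` is any directory path; in practice it could be `CLTS().repos`.
    This helper is hypothetical and not part of pyclts.
    """
    path = Path(repos) / "data" / "sounds.tsv"
    if not path.is_file():
        # Fail early with a clear message instead of a late read error.
        raise FileNotFoundError(f"no data/sounds.tsv found under {repos}")
    return path
```

With pyclts installed and its repository configured, `sounds_path(CLTS().repos)` would then point at the catalog's own file rather than at a copy.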

LinguList commented 7 months ago

So copying data is not needed, just load it:

from csvw.dsv import UnicodeDictReader
from pyclts import CLTS

clts = CLTS()  # assumes your repos is configured with `cldfbench catconfig`
with UnicodeDictReader(clts.repos / "data" / "sounds.tsv", delimiter="\t") as reader:
    data = list(reader)

arubehn commented 7 months ago

> So copying data is not needed, just load it:

Thank you, I didn't know that was possible. Just to clarify, though: sounds.tsv was only used for evaluation; the core package does not rely on it.

> add a README

Will do.

LinguList commented 7 months ago

Yes, in any case, loading data from copied files, when it is not needed, should be avoided ;-)
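
For an evaluation script such as the suggested examples/eval.py, one way to avoid a bundled copy is to pass the repository location in as a parameter. The sketch below uses Python's standard csv module instead of csvw's UnicodeDictReader, purely so that it runs without extra dependencies. `load_sounds` is a hypothetical name, not something defined in this repo or in pyclts.

```python
import csv
from pathlib import Path


def load_sounds(repos):
    """Read data/sounds.tsv from a CLTS checkout into a list of dicts.

    Hypothetical helper: `repos` is the CLTS directory, e.g. `CLTS().repos`.
    """
    path = Path(repos) / "data" / "sounds.tsv"
    # newline="" lets the csv module handle line endings itself.
    with path.open(encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

Called as `load_sounds(CLTS().repos)`, this would read the file directly from the configured CLTS repository, so the evaluation needs no local copy.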