cldf-clts / soundvectors

MIT License
1 stars 0 forks source link

CLTS Path must be passed differently #10

Closed LinguList closed 4 months ago

LinguList commented 4 months ago

We explicitly did not use constructs like the clts package inside the installed system, as we have now. So we should use the config system to account for this. I'll try to look into this later this week.

LinguList commented 4 months ago

@arubehn, another way would be to modify the code to work like this:

from pyclts import CLTS
from clts2vec import parse # would give this function a different name, maybe get_vec?

clts = CLTS("path2clts")

parse("t", clts=clts)

This is more explicit than hiding the CLTS package somewhere. It means we may have to provide a concrete path in the tests, but this is a minor problem compared to having this internally in the package.

arubehn commented 4 months ago

Sure, we can add the CLTS object explicitly to the method signature. But would that be the most elegant solution?

It is definitely not good to "hide" the path in the codebase; that was an oversight on my side. But how can we make sure that a user can smoothly specify their CLTS path for a whole project? I think some sort of config (defining the path for one central, shared CLTS object) would be better than passing around possibly different instances...

LinguList commented 4 months ago

We have solutions for this. And there is no way around that.

LinguList commented 4 months ago

The most basic solution is to use

cldfbench catconfig

If you specify your path once, you can then type:

from pyclts import CLTS

clts = CLTS()

This is possible, if you have installed pycldf and cldfbench and then ran the catconfig command. I think more convenient is not possible, if you compare with what you have to do for other packages in NLP ;-)

LinguList commented 4 months ago

But that means you don't even need CLTS in your code, you just have a variable clts=None, and you assume users use it. There is even a way to check if catconfig is available, I need to look this up from other projects, to implement it here.

arubehn commented 4 months ago

So essentially you suggest "outsourcing" the task of correctly setting up CLTS to the user?

LinguList commented 4 months ago

This "outsourcing" as you call it is the most consequent way to deal with plug ins. The library itself then works without specific dependencies, not even pyclts. The advantage is that it could in this form even be used with another library, say, FLTS, created by some nerd, that produces the same feature bundles out of a sound and parses it. This is much more powerful than to hard-code this one system into the library here.

arubehn commented 4 months ago

Alright, I see - that makes a lot of sense, of course :)