cldf-clts / clts-legacy

Cross-Linguistic Transcription Systems
Apache License 2.0

profile loading data procedure to find out what's slowing it down #29

Closed LinguList closed 6 years ago

LinguList commented 6 years ago

Access is rather slow now, since we load a couple of different files. A simple cache like the one we have in lingpy would probably make dealing with pyclts easier. The alternative is to just dump the object to JSON, since it seems to serialize well.
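The JSON-dump alternative could look roughly like the sketch below. This is a minimal illustration, not pyclts code: the `build` callable, the cache location, and the grapheme-to-features dict shape are all assumptions made for the example.

```python
import json
from pathlib import Path

# Hypothetical cache location; the real project would pick its own path.
CACHE = Path("~/.cache/pyclts/sounds.json").expanduser()

def load_sounds(build):
    """Load the sound table from a JSON cache, rebuilding it if missing.

    `build` is assumed to be a callable returning a JSON-serializable
    dict (e.g. grapheme -> feature description).
    """
    if CACHE.exists():
        return json.loads(CACHE.read_text(encoding="utf8"))
    sounds = build()
    CACHE.parent.mkdir(parents=True, exist_ok=True)
    CACHE.write_text(json.dumps(sounds), encoding="utf8")
    return sounds
```

On the second call the data comes straight from disk, skipping whatever file parsing and class initialization `build` performs.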

xrotwang commented 6 years ago

We should definitely profile this issue and try other options. I'm somewhat afraid of introducing caches too quickly.

LinguList commented 6 years ago

I just figured out that it is actually amazingly fast; what slows us down is that there are 5000 diphthongs. So this file takes a long time to load. This surprises me a little, as it is only 600k with 5000 lines, so not really scary, but I assume the initialization of the classes also costs some time.

We'll never need all of those 5000 sound combinations, but given that I STILL find examples in the data where a diphthong (e.g. from phoible) is not in our current lookup, we should assume the worst, which is why I created all 5000 now by just combining the vowels we have.
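Combining all vowels pairwise is what produces numbers in the thousands: with a toy inventory the idea can be sketched as follows (the vowel list here is a small placeholder; the real inventory is large enough that ordered pairs reach roughly 5000).

```python
from itertools import product

# Hypothetical toy vowel inventory for illustration only;
# the real CLTS vowel inventory is far larger.
vowels = ["a", "e", "i", "o", "u"]

# Every ordered pair of distinct vowels as a candidate diphthong.
diphthongs = [v1 + v2 for v1, v2 in product(vowels, repeat=2) if v1 != v2]

print(len(diphthongs))  # 5 * 4 = 20 for this toy inventory
```

With n vowels this yields n * (n - 1) candidates, which is why a realistic inventory explodes into thousands of entries even though most will never occur in practice.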

So yes, no need to cache, but we do need to profile and to decide what to do with the 5000 sounds we keep in order to be prepared for the one weird thing linguists come up with.

xrotwang commented 6 years ago

We need to profile. There shouldn't be much difference between a couple hundred sounds and 5000.
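A quick way to get numbers here is the standard library's cProfile. The `load_data` function below is a stand-in, not the actual pyclts loading routine; the point is just the profiling harness around it.

```python
import cProfile
import pstats

def load_data():
    # Hypothetical stand-in for the real loading routine; in practice you
    # would profile whatever function reads the transcription-system files.
    return {i: str(i) for i in range(5000)}

profiler = cProfile.Profile()
profiler.enable()
load_data()
profiler.disable()

# Print the ten functions with the highest cumulative time.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(10)
```

The cumulative column would show directly whether time goes into file I/O or into per-sound class initialization.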

LinguList commented 6 years ago

Since we now handle diphthongs and clusters as ComplexSound, the loading procedure is much faster, so this is no longer an issue.