Closed martino-vic closed 2 years ago
I need more context: what format should the extra column in forms.csv have? Can you provide a minimal example, with a CSV header and one line, that shows what you want to count and how?
Ah, yes of course, sorry:
The rows of the column should contain integers that indicate how often each prosodic structure occurs in the entire column. It should look something like this for example:
Segments   ProsodicStructure   Frequency
k i k i    CVCV                2
b u b a    CVCV                2
w u g      CVC                 1
meaning that "CVCV" occurs 2 times in our data and "CVC" 1 time.
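For illustration, here is a minimal sketch of that counting step in plain Python. It assumes the rows are already available as dictionaries with a "ProsodicStructure" key; the helper name `add_frequency` and the toy rows are hypothetical and not part of any lexibank script or of loanpy.

```python
from collections import Counter

# Toy rows mirroring the example table (illustrative data only).
rows = [
    {"Segments": "k i k i", "ProsodicStructure": "CVCV"},
    {"Segments": "b u b a", "ProsodicStructure": "CVCV"},
    {"Segments": "w u g", "ProsodicStructure": "CVC"},
]


def add_frequency(rows):
    """Return copies of the rows with a 'Frequency' value added.

    Hypothetical helper: counts every prosodic structure once over
    the whole column, then looks the count up for each row.
    """
    counts = Counter(row["ProsodicStructure"] for row in rows)
    return [
        {**row, "Frequency": counts[row["ProsodicStructure"]]}
        for row in rows
    ]


with_freq = add_frequency(rows)
for row in with_freq:
    print(row["Segments"], row["ProsodicStructure"], row["Frequency"])
```

The count is taken over the full column first, which is why it cannot happen inside a single pass that writes each row as it goes.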
Some background on why I need this: I want to know which prosodic structures are documented (i.e. certainly allowed) in the recipient language, so that when predicting loanword adaptation there is an option to filter out words with an undocumented prosodic structure. The catch is that the data sometimes contain atypical structures that occur in only a few words but are otherwise not allowed. If we know their frequencies, we can inspect the data manually and decide how frequent a structure has to be to count as part of the recipient language's inventory of prosodic structures. I hope this explanation makes some sense.
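A rough sketch of how such a frequency cutoff could be applied once the counts are known. The function name `prosodic_inventory`, the parameter `min_count`, and the candidate list are made up for this example; the right cutoff would have to be chosen after inspecting the data, as described above.

```python
from collections import Counter


def prosodic_inventory(structures, min_count=2):
    """Return the set of structures seen at least `min_count` times.

    Hypothetical helper; `min_count` is an assumed, manually chosen
    cutoff, not a value taken from loanpy.
    """
    counts = Counter(structures)
    return {s for s, n in counts.items() if n >= min_count}


# Structures observed in the recipient language (toy data).
observed = ["CVCV", "CVCV", "CVC"]
inventory = prosodic_inventory(observed, min_count=2)

# Predicted adaptations whose structure is not in the inventory are dropped.
candidates = ["CVCV", "CVC", "VCV"]
kept = [c for c in candidates if c in inventory]
print(inventory, kept)
```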
I see your point. This requires a tweak that I am not really happy to add there: to get these numbers, you would have to segment the data outside of the loop that adds the forms (args.writer.add_form...). In my opinion, although this is something one could already count here, it is something that is perfectly well done from within loanpy. LingPy also counts all segments and checks how often they occur (with a defaultdict, not even a Counter) to smooth the data later on (words occurring twice do not contribute to the correspondence patterns). So my suggestion: do it in loanpy.
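To make the suggestion concrete, here is a sketch of counting downstream, after forms.csv has been written, using a defaultdict in two passes. It uses only the standard library; the function name `add_frequency_column` and the inline CSV text are assumptions for the example, and this is not how loanpy or LingPy actually implement it.

```python
import csv
import io
from collections import defaultdict

# Stand-in for the contents of forms.csv (toy data, reduced columns).
FORMS_CSV = (
    "Segments,ProsodicStructure\n"
    "k i k i,CVCV\n"
    "b u b a,CVCV\n"
    "w u g,CVC\n"
)


def add_frequency_column(csv_text):
    """Return the CSV text with a 'Frequency' column appended."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)

    # First pass: count each structure over the whole column.
    counts = defaultdict(int)
    for row in rows:
        counts[row["ProsodicStructure"]] += 1

    # Second pass: write every row again with its count attached.
    out = io.StringIO()
    fieldnames = list(reader.fieldnames or []) + ["Frequency"]
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "Frequency": counts[row["ProsodicStructure"]]})
    return out.getvalue()


result = add_frequency_column(FORMS_CSV)
print(result)
```

Doing this as a post-processing step leaves the dataset script untouched, at the cost of holding all rows in memory between the two passes.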
Ah I see, yes that does make sense
I'm trying to add a column to forms.csv that counts how often each prosodic structure occurs in the "ProsodicStructure" column, and I can't figure out how to add this to the lexibank script. It's similar to this issue: I can't add the info from within the loop because the number of occurrences can only be counted after the loop has ended. But somehow I can't manage to start a second loop at the bottom where I insert this info. Is there some kind of workaround for this, @LinguList?