nh36 opened this issue 7 years ago
380: if it chooses majority rule, as it does here, you are missing something in the network. Remember: if even one sound is missing, or not connected to the other sounds, the algorithm stops.
151: if the file says that w > v, the change will be reversed during reconstruction, so the algorithm will indeed reconstruct a "w" (see the small sketch below). This seems to be the case here.
187: again majority rule; the pattern is extremely gappy, so you can safely assume that there is just not enough data. As a rule, whenever you have a Ø in the consensus pattern (the one we get after imputation of missing values), you can't really reconstruct from there, as any network will be disconnected (we won't reconstruct changes from and to Ø, right?).
We have two bad categories of consensus patterns:
Neither type of pattern helps in reconstruction, and both should be avoided or banned during the process, but to ban them completely we can't do without manual refinement, I'm afraid.
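To make the direction of reconstruction concrete, here is a minimal, hypothetical sketch (the toy edge set and the helper name are my assumptions, not code from this repository): an entry like "w > v" is stored as a directed edge from the older sound to the newer one, and reconstruction walks that edge backwards.

```python
# Hedged sketch only: toy edges in the direction proto > reflex.
changes = {("w", "v"), ("ts", "t")}

def walk_back(sound, changes):
    """Follow the directed changes backwards as far as possible."""
    sources = {target: source for source, target in changes}
    while sound in sources:
        sound = sources[sound]
    return sound

print(walk_back("v", changes))   # -> "w": the change w > v is reversed
print(walk_back("k", changes))   # -> "k": nothing is known about this sound,
                                 #    so there is nothing to reverse
```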
So the algo only works if the pattern is there for all 8 languages. I suppose the problem is that I do not sufficiently understand what the algo is doing. And maybe this will change when you start to use the networks for it. At the moment there are cases, like these, where I am not able to follow the thinking of the computer myself. I suppose you will be writing about this somewhere soon.
The algorithm is doing an extremely simple thing, which covers 10 lines in Python:
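The actual snippet is not quoted here, so the following is only a rough sketch of the logic as described in this thread; the function name, the data layout, and the simplified ancestor check are my assumptions, not the code from the repository:

```python
def reconstruct(pattern, changes):
    """Sketch: return a proto-sound for one pattern, or False to signal
    that the majority-rule fallback should be used."""
    sounds = set(pattern)
    if "Ø" in sounds:                          # a gap disconnects the network
        return False
    nodes = {n for edge in changes for n in edge}
    if not sounds <= nodes:                    # a reflex is missing from the network
        return False
    targets = {target for source, target in changes}
    ancestors = sounds - targets               # sounds that nothing changes into
    if len(ancestors) != 1:                    # no unique ancestor among the reflexes
        return False
    return next(iter(ancestors))
    # A full version would also verify that every other reflex can be
    # reached from this ancestor via the directed changes.
```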
Whenever it returns False, it will fall back to majority rule. This is easy to see now, as it is marked. There are two reasons why it can return False: either a sound in the pattern is missing from the network, or the sounds are not all connected to each other.
I recommend, if you want to check this fully, that you look at EACH of ALL of the patterns in the file and extract the necessary changes each time. This would be a lot of work, but it would at least give an exhaustive graph. Ideally, in such a situation, you would also give your expected outcome for each pattern (this could even be straightforward: annotating 500 patterns should be doable in under one hour, I guess). Then we would have something to compare against.
Right now, you are disappointed or surprised because certain points don't turn out as expected, but I think you underestimate the degree to which the decisions are strict; most of the time it is simply the data that fails.
I recommend doing it like this:

1. I can produce (but not this week) the list of possible sound changes.
2. In the meantime, you could start annotating all of the 500 patterns (minus the singletons), to give us a more objective way to test (a comparison could then look like the small sketch below).
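For illustration, a hypothetical comparison helper could be as small as this; the TSV layout with ID and EXPECTED columns, and the shape of the predictions dictionary, are assumptions about how such an annotation file might look, not an existing format in this project:

```python
import csv

def score(annotation_tsv, predictions):
    """Count how many predicted proto-sounds match the manual annotation."""
    hits = total = 0
    with open(annotation_tsv, encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            total += 1
            if predictions.get(row["ID"]) == row["EXPECTED"]:
                hits += 1
    return hits, total

# e.g. score("annotations.tsv", {"151": "w", "380": "ts"})
```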
In some cases the reconstruction given by the directed network is a surprise. In particular, ID 151 "the fog" is being reconstructed with a w- initial, whereas in the sound change specification file it says w > v, so this should be reconstructed with a v-.
Another case is ID 380 "the fish", where it should be reconstructing *ts-, since I already specified ts > t in initial position. So it is not clear what is happening.
A third case is ID 187 "horizontal" where I expect *ḭ: to be reconstructed by the directed networks, but it isn't. We do have "n ḭ: ḭ" in the 'change direction' file.