digling / burmish

LingPy plugin for handling a specific dataset
GNU General Public License v2.0

Surprises in directed networks #110

Open nh36 opened 7 years ago

nh36 commented 7 years ago

In some cases the reconstruction given by the directed network is a surprise. In particular, ID 151 "the fog" is being reconstructed with a v- initial, whereas the sound change specification file says w > v, so it should be reconstructed with a *w-.

Another case is ID 380 "the fish", which should be reconstructed with *ts-, since I already specified ts > t as an initial change. So it is not clear what is happening.

A third case is ID 187 "horizontal", where I expect *ḭ: to be reconstructed by the directed networks, but it isn't, even though we do have "n ḭ: ḭ" in the 'change direction' file.

LinguList commented 7 years ago

We have two bad categories of consensus patterns:

Neither type of pattern helps in reconstruction, and both should be avoided or banned during the process; but to ban them completely, we can't do without manual refinement, I'm afraid.

nh36 commented 7 years ago

So the algo only works if the pattern is there for all 8 languages. I suppose the problem is that I do not sufficiently understand what the algo is doing. And maybe this will change when you start to use the networks for it. At the moment there are cases, like these, where I am not able to follow the thinking of the computer myself. I suppose you will be writing about this somewhere soon.

LinguList commented 7 years ago

The algorithm does an extremely simple thing, which takes about 10 lines of Python:

Whenever it returns False, it falls back to majority rule. This is easy to see now, as such cases are marked. There are two reasons for a False:

  1. the subgraph is not connected (your network doesn't cover the transition; a consequence of writing the networks not independently, but based on the output)
  2. the subgraph does not have a source
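The two conditions above can be sketched as follows. This is a hypothetical reconstruction of the check, not the actual plugin code (the function and variable names are my own): given the directed network of sound changes and the set of sounds attested in one correspondence pattern, the proto-sound is the unique source of the induced subgraph, and anything else triggers the majority-rule fallback.

```python
# Sketch of the source-finding step (assumed names, not the plugin's code).
import networkx as nx

def propose_proto(changes, pattern_sounds):
    """Return the proto-sound for a pattern, or False to signal
    a fallback to majority rule."""
    graph = nx.DiGraph(changes)  # edges like ("w", "v") for the change w > v
    sub = graph.subgraph(s for s in pattern_sounds if s in graph)
    # Reason 1: all sounds must be in the network, and the induced
    # subgraph must be connected (ignoring edge direction).
    if len(sub) != len(pattern_sounds) or not nx.is_weakly_connected(sub):
        return False
    # Reason 2: the subgraph must have exactly one source,
    # i.e. one node with no incoming edges.
    sources = [n for n, deg in sub.in_degree() if deg == 0]
    if len(sources) != 1:
        return False
    return sources[0]

# With w > v specified, a pattern showing w and v points back to *w:
print(propose_proto([("w", "v"), ("ts", "t")], {"w", "v"}))  # w
# Sounds the network does not link cause the fallback:
print(propose_proto([("w", "v")], {"w", "t"}))               # False
```

On this reading, the surprises above would come from patterns whose sounds are not all linked by the hand-written network, not from the algorithm ignoring the specified changes.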

I recommend, if you want to check this fully, that you look at EACH of ALL the patterns in the file and extract the necessary changes each time. This would be a lot of work, but it would at least yield an exhaustive graph. Ideally, in such a situation, you would also just assign your expected outcome to the patterns (this could even be straightforward: annotating 500 patterns should be doable in under one hour, I guess). Then we would have something to compare against.

Right now, you are disappointed or surprised because certain points don't run as expected, but I think you underestimate to what degree the decisions are strict, and most of the time it is just the data that fails.

LinguList commented 7 years ago

I recommend doing it like this:

  1. annotate all patterns with the expected outcome manually (I'd be keen on having this anyway), and please also annotate those cases where we have sounds which are NOT attested, as this is what we'll need anyway!
  2. discard patterns which occur only once (this will make it really fast).
  3. write a list of all potential transitions between sounds (in both directions), restricted to sounds which co-occur in at least one pattern (this will prevent vowels from matching with consonants).
  4. discard impossible sound changes from this exhaustive list.
  5. test this new list on the data.
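Steps 2 and 3 above can be sketched in a few lines. This is only an illustration under my own assumptions (the function name and the toy patterns are invented): singleton patterns are dropped first, then every ordered pair of sounds that co-occur in at least one surviving pattern becomes a candidate transition for the manual step 4.

```python
# Sketch of steps 2-3: drop singleton patterns, then enumerate every
# ordered pair of co-occurring sounds as a candidate transition.
from collections import Counter
from itertools import permutations

def candidate_transitions(patterns):
    # Step 2: discard patterns which occur only once.
    counts = Counter(tuple(sorted(p)) for p in patterns)
    kept = [p for p in counts if counts[p] > 1]
    # Step 3: both directions, but only between sounds that are
    # attested together in some pattern (so vowels never pair with
    # consonants unless a pattern actually mixes them).
    pairs = set()
    for pattern in kept:
        pairs.update(permutations(set(pattern), 2))
    return sorted(pairs)

patterns = [["w", "v", "v"], ["w", "v", "v"], ["ts", "t", "t"]]
print(candidate_transitions(patterns))  # [('v', 'w'), ('w', 'v')]
```

The ts/t pattern occurs only once in the toy data, so it is discarded in step 2 and contributes no candidate pairs; the output is the two directions of the w/v correspondence, from which step 4 would strike the impossible one.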

I can produce (but not this week) the list of possible sound changes. You could in the meantime start annotating all of the 500 (minus singletons) patterns, to give us a more objective way to test.