PhonologicalCorpusTools / SLPAA


Add sign-comparison analysis functionality #330

Open kchall opened 4 months ago

kchall commented 4 months ago

Allow a user to directly compare two signs for similarity, at varying levels of granularity. E.g.:

image

^ The user starts by seeing the top-level coding for each sign, colour coded as to whether it's a match or not. They can then drill down into the various areas to see which specific elements match or don't match:

image

The details about how elements are lined up with each other when they are not analogous will need to be worked out!

stannam commented 1 month ago

As suggested during the last meeting, a node is now painted pink if it is different but not totally different, i.e., when only a subset of its children does not match.

However, I am not sure about which colour to paint the node if its children are pink. For example, see H1.Mov1 of sign2 below.

Screenshot 2024-08-23 at 4 31 52 PM

H1.Mov1 is currently in red because a node can only look at its direct children and check whether they are coloured or not. Painting red makes sense because it would mean no children are white. However, it also makes sense to paint H1.Mov1 in pink if we want to reserve red for completely different cases only. I'm uncertain which of the two colours would be most appropriate.
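To make the two options concrete, here is a minimal sketch of the direct-children rule. The function name `paint` and the `reserve_red` flag are hypothetical and used only for illustration; they are not taken from the SLP-AA code.

```python
from typing import List

def paint(children: List[str], reserve_red: bool = False) -> str:
    """Colour a node by looking only at its direct children's colours.

    children: colours of the direct children ("white", "pink" or "red").
    reserve_red: hypothetical switch; if True, red is kept for nodes
    whose children are all red (the "completely different" case).
    """
    coloured = [c for c in children if c != "white"]
    if not coloured:
        return "white"          # every child matches
    if reserve_red:
        all_red = all(c == "red" for c in children)
        return "red" if all_red else "pink"
    # Current behaviour: red as soon as no child is white
    return "red" if len(coloured) == len(children) else "pink"

# The H1.Mov1 case: every child is pink
print(paint(["pink", "pink"]))                    # red
print(paint(["pink", "pink"], reserve_red=True))  # pink
```

With `reserve_red=False` the node is red because no child is white; with `reserve_red=True` it stays pink until every child is red.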

kchall commented 1 month ago

@stannam Thanks! As I mentioned in the meeting, I'm actively thinking about this (indeed, that's what I'm doing right now, ha, so perfect timing) and will get back to you on it! :)

kchall commented 1 month ago

@stannam

Here are my current thoughts on colour coding to help with the appearance of “too many” mismatches.

  1. I think we need to distinguish between types of lines that can be compared. There are two dimensions that are relevant here: (1) is the line a ‘label’ or a ‘selection’? and (2) is the line something that could be expanded?

  2. By ‘label,’ I mean anything that is not actually something that a user has actively selected, but will show up in the comparison list, like “Sign Type” or “H1.Mov1,” etc. These are usually things that SLP-AA generates.

  3. By “selection,” I mean anything that is selected, either explicitly or implicitly by a user. These are all the actual choices a user makes, like “Circle” but also anything that is ‘inherited’ as a selection because of a lower-down selection. E.g. if a user selects “Circle,” then the software automatically nests that under “Movement type > perceptual shape > shape.” All of these should be considered selections. Similarly, if the user selects “Eyebrow-contra,” the system nests that under “Head > Face > Eye region > Eyebrow.” These should all be considered selections as well.

  4. With respect to expansion, I think it matters whether any given line hasn’t been expanded but could be on the comparison chart, or has already been expanded or couldn’t be (e.g. because it’s already the terminal node on the list). I’m hoping that it’s possible to colour-code a line based on whether it’s been expanded or not, but I don’t know for sure that that’s feasible — please let me know.

  5. Here’s how the colour coding should work, I think:
     a. if it's any line type and hasn't been expanded but could be, and everything under it does match, then the line is green
     b. if it's any line type and hasn't been expanded but could be, and something under it does not match, then the line is red
     c. if it's a label and has been expanded or can't be expanded, it has no colour because it's just a label, regardless of whether things underneath it match or not
     d. if it's a selection and has been expanded or can't be expanded, it's green if it matches and red if it doesn't match
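Rules (a) to (d) can be sketched roughly as follows. The `Line` class and its fields are assumptions made for illustration only; they are not the actual SLP-AA data structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Line:
    """One row of the comparison tree (hypothetical structure)."""
    text: str
    is_label: bool           # True for generated labels such as "H1.Mov1"
    matches: bool = True     # for selections: does the line itself match?
    expanded: bool = False
    children: List["Line"] = field(default_factory=list)

def subtree_matches(line: Line) -> bool:
    """True if no selection at or below this line mismatches."""
    own = True if line.is_label else line.matches
    return own and all(subtree_matches(c) for c in line.children)

def colour(line: Line) -> Optional[str]:
    """Return "green", "red" or None (no colour) for a single line."""
    if line.children and not line.expanded:
        # Rules (a) and (b): collapsed but expandable, any line type;
        # only what is *under* the line is consulted, as the rules state
        under_ok = all(subtree_matches(c) for c in line.children)
        return "green" if under_ok else "red"
    if line.is_label:
        return None                              # rule (c)
    return "green" if line.matches else "red"    # rule (d)

circle = Line("Circle", is_label=False, matches=False)
mov_type = Line("Movement type", is_label=False, children=[circle])
mov = Line("H1.Mov1", is_label=True, children=[mov_type])

print(colour(mov))        # red: something underneath mismatches
mov.expanded = True
print(colour(mov))        # None: an expanded label has no colour
print(colour(mov_type))   # red: still collapsed, mismatch below
mov_type.expanded = True
print(colour(mov_type))   # green: the selection itself matches
print(colour(circle))     # red: the mismatch sits on this line
```

As lines are expanded, the red moves down to the line where the mismatch actually is.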

Here’s an example (not for any particular actual pair of signs, just for illustration):

Start with nothing expanded:

image

Then, if we expand some things — note that here, the top-level labels change to no colour. Things under Location have all been expanded and match; things under Orientation have all been expanded but don’t match. The movement elements haven’t yet all been expanded, so it’s now the lowest element shown that is red:

image

Finally, if we completely expand the movement selections, we see that the red colour has migrated down to the actual lowest level, because that’s where the mismatch actually happens:

image

Hopefully, this will actually help us zoom in on the elements that are ‘truly’ matching or mismatching. However, there are some further notes below about cases where elements aren’t aligned with anything.


I also discovered that Yurika and I had laid out a really clever way of actually trying to align elements in signs (i.e. putting elements next to each other in the expansion tree), to handle all of the issues with respect to timing and x-slots, and trying to be maximally useful phonologically. This is instead of just matching elements by their labels. I still need to think through how to make this work for relation and non-manual modules, but here’s the basic idea:

The basic principle of alignment is to maximize similarity between signs. Note that I am proposing here that we NEVER align modules by virtue of their timing -- we align them based on other characteristics, and then check the matching of their timing.

  1. Align the “sign type” modules of each sign and compare them. There can only be one sign type per sign, so this is straightforward.

  2. Movement modules:
     i. First 'align by hand.' That is, try to align hand1 modules from sign1 to hand1 modules from sign2. If sign1 and sign2 each have only hand1 modules, or each have only hand2 modules, then proceed to the next step of alignment. If sign1 has both hand1 and hand2 modules, while sign2 has only hand1 modules (or vice versa), then align the hand1 modules only, and leave all hand2 modules unmatched. If sign1 has only hand1 modules, and sign2 has only hand2 modules, then this is the only time that non-matching hand modules can be aligned.
     ii. After aligning by hand as described above, try to align by movement type (perceptual shape, joint specific, or handshape change) -- e.g., if sign 1 has both perceptual shape movement and joint-specific movement, while sign 2 has only joint-specific movement, align the two joint-specific movements, and then say that sign 1 has an extra perceptual shape movement that doesn’t have a match. If both signs have two of the same type of movements (e.g. two perceptual shapes), move down to the top-most characteristic (e.g., what the perceptual shape or the joint-specific movement is, like 'straight' or 'close/open'), and align ones that match at that level. If things can't be aligned based on any of the above, align by coding order (e.g. align sign 1’s H1.Mov3 with sign 2’s H1.Mov3, regardless of content).

  3. Location modules:
     i. First ‘align by hand’ as for the movement module.
     ii. After aligning by hand, try to align by general location type (body-anchored or signing space) -- e.g., if sign1 has both body-anchored and signing space locations, and sign2 has only a body-anchored location, then align the body-anchored modules and leave the signing-space module unmatched. If there are multiple locations of the same type (e.g., multiple body-anchored locations), use the uppermost (in the tree) location specifications to align (e.g., align two head-locations rather than a head location with a torso location, if possible). If modules still can't be aligned, align by coding order.

  4. Absolute orientation modules:
     i. First ‘align by hand’ as for the movement module.
     ii. After aligning by hand, align by palm orientation if possible (e.g. align two palm-up modules). If not possible, align by finger root direction. If still not possible, align by coding order.

  5. Hand configuration modules:
     i. First ‘align by hand’ as for the movement module.
     ii. After aligning by hand, align by handshape name if possible (e.g. align two '5' handshapes). If not possible, align by coding order. [NB: this one might eventually need to get refined.]

  6. Relation modules: I need to figure these out!

  7. Non-manual modules: I need to figure these out!
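As a rough sketch of step (ii) for one hand's modules: pair modules that agree on the most specific characteristics first, then on the broader type, and finally fall back to coding order. The function name `align_modules`, the dict keys and the reading of "coding order" as order of appearance among the leftovers are all assumptions for illustration, not existing SLP-AA code.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Module = Dict[str, str]
Pair = Tuple[Optional[int], Optional[int]]  # indices into mods1 / mods2

def align_modules(mods1: List[Module], mods2: List[Module],
                  keys: Sequence[str]) -> List[Pair]:
    """Align two lists of modules (already restricted to one hand).

    keys runs from general to specific, e.g. ("type", "shape").
    """
    left = list(range(len(mods1)))
    right = list(range(len(mods2)))
    pairs: List[Pair] = []
    # Try the longest key prefix first, so that two modules of the same
    # type are paired by their top-most characteristic where possible.
    for depth in range(len(keys), 0, -1):
        prefix = keys[:depth]
        for i in list(left):
            sig = tuple(mods1[i].get(k) for k in prefix)
            if None in sig:
                continue  # module is not specified this far down
            j = next((j for j in right
                      if tuple(mods2[j].get(k) for k in prefix) == sig), None)
            if j is not None:
                pairs.append((i, j))
                left.remove(i)
                right.remove(j)
    # Fallback: pair whatever is left in coding order (assumed to be
    # the order of appearance), regardless of content.
    n = min(len(left), len(right))
    pairs += list(zip(left[:n], right[:n]))
    # Anything still unpaired has no counterpart in the other sign.
    pairs += [(i, None) for i in left[n:]]
    pairs += [(None, j) for j in right[n:]]
    return pairs

sign1 = [{"type": "perceptual shape", "shape": "straight"},
         {"type": "joint-specific", "shape": "close/open"}]
sign2 = [{"type": "joint-specific", "shape": "close/open"}]
print(align_modules(sign1, sign2, ("type", "shape")))
# [(1, 0), (0, None)]
```

Here the two joint-specific movements are aligned and sign1's perceptual shape movement is left without a match.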

Once the modules are aligned, they are compared, using the principles stated above. Comparisons can only go down as far as two modules match. So e.g. in sign type, if sign1 is a 1H sign, and sign2 is a 2H sign, nothing is compared past 'number of hands'. But if both are 1H signs, then the comparison goes on to see whether 'the hand moves' in each sign or not, etc. Anything that is not aligned with something on the other sign should be coloured some new colour (yellow?).
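The "only compare as far down as things match" idea can be sketched for a single chain of nested selections. This is a simplification under stated assumptions: each aligned module is flattened to a top-to-bottom list of strings, and `colour_path` is a hypothetical helper, not part of SLP-AA.

```python
from itertools import zip_longest
from typing import List, Optional

def colour_path(path1: List[str], path2: List[str]) -> List[str]:
    """Colour two aligned chains of selections level by level."""
    colours: List[str] = []
    diverged = False
    a: Optional[str]
    b: Optional[str]
    for a, b in zip_longest(path1, path2):
        if diverged or a is None or b is None:
            # Below a mismatch, or with nothing to line up against:
            # a comparison would not be fair
            colours.append("yellow")
        elif a == b:
            colours.append("green")
        else:
            colours.append("red")
            diverged = True
    return colours

print(colour_path(["1H", "the hand moves"], ["2H", "both hands move"]))
# ['red', 'yellow']
```

The first mismatch is red, and every level beneath it is yellow rather than red or green.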

So for example, if we also include the movement plane for our two imaginary signs above, then we might get the following. Once we get to the difference in the plane selection, it’s not fair to try to compare selections below that, so everything below it is yellow:

image

We can also use yellow for anything that just doesn’t have an alignment at all, e.g. in the case of a sign with two movement modules being compared to a sign with one movement module:

image

Now, how do we deal with the timing? Instead of using it to align elements, we simply compare already aligned elements based on their timing! This will involve some ‘invention’ of new lines in the comparison tree, that are labelled with the right elements.

  1. When comparing timing of any two aligned modules, first examine whether each module is linked to the “whole sign” or not the whole sign (regardless of the number of x-slots). Label = “Whole sign”
     i. If one is linked to the whole sign (a “yes” selection) and the other is linked not to the whole sign (a “no” selection), then the two are mismatched (red) and the comparison stops there.
     ii. If both are linked to the full sign (both “yes”), then they match (green) and then the number of x-slots in each sign is compared directly (e.g. if each sign has 2 x-slots, then they match at this level; if one has one x-slot and the other has two, then they mismatch at this level). Either way, comparison stops here.
     iii. If neither is linked to the full sign (both “no”), then they match (green) at this level, and timing comparison moves to step (2).

  2. Check to see the total number of x-slots that each module is associated with. Label = “Number of x-slots” (Note: this doesn’t mean that the module has to be actually an interval that lasts that number of x-slots, just counting the number of x-slots it’s associated to). E.g. it could be 1, 1.5, 2, 3.333, etc.
     i. If the total number of x-slots that each sign's module is associated to is different (e.g., one is associated with two x-slots and one is associated to only a half x-slot), then the two mismatch (red), and the comparison stops there.
     ii. If the total number of x-slots that each sign's module is associated to is the same, then the two match (green) and timing comparison moves on to step (3).

  3. Check to see whether each module is (a) associated to a single point; (b) associated to a single interval; (c) associated to multiple points; (d) associated to multiple intervals; or (e) associated to both points and intervals. Label = “Type of association.”
     i. If the two modules have different types of associations (e.g. (a) and (c), or (a) and (b)), then the two mismatch (red) and the comparison stops there.
     ii. If the two modules have the same type of association (e.g. (a) and (a), or (b) and (b)), then they match (green) and timing comparison moves on to step (4).

  4. This step depends on which type of match the two modules have:
     i. For (a) and (a) (single points): Compare the number of the point directly (Label = “which point”) (e.g. '1/3 of x1' -- does this match or not match the point of the other sign's module? Don't try to break into subcomponents like '1/3' or 'x1', etc.). If they match exactly, they’re green; if there’s any mismatch, they’re red. Either way, timing comparison stops here.
     ii. For (b) and (b) (single intervals): Compare the number of the interval directly (Label = “which interval”) (e.g., 'second half of x2') -- does this match or not match the interval of the other sign's module? Don't try to break into subcomponents. If they match exactly, they’re green; if there’s any mismatch, they’re red. Either way, timing comparison stops here.
     iii. For (c) and (c) (multiple points): First, compare the number of points (Label = “number of points”) (e.g., 'three points') -- does this match or not match the number of points of the other sign's module? If the number doesn’t match, they’re red, and the comparison stops here. If the number matches, they’re green. In this case, keep going: list each point consecutively and compare each pairwise (Label = “which point”) (e.g., if the first point of sign1 is '1/3 of x1' -- does this match or not match the first point of sign2's module?). Have a red/green comparison for each point. The timing comparison stops after this.
     iv. For (d) and (d) (multiple intervals): First, compare the number of intervals (Label = “number of intervals”) (e.g., 'three intervals') -- does this match or not match the number of intervals of the other sign's module? If the number doesn’t match, they’re red, and the comparison stops here. If the number matches, they’re green. In this case, keep going: list each interval consecutively (Label = “which interval”) and compare each pairwise (e.g., if the first interval of sign1 is 'second half of x1' -- does this match or not match the first interval of sign2's module?). Have a red/green comparison for each interval. The timing comparison stops after this.
     v. For (e) and (e) (both points and intervals): First, count the number of points and intervals separately (e.g., 1 point and 2 intervals). Do the modules match on the number of points? (Label = “number of points”) Do the modules match on the number of intervals? (Label = “number of intervals”) These two questions can be answered separately. If either one mismatches, it’s red; if it matches, it’s green. For either of them that is green, continue as for (c) / (c) or (d) / (d) matches, i.e. compare each point or interval consecutively. Timing comparison stops after all comparisons have been made.
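Steps (1) to (4) could be sketched as one function that returns the invented timing lines with their colours. The `Timing` class, its field names and the row label "Number of x-slots in sign" are assumptions for illustration; points and intervals are kept as opaque strings so they are compared whole, never broken into subcomponents.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import List, Tuple

@dataclass
class Timing:
    """Hypothetical timing record for one module."""
    whole_sign: bool
    sign_xslots: int                      # x-slots in the sign itself
    linked_xslots: Fraction = Fraction(0) # x-slots the module is linked to
    points: List[str] = field(default_factory=list)
    intervals: List[str] = field(default_factory=list)

def verdict(x, y) -> str:
    return "green" if x == y else "red"

def association_type(t: Timing) -> str:
    # Assumes a module not linked to the whole sign has at least one
    # point or interval
    if t.points and t.intervals:
        return "points and intervals"
    items, kind = (t.points, "point") if t.points else (t.intervals, "interval")
    return f"single {kind}" if len(items) == 1 else f"multiple {kind}s"

def compare_timing(a: Timing, b: Timing) -> List[Tuple[str, str]]:
    rows = [("Whole sign", verdict(a.whole_sign, b.whole_sign))]   # step 1
    if a.whole_sign != b.whole_sign:
        return rows
    if a.whole_sign:
        rows.append(("Number of x-slots in sign",
                     verdict(a.sign_xslots, b.sign_xslots)))
        return rows
    rows.append(("Number of x-slots",
                 verdict(a.linked_xslots, b.linked_xslots)))        # step 2
    if rows[-1][1] == "red":
        return rows
    rows.append(("Type of association",
                 verdict(association_type(a), association_type(b))))  # step 3
    if rows[-1][1] == "red":
        return rows
    mixed = bool(a.points and a.intervals)                          # step 4
    for kind, xs, ys in (("point", a.points, b.points),
                         ("interval", a.intervals, b.intervals)):
        if not xs and not ys:
            continue
        if mixed or len(xs) > 1 or len(ys) > 1:
            rows.append((f"number of {kind}s", verdict(len(xs), len(ys))))
            if len(xs) != len(ys):
                continue  # counts differ: no pairwise comparison
        rows += [(f"which {kind}", verdict(x, y)) for x, y in zip(xs, ys)]
    return rows

m1 = Timing(False, 2, Fraction(1), points=["1/3 of x1"])
m2 = Timing(False, 2, Fraction(1), points=["1/2 of x1"])
for row in compare_timing(m1, m2):
    print(row)
```

For the two single-point modules above, the first three rows come out green and the final "which point" row is red.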

Here’s an example — see how the labels for timing have been added in:

image

stannam commented 3 weeks ago

Question about expanding a line. When the user expands a line, the corresponding line in the other tree should be programmatically expanded, because colouring depends on whether the two lines are expanded and whether they agree. Then, should expanding 'Shape' in Tree1 also expand a line under 'Perceptual shape' in Tree2? If so, which one?

Sign1

image

Sign2

image

I imagine a case like the one below where the children don't match under 'Absolute.' But I also think that 'Shape' in Tree1 and 'Axis direction' and 'Plane' in Tree2 should all be in yellow and expand/collapse independently?

image

kchall commented 3 weeks ago

@stannam Great point. I think that ideally, if things simply don't correspond between signs, then the sign that doesn't have a specification where the other sign does gets a blank line. So, you're totally right, the lines all end up yellow and expanding / collapsing, but if there's no specification on one side, it expands / collapses blank lines.

I guess my first preference would be that the blank ones are truly blank:

image

...but if that's not feasible, I would be totally happy for them to also be yellow, where yellow just always means 'these things don't match and can't be fairly compared':

image

To go back to the circle / zigzag case, it would look like this:

image