clulab / eidos

Machine reading system for World Modelers
Apache License 2.0

Compositional grounding is not composing #1115

Open kwalcock opened 2 years ago

kwalcock commented 2 years ago

I wonder whether, at

https://github.com/clulab/eidos/blob/8758be8a5860fc3404f1d6f66b5095a161a6d87c/src/main/scala/org/clulab/wm/eidos/groundings/grounders/SRLCompositionalGrounder.scala#L285

the exact and inexact groundings should not be combined into a sequence but rather into a combined predicate grounding.

MihaiSurdeanu commented 2 years ago

The sequence allows for the following sort to happen, no?

MihaiSurdeanu commented 2 years ago

But the bigger question is in the title... Do we know where compositions are lost?

kwalcock commented 2 years ago

The sequence only contains compositional groundings that aren't really composed. I'm thinking of something like https://github.com/clulab/eidos/blob/621ca469f45c10ee015504c5ae6466b801f45a8c/src/main/scala/org/clulab/wm/eidos/groundings/grounders/SRLCompositionalGrounder.scala#L343-L347

MihaiSurdeanu commented 2 years ago

Ok... But where are these combined with the concepts that serve as arguments to these predicates?

kwalcock commented 2 years ago

In order to collect the inexactPredicateGroundings there are variables like isArg and isPred. If there are predicates, this seems like the only place they would be composed.

Pseudo call stack:

groundEidosMention
  groundSentenceSpan
    groundSentenceSpan
      if (validPredicates.isEmpty) groundWithoutPredicates(sentenceHelper)
      else groundWithPredicates(sentenceHelper)
        groundWithPredicates
          findExactPredicateGroundingAndRange
          findInexactPredicateGroundings

MihaiSurdeanu commented 2 years ago

Ok, thanks!

Can you please try to find where the concepts get lost? For example, for the phrase "transportation of oil", where do we lose the grounding for "oil"?

kwalcock commented 2 years ago

The program is running, but IIRC oil is grounded to fuel and is not an exact match. Transportation is not an exact match, either. Not sure why. Inexact matches are not combined.

kwalcock commented 2 years ago

Ben's "transportation of water" works, almost. The scores don't sort well. The noisyOr is only applied to filled slots, so filling a second slot with a lower value is not as good as having the first single slot.

THEME PROCESS: wm/process/transportation/ (1.2696835) Total: 1.2596835
THEME: wm/concept/goods/water (1.0) THEME PROCESS: wm/process/transportation/ (1.2696835) Total: 1.0025969
THEME: wm/concept/goods/water (1.0) Total: 0.99
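
For reference, these totals follow directly from the noisyOr implementation quoted later in this thread (epsilon = 0.01f); a minimal sketch that reproduces them:

  def noisyOr(values: Seq[Float], scale: Float = 1.0f): Float = {
    val epsilon = 0.01f
    val result = 1.0f - values.fold(1.0f)((product, value) => product * (1.0f - (value - epsilon)))
    result * scale
  }

  noisyOr(Seq(1.2696835f))       // 1.2596835: PROCESS slot only
  noisyOr(Seq(1.0f, 1.2696835f)) // 1.0025969: THEME and PROCESS slots
  noisyOr(Seq(1.0f))             // 0.99:      THEME slot only

Because the process score exceeds 1, its factor in the product goes negative, and composing it with the perfectly scored theme actually pulls the total below the single-slot score.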

MihaiSurdeanu commented 2 years ago

Sure. I think most matches in the real world will be inexact due to language sparsity. But we should assemble predicates and their arguments regardless of exact/inexact match.

MihaiSurdeanu commented 2 years ago

Ben's "transportation of water" works, almost. The scores don't sort well. The noisyOr is only applied to filled slots, so filling a second slot with a lower value is not as good as having the first single slot.

THEME PROCESS: wm/process/transportation/ (1.2696835) Total: 1.2596835
THEME: wm/concept/goods/water (1.0) THEME PROCESS: wm/process/transportation/ (1.2696835) Total: 1.0025969
THEME: wm/concept/goods/water (1.0) Total: 0.99

This is pretty nice! Then what did Ben see? Is there a bug?

kwalcock commented 2 years ago

They (water and transportation) weren't combined before. They were separate compositional groundings in the Seq that got sorted.

MihaiSurdeanu commented 2 years ago

I see. Awesome!

kwalcock commented 2 years ago

When there is a predicate, as is the case with both "(price) of oil" and "water (transportation)", findExactPredicateGroundingAndRange does not necessarily find an exact match at all, or it may find an exact match which doesn't include the predicate; I should have called it findExactGroundingAndRangeIfThereIsAPredicateSomewhere. If it doesn't find an exact match, as is the case with "price of oil", perhaps it should specifically target the predicate, make do with whatever it can find, and then go on to look for possible arguments, etc. In this particular case, the ontology doesn't help by having a node called price_or_cost. It would be better to pick one word or the other and move the extra to an example. For "water transportation", water matches first, even though it isn't the predicate, but it is apparently possible to do them in the other order.

kwalcock commented 2 years ago

I hope that @zupon is getting notifications in case he notices that I'm off track.

zupon commented 2 years ago

I have been getting these notifications, and it looks like you might be right on track! At one point, we did want to try to match the entire mention span even when there were predicates. That would get us stuff like "climate change", where it's sorta compositional but we also just have a climate_change concept. Since we wanted the matching for things like this to be fuzzy, we then did the non-exact matching. This is useful when we have mentions that don't exactly match our ontology, but it sounds like it's now a problem: if we do get some non-exact match this way, we just stop and ignore the rest instead of continuing with the remaining content.

I can't look at this tonight, but I will try to take a look tomorrow. But I think this is looking in the right place.

kwalcock commented 2 years ago

Right now it's failing one NOT test by getting extra things, but it is probably at least composing. If I turn the failingTests to passingTests, it gets 747 passing while master gets 771. I had added some things like multiple exact matches. Maybe that hurt it. I don't have a feel for the data.

MihaiSurdeanu commented 2 years ago

Thank you both! I think we should enable this change, which looks like it's improving things a lot.

Also, @EgoLaparra: would it be possible for you to look at the 771 - 747 = 24 tests that fail now and try to see if we can fix them? Thank you all!

kwalcock commented 2 years ago

I'm probably going to make a few more changes around https://github.com/clulab/eidos/blob/472520f1e41967036a715eb0ff6fdd7bcdcb207a/src/main/scala/org/clulab/wm/eidos/groundings/grounders/EidosOntologyGrounder.scala#L137-L145 within a couple of hours. Those lines did not survive the night.

EgoLaparra commented 2 years ago

@kwalcock let me know when you are done with those changes and I will take a look at the failing tests.

kwalcock commented 2 years ago

@EgoLaparra, I made the small change.

MihaiSurdeanu commented 2 years ago

Thank you both!

EgoLaparra commented 2 years ago

Almost all those tests seem to fail because composed groundings are scored higher than non-composed ones, even when the expected grounding is non-composed.

For example, for the cause mention high unemployment the expected grounding is Seq("wm/concept/economy/unemployment", "", "", "") but the prediction is Seq(wm/concept/economy/unemployment, wm/property/quality, , ,). The top 3 groundings produced by the algorithm are:

Seq(wm/concept/economy/unemployment, wm/property/quality, , , 0.99481976)
Seq(wm/concept/economy/unemployment, , , , 0.99)
Seq(, wm/property/quality, , , 0.4819764)

Another example: the prediction for Increasing tensions is Seq(wm/concept/crisis_or_disaster/conflict/tension, wm/property/price_or_cost, ,) while the expected grounding is Seq("wm/concept/crisis_or_disaster/conflict/tension", "", "", ""). The top 3:

Seq(wm/concept/crisis_or_disaster/conflict/tension, wm/property/price_or_cost, , , 0.9948348)
Seq(wm/concept/crisis_or_disaster/conflict/tension, , , , 0.99)
Seq(, wm/property/price_or_cost, , , 0.48348033)

Is this behavior (i.e., composed scores > non-composed scores) expected?

kwalcock commented 2 years ago

In SRLCompositionalGrounder.scala there is a score calculation in which property scores are multiplied by 0.5f before they are noisyOr'd. Changing that value might switch how these values sort.
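
With the same noisyOr in scope as in the sketch above, the effect of the multiplier can be seen directly; the 0.984f property score below is inferred from the 0.99481976 total in the earlier comment and is an assumption:

  noisyOr(Seq(1.0f))                // 0.99:    concept slot only
  noisyOr(Seq(1.0f, 0.984f * 0.5f)) // 0.99482: concept plus property slot

Any extra slot whose scaled score lands between epsilon and 1 + epsilon strictly increases the total, so with the current scoring a composed grounding always outranks the same grounding with the slot left empty.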

MihaiSurdeanu commented 2 years ago

I think it's worth tuning the score that Keith mentioned. Also, the first compositional example, Seq(wm/concept/economy/unemployment, wm/property/quality, , ,), seems correct to me. However, the second one is clearly wrong... Any other ideas on how to fix these? @zupon, you did look at situations such as the second one, right? In this case, I think "tensions" is consumed by the concept, and "increasing" is a magnitude adjective, so it should not be used for grounding. Thus, there should be no tokens left for grounding the phantom property "price_or_cost"... You did keep track of which words were consumed during grounding, right @zupon ?

zupon commented 2 years ago

For both the high unemployment and Increasing tensions examples, are there predicates? I can sorta see the link from high --> quality, but I'm not seeing where Increasing --> price would make sense.

What should happen if there are predicates is (1) the algorithm looks first at the entire span (e.g. high unemployment) to see if it matches a node or example exactly (for various meanings of "exactly"); (2) if it doesn't, we then take each predicate or argument individually and then ground it. When we match something with the fuzzy exact match, we do (or are supposed to, anyway) remove those tokens that matched from our subsequent grounding attempts. For example, if we had the span climate change regulations, our exact match might return the climate_change Concept. After the match, we remove those two tokens and are only left with regulations to ground.
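
A sketch of that intended flow, with hypothetical stand-ins rather than the actual Eidos API:

  type Grounding = String
  // Hypothetical: returns a grounding plus the tokens it consumed, if any.
  def exactMatch(tokens: Seq[String]): Option[(Grounding, Set[String])] = None // stub
  // Hypothetical: grounds a single predicate or argument on its own.
  def groundOne(token: String): Option[Grounding] = None // stub

  def ground(tokens: Seq[String]): Seq[Grounding] =
    exactMatch(tokens) match {
      case Some((grounding, consumed)) =>
        // Remove the tokens consumed by the exact match before continuing,
        // e.g. "climate change regulations" minus {climate, change} leaves
        // just "regulations" to ground.
        grounding +: ground(tokens.filterNot(consumed))
      case None =>
        tokens.flatMap(groundOne) // ground each predicate or argument individually
    }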

What may be happening here is the (fuzzy) exact matching for the span is accidentally allowing the entire Increasing tensions span to check against the Property branch using w2v rather than only using regex like we do elsewhere.

@MihaiSurdeanu AFAIK we don't actually keep track of the tokens used for each slot in the predicate grounding in a way that we can refer back to them later, but the algorithm does (or should) remove tokens that are used for grounding, like with the exact match case for predicates/arguments.

kwalcock commented 2 years ago

There is this comment in SRLCompositionalGrounder:

      // TODO (at some point) -- the empty sequence here is a placeholder for increase/decrease triggers
      //  Currently we don't have "access" to those here, but that could be changed

On the other hand, there are words like "higher" in the ontology. One is in the ontology node higher_temperatures and there are others in the examples. Keeping them in the text to be grounded might help in these cases.

kwalcock commented 2 years ago

Here's the node for price_or_cost, which has some hints of increase. It wouldn't have to be that way.

                - node:
                    name: price_or_cost
                    patterns:
                        - \b([Cc]osts?|[Pp]rices?)\b
                    examples:
                        - additional costs
                        - capital costs
                        - cost
                        - costs
                        - expenses
                        - hidden costs
                        - high costs
                        - higher costs
                        - indirect costs
                        - input costs
                        - operating costs
                        - opportunity costs
                        - price
                        - recurrent costs
                        - social costs
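
For what it's worth, the node's regex pattern cannot be what fires on "Increasing"; a quick check (standalone snippet, not Eidos code):

  val pattern = """\b([Cc]osts?|[Pp]rices?)\b""".r
  pattern.findFirstIn("Increasing tensions") // None: the regex does not match,
                                             // so the grounding presumably comes
                                             // from w2v similarity to the examples.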

EgoLaparra commented 2 years ago

I've been playing with the properties score multiplier that @kwalcock mentioned, and only very small values seem to have an effect. With 0.1f or 0.05f, only 1 of the 29 tests passes. With 0.01f, 9 tests pass. With 0.0f, 20 of the tests pass, and adding a 0.0f multiplier to processes as well, 27 out of the 29 tests pass; however, this seems like switching off the composition.

In case you find it useful, I've pushed a branch that only runs the 29 failing tests: composition-failing-tests

MihaiSurdeanu commented 2 years ago

@kwalcock , @zupon , @EgoLaparra : thanks for working on this during the weekend! I missed some of the discussion here due to travel today. Questions:

kwalcock commented 2 years ago

AFAICT, property groundings can use the embeddings, although there is an extra-high SRLCompositionalGrounder.propertyConfidenceThreshold in maybeProperty.

Can someone eventually define "predicate" in this context? It is confusing me.

MihaiSurdeanu commented 2 years ago

Predicate is a verb or a nominalized verb. I proposed a meeting today in a separate email to wrap this up. Can @kwalcock and @EgoLaparra please respond to that email? Thanks!

kwalcock commented 2 years ago

In "The price of oil decreased water transportation.", price and transportation are called validPredicates in the code. I guess price and transport can be verbs. They wouldn't be the predicate of the sentence but the predicate of something else, the mention potentially.

kwalcock commented 2 years ago

Should it be the case that

noisyOr(1.8379604816436768f, 0.874017596244812f * 0.5f) > noisyOr(1.8379604816436768f, 1.0f * 0.5f)?

Is this a noisy exclusive or? The inputs on the right are equal to or greater than the inputs on the left.

kwalcock commented 2 years ago

These are from "price of oil": it is comparing noisyOr(oil, of) to noisyOr(oil, price), and "of oil" is winning. Can it be that the large value 1.8379604816436768f messes up this function?

  def noisyOr(values: Seq[Float], scale: Float = 1.0f): Float = {
    val epsilon = 0.01f
    // Each input contributes a factor of (1.0f - (value - epsilon)). For any
    // value above 1 + epsilon that factor goes negative, the running product
    // flips sign, and the result is no longer monotone in the inputs.
    val result = 1.0f - values.fold(1.0f)((product, value) => product * (1.0f - (value - epsilon)))
    result * scale
  }

EgoLaparra commented 2 years ago

It seems that it makes the result of the fold negative, so the output of the noisyOr is the opposite of what it should be, right?

kwalcock commented 2 years ago

In "price of oil", both "price" and "of" qualify as potential properties based on grounding scores. "Of" probably shouldn't be tested at all. Right now there is only a check for canonical words, and "of" qualifies. What is the best way to rule it out? Stopword list, part of speech, lack of edges? isArg is defined as having incoming edges and no outgoing edges. isPred was defined as anything else, perhaps just for simplicity of implementation, but can those three other cases be categorized more closely and is one combination characteristic of stopwords?

kwalcock commented 2 years ago

The results of noisyOr are

1.4744141101837158 > 1.42225980758667

even though all inputs on the right are equal to or greater than inputs on the left. This gives us an incorrect answer.

kwalcock commented 2 years ago

However,

noisyOr(0.9, 0.874017596244812f * 0.5f) = 0.93697095 < 0.9439 = noisyOr(0.9, 1.0f * 0.5f)

which seems like the right way around. By letting some input values creep above 1, we're getting unexpected behavior. Is there a more robust implementation?
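
A minimal sketch of one more robust variant, which clamps each input into [0, 1] before combining; this is an illustration of the normalize-first idea rather than a drop-in fix:

  def clampedNoisyOr(values: Seq[Float], scale: Float = 1.0f): Float = {
    val epsilon = 0.01f
    // With inputs clamped to [0, 1], every factor 1 + epsilon - value stays
    // positive, the product never flips sign, and the result is monotone
    // non-decreasing in each input.
    val clamped = values.map(value => math.min(1.0f, math.max(0.0f, value)))
    val result = 1.0f - clamped.fold(1.0f)((product, value) => product * (1.0f - (value - epsilon)))
    result * scale
  }

With this version an extra slot helps whenever its score exceeds epsilon, and a larger input can never produce a smaller total.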

EgoLaparra commented 2 years ago

I was not able to perturb the grounding of Increasing to wm/property/price_or_cost by removing just one example from the ontology. I had to remove 6 examples to force it to ground to a different node: wm/property/coping_capacities.

As expected, removing so many examples has a big impact and only 604 tests pass (747 passed before the change).

Some details: The most similar nodes to Increasing before the change were:

wm/property/price_or_cost 0.6741894
wm/property/coping_capacities 0.6254506

The similarities of the wm/property/price_or_cost examples with Increasing are:

 1 - higher costs 0.73212624
 2 - high costs 0.6722427
 3 - social costs 0.648605
 4 - operating costs 0.605973
 5 - capital costs 0.5994495
 6 - opportunity costs 0.5955624
 7 - recurrent costs 0.5875349
 8 - indirect costs 0.58243614
 9 - additional costs 0.57253253
10 - costs 0.57253253
11 - input costs 0.56632775
12 - cost 0.507115
13 - hidden costs 0.5036369
14 - expenses 0.4259854
15 - price 0.3589471

After removing the 6 most similar examples (i.e. higher costs, high costs, social costs, operating costs, capital costs and opportunity costs):

wm/property/coping_capacities 0.6254506
wm/property/price_or_cost 0.61850816

I was not able to get the same outcome with any combination of fewer than 6 examples removed.

EgoLaparra commented 2 years ago

@kwalcock as far as I know, noisyOr works with probabilities, so all the input scores should be normalized between 0 and 1.

kwalcock commented 2 years ago

I'd like to do that, but there is code like

        val embeddingGroundings = // This is a Seq rather than a Map.
            for ((namer, embeddingScore) <- matchedEmbeddings)
            yield {
              val exampleScore = matchedExamples(namer)
              // val comboScore = embeddingScore
              // val comboScore = embeddingScore + (1 / (exampleScore + 1)) // Becky's simple version
              // Note: 1 / (log(exampleScore + 1) + 1) is 1 when exampleScore is 0
              // and never exceeds 1, so comboScore can approach embeddingScore + 1.
              val comboScore = embeddingScore + 1 / (log(exampleScore + 1) + 1)
              // val comboScore = pow(embeddingScore.toDouble, exampleScore.toDouble)
              OntologyNodeGrounding(namer, comboScore.toFloat)
            }
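
Since the example term equals 1 when exampleScore is 0 and never exceeds it, comboScore can approach embeddingScore + 1, which is presumably where inputs like 1.8379604816436768f come from. One way to pull such scores back into [0, 1] before they reach noisyOr, purely as an illustration:

  // Illustration only: comboScore lies in roughly [0, 2], so halving it
  // (and clamping) restores a [0, 1] range while preserving the ordering.
  def squash(comboScore: Double): Float =
    math.max(0.0, math.min(1.0, comboScore / 2.0)).toFloat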

kwalcock commented 2 years ago

I pushed recent changes. Since composition was added, usually 11 and sometimes 15 tests that were previously passing have been failing; now it's 13. One hopes to make up for it by passing some of the tests that would fail in master if they weren't disabled. Enabling them, I get 744 passing and 160 failing. That's down from the 771 previously achieved. Can it be said that we overfit to data that was not really compositional, and that this is somehow better?

MihaiSurdeanu commented 2 years ago

Thanks @kwalcock ! Before we declare success, would it be possible to: a) Send the tests that are failing, with the predicted groundings and the expected ground truth? b) Point me to the code you changed?

Thanks!

kwalcock commented 2 years ago

There is a PR (#1116) now (which won't pass unless I disable some tests) that is ready for review. The most important changes are here, and it's probably not implemented exactly as you instructed. It may be worth going over it line by line.

https://github.com/clulab/eidos/blob/8c22059098042777f6b6934b04c7d4316bc3f835/src/main/scala/org/clulab/wm/eidos/groundings/grounders/SRLCompositionalGrounder.scala#L304-L363

kwalcock commented 2 years ago

Here are a couple with more on the way:

val text = "The impact of the drought has been exacerbated by high local cereal prices , excess livestock mortality , conflict and restricted humanitarian access in some areas ."
val effectGroundings = Seq("wm/concept/crisis_or_disaster/environmental/drought", "", "", "")
canonicalName: drought

Expected :"[]"
Actual   :"[wm/property/preparedness]"
// I will check this one.  Does it go on searching after finding an exact match?

val text = "As of the first quarter of 2011 the Djiboutian government remains genuinely worried that a potential Afar insurgency in the north could quickly spread to the south , especially in view of the fact that the Djiboutian National Army is weak and the population in Djibouti City is facing deteriorating economic conditions due to high unemployment and inflation , which surged to 3,8 per cent in 2010 ."
val causeGroundings = Seq("wm/concept/economy/unemployment", "", "", "")
canonicalName: high unemployment

Expected :"[]"
Actual   :"[wm/property/quality]"
// May have gotten unemployment and then gone back for high

val text = "The brewing conflict had already had a serious impact in disrupting farming which had led to higher prices ."
val effectGroundings = Seq("", "wm/property/price_or_cost", "", "")
canonicalName: higher prices

Expected :"[wm/property/price_or_cost]"
Actual   :"[]"
val text = "The brewing conflict had already had a serious impact in disrupting farming which had led to higher prices ."
val effectGroundings = Seq("", "wm/property/price_or_cost", "", "")
canonicalName: higher prices

Expected :"[]"
Actual   :"[wm/process/consumption]"
// This seems to fill in the process rather than the property.  Perhaps price_or_cost doesn't make the cutoff.

kwalcock commented 2 years ago

val text = "The outlook for 2020 continues to be bleak as foreign exchange reserves shrink , budget deficits increase and unemployment rates rise steeply due to the economic impacts of the pandemic ."
val effectGroundings = Seq("wm/concept/economy/unemployment", "", "", "") //todo: add 'rate' property?
canonicalName: unemployment

Expected :"[]"
Actual   :"[wm/property/price_or_cost]"
val text = "The impact of research led productivity growth on poverty in Africa , Asia and Latin America ."
val causeGroundings = Seq("", "", "wm/process/research", "")
canonicalName: research

Expected :"[]"
Actual   :"[wm/property/preparedness]"

val effectGroundings = Seq("wm/concept/poverty", "", "", "") //fixme: bad effect span
canonicalName: productivity growth poverty Asia

Expected :"[]"
Actual   :"[wm/process/attempt]"

MihaiSurdeanu commented 2 years ago

@kwalcock: can you please explain the format of this output?

kwalcock commented 2 years ago

It doesn't help that sentenceHelper.validPredicates is returning duplicates, but it's easy to patch.

kwalcock commented 2 years ago

In

val text = "The impact of the drought has been exacerbated by high local cereal prices , excess livestock mortality , conflict and restricted humanitarian access in some areas ."
val effectGroundings = Seq("wm/concept/crisis_or_disaster/environmental/drought", "", "", "")
canonicalName: drought

Expected :"[]"
Actual   :"[wm/property/preparedness]"

When grounding the mention for "drought" (an effect rather than a cause), it must have gotten the first "wm/concept/crisis_or_disaster/environmental/drought" correct, but it went on to compose it with "[wm/property/preparedness]" where the expected grounding had nothing more, "[]".

In this one, the actual text being grounded is "impact of the drought", and "impact" probably results in wm/property/preparedness. I'm still checking.

MihaiSurdeanu commented 2 years ago

In a meeting now. Will look at these soon!