kwalcock opened this issue 2 years ago
The sequence allows for the following sort to happen, no?
But the bigger question is in the title... Do we know where compositions are lost?
The sequence only contains compositional groundings that aren't really composed. I'm thinking of something like https://github.com/clulab/eidos/blob/621ca469f45c10ee015504c5ae6466b801f45a8c/src/main/scala/org/clulab/wm/eidos/groundings/grounders/SRLCompositionalGrounder.scala#L343-L347
Ok... But where are these combined with the concepts that serve as arguments to these predicates?
In order to collect the inexactPredicateGroundings there are variables like isArg and isPred. If there are predicates, this seems like the only place they would be composed.
Pseudo call stack:

    groundEidosMention
      groundSentenceSpan
        groundSentenceSpan
          if (validPredicates.isEmpty) groundWithoutPredicates(sentenceHelper)
          else groundWithPredicates(sentenceHelper)
    groundWithPredicates
      findExactPredicateGroundingAndRange
      findInexactPredicateGroundings
Ok, thanks!
Can you please try to find where the concepts get lost? For example, for the phrase "transportation of oil", where do we lose the grounding for "oil"?
The program is running, but IIRC oil is grounded to fuel and is not an exact match. Transportation is not an exact match, either. Not sure why. Inexact matches are not combined.
Ben's "transportation of water" works, almost. The scores don't sort well. The noisyOr is only applied to filled slots, so filling a second slot with a lower value is not as good as having the first single slot.
THEME PROCESS: wm/process/transportation/ (1.2696835) Total: 1.2596835
THEME: wm/concept/goods/water (1.0) THEME PROCESS: wm/process/transportation/ (1.2696835) Total: 1.0025969
THEME: wm/concept/goods/water (1.0) Total: 0.99
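The sorting behavior can be reproduced directly from the noisyOr implementation (the function below is copied from SRLCompositionalGrounder and is quoted in full later in this thread); the totals match the ones above:

```scala
// noisyOr as implemented in SRLCompositionalGrounder (quoted in full later in this thread).
def noisyOr(values: Seq[Float], scale: Float = 1.0f): Float = {
  val epsilon = 0.01f
  val result = 1.0f - values.fold(1.0f)((product, value) => product * (1.0f - (value - epsilon)))
  result * scale
}

// process slot only ("transportation"):
println(noisyOr(Seq(1.2696835f)))        // ~1.2596835
// theme + process ("water" + "transportation") drops below the single slot:
println(noisyOr(Seq(1.0f, 1.2696835f)))  // ~1.0025969
// theme slot only ("water"):
println(noisyOr(Seq(1.0f)))              // ~0.99
```

Filling the second slot with a score of 1.0 lowers the total from 1.2596835 to 1.0025969, which is why the single-slot grounding sorts first.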
Sure. I think most matches in the real world will be inexact due to language sparsity. But we should assemble predicates and their arguments regardless of exact/inexact match.
This is pretty nice! Then what did Ben see? Is there a bug?
They (water and transportation) weren't combined before. They were separate compositional groundings in the Seq that got sorted.
I see. Awesome!
When there is a predicate, as is the case with both "(price) of oil" and "water (transportation)", findExactPredicateGroundingAndRange does not necessarily find an exact match at all, or it may find an exact match that doesn't include the predicate. It should have been called findExactGroundingAndRangeIfThereIsAPredicateSomewhere. If it doesn't find an exact match, as is the case with "price of oil", perhaps it should specifically target the predicate, make do with whatever it can find, and then go on to look for possible arguments, etc. In this particular case, the ontology doesn't help by having a node called price_or_cost; it would be better to pick one word or the other and move the extra to an example. For "water transportation", water matches first even though it isn't the predicate, but apparently it is possible to ground them in the other order.
I hope that @zupon is getting notifications in case he notices that I'm off track.
I have been getting these notifications, and it looks like you might be right on track! At one point we did want to try to match the entire mention span even when there were predicates. That would get us stuff like "climate change", where it's sorta compositional but we also just have a climate_change concept. Since we wanted the matching for things like this to be fuzzy, we then did the non-exact matching. This is useful when we have mentions that don't exactly match our ontology, but it sounds like it's now a problem: if we do get some non-exact match this way, we just stop and ignore the rest instead of continuing with the remaining content.
I can't look at this tonight, but I will try to take a look tomorrow. But I think this is looking in the right place.
Right now it's failing one NOT test by getting extra things, but it is probably at least composing. If I turn the failingTests to passingTests, it gets 747 passing while master gets 771. I had added some things like multiple exact matches. Maybe that hurt it. I don't have a feel for the data.
Thank you both! I think we should enable this change, which looks like it's improving things a lot.
Also, @EgoLaparra: would it be possible for you to look at the 771 - 747 = 24 tests that fail now and try to see if we can fix them? Thank you all!
I'm probably going to make a few more changes around https://github.com/clulab/eidos/blob/472520f1e41967036a715eb0ff6fdd7bcdcb207a/src/main/scala/org/clulab/wm/eidos/groundings/grounders/EidosOntologyGrounder.scala#L137-L145 within a couple of hours. Those lines did not survive the night.
@kwalcock let me know when you are done with those changes and I will take a look at the failing tests.
@EgoLaparra, I made the small change.
Thank you both!
Almost all those tests seem to fail because composed groundings are scored higher than non-composed ones, even when the expected grounding is non-composed. For example, for the cause mention "high unemployment", the expected grounding is Seq("wm/concept/economy/unemployment", "", "", "") but the prediction is Seq(wm/concept/economy/unemployment, wm/property/quality, , ,). The top 3 groundings produced by the algorithm are:

    Seq(wm/concept/economy/unemployment, wm/property/quality, , , 0.99481976)
    Seq(wm/concept/economy/unemployment, , , , 0.99)
    Seq(, wm/property/quality, , , 0.4819764)
Another example: the prediction for "Increasing tensions" is Seq(wm/concept/crisis_or_disaster/conflict/tension, wm/property/price_or_cost, , ,) while the expected grounding is Seq("wm/concept/crisis_or_disaster/conflict/tension", "", "", ""). The top 3:

    Seq(wm/concept/crisis_or_disaster/conflict/tension, wm/property/price_or_cost, , , 0.9948348)
    Seq(wm/concept/crisis_or_disaster/conflict/tension, , , , 0.99)
    Seq(, wm/property/price_or_cost, , , 0.48348033)
Is this behavior (i.e., composed scores > non-composed scores) expected?
There is in SRLCompositionalGrounder.scala a calculation of score in which the properties are multiplied by 0.5f before they are noisyOr'd. Changing that value might switch how these values sort.
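For concreteness, here is a sketch of where such a multiplier sits; slotTotal and the slot names are illustrative assumptions, while the 0.5f factor and noisyOr come from the code under discussion:

```scala
// noisyOr as in SRLCompositionalGrounder.
def noisyOr(values: Seq[Float], scale: Float = 1.0f): Float = {
  val epsilon = 0.01f
  val result = 1.0f - values.fold(1.0f)((product, value) => product * (1.0f - (value - epsilon)))
  result * scale
}

// Illustrative helper (not the actual code): the property score is down-weighted
// before being folded in, so lowering the multiplier lowers the property's
// influence on how composed groundings sort.
def slotTotal(conceptScore: Float, propertyScore: Option[Float],
              propertyMultiplier: Float = 0.5f): Float =
  noisyOr(conceptScore +: propertyScore.map(_ * propertyMultiplier).toSeq)
```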
I think it's worth tuning the score that Keith mentioned. Also, the first compositional example, Seq(wm/concept/economy/unemployment, wm/property/quality, , ,), seems correct to me.
However, the second one is clearly wrong...
Any other ideas on how to fix these? @zupon, you did look at situations such as the second one, right? In this case, I think "tensions" is consumed by the concept, and "increasing" is a magnitude adjective and should not be used for grounding. Thus, there should be no tokens left for grounding the phantom property "price_or_cost"... You did keep track of which words were consumed during grounding, right @zupon?
For both the "high unemployment" and "Increasing tensions" examples, are there predicates? I can sorta see the link from high --> quality, but I'm not seeing where Increasing --> price would make sense.
What should happen if there are predicates is (1) the algorithm looks first at the entire span (e.g. "high unemployment") to see if it matches a node or example exactly (for various meanings of "exactly"); (2) if it doesn't, we then take each predicate or argument individually and ground it. When we match something with the fuzzy exact match, we do (or are supposed to, anyway) remove the tokens that matched from our subsequent grounding attempts. For example, if we had the span "climate change regulations", our exact match might return the climate_change Concept. After the match, we remove those two tokens and are left with only "regulations" to ground.
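The intended flow might be sketched roughly like this; every name here (Grounding, exactMatch, groundToken) is a hypothetical placeholder, not the actual Eidos API, and the point is only the token-consumption bookkeeping:

```scala
// Hypothetical sketch, not the Eidos implementation.
case class Grounding(nodeName: String, score: Float, matchedTokens: Set[Int])

def groundSpan(
  tokens: IndexedSeq[String],
  exactMatch: IndexedSeq[String] => Option[Grounding], // fuzzy "exact" matcher over the whole span
  groundToken: Int => Option[Grounding]                // per-predicate/argument grounder
): Seq[Grounding] = {
  // (1) Try the whole span first, e.g. "climate change regulations" -> climate_change.
  val wholeSpan = exactMatch(tokens)
  // Tokens consumed by the exact match must not be grounded again.
  val consumed = wholeSpan.map(_.matchedTokens).getOrElse(Set.empty[Int])
  // (2) Ground each remaining token individually, e.g. only "regulations" is left.
  val rest = tokens.indices.filterNot(consumed).flatMap(groundToken)
  wholeSpan.toSeq ++ rest
}
```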
What may be happening here is that the (fuzzy) exact matching for the span is accidentally allowing the entire "Increasing tensions" span to check against the Property branch using w2v, rather than only using regex like we do elsewhere.
@MihaiSurdeanu AFAIK we don't actually keep track of the tokens used for each slot in the predicate grounding in a way that we can refer back to them later, but the algorithm does (or should) remove tokens that are used for grounding, like with the exact match case for predicates/arguments.
There is this comment in SRLCompositionalGrounder:
// TODO (at some point) -- the empty sequence here is a placeholder for increase/decrease triggers
// Currently we don't have "access" to those here, but that could be changed
On the other hand, there are words like "higher" in the ontology. One is in the ontology node higher_temperatures, and there are others in the examples. Keeping them in the text to be grounded might help in these cases.
Here's the node for price_or_cost, which has some hints of increase. It wouldn't have to be that way.
- node:
name: price_or_cost
patterns:
- \b([Cc]osts?|[Pp]rices?)\b
examples:
- additional costs
- capital costs
- cost
- costs
- expenses
- hidden costs
- high costs
- higher costs
- indirect costs
- input costs
- operating costs
- opportunity costs
- price
- recurrent costs
- social costs
I've been playing with the properties score multiplier that @kwalcock mentioned, and only very small values seem to have an effect. Setting it to 0.1f or 0.05f, only 1 of the 29 tests passes. With 0.01f, 9 tests pass. Setting it to 0.0f, 20 of the tests pass, and adding a 0.0f multiplier to processes as well, 27 of the 29 tests pass. However, this seems like switching off the composition.
In case you find it useful, I've pushed a branch that only runs the 29 failing tests: composition-failing-tests
@kwalcock , @zupon , @EgoLaparra : thanks for working on this during the weekend! I missed some of the discussion here due to travel today. Questions:
AFAICT, property groundings can use the embeddings, although there is an extra, high SRLCompositionalGrounder.propertyConfidenceThreshold in maybeProperty.
Can someone eventually define predicate in this context, because it is confusing me?
Predicate is a verb or a nominalized verb. I proposed a meeting today in a separate email to wrap this up. Can @kwalcock and @EgoLaparra please respond to that email? Thanks!
In "The price of oil decreased water transportation.", price and transportation are called validPredicates in the code. I guess price and transport can be verbs. They wouldn't be the predicate of the sentence but the predicate of something else, the mention potentially.
Should it be that

    noisyOr(1.8379604816436768f, 0.874017596244812f * 0.5f) > noisyOr(1.8379604816436768f, 1.0f * 0.5f)

Is this noisy exclusive or? The values on the right are the same as or larger than the values on the left. These are from "price of oil", and it is comparing noisyOr(oil, of) to noisyOr(oil, price); "of oil" is winning. Can it be that the large value 1.8379604816436768f messes up this function?
def noisyOr(values: Seq[Float], scale: Float = 1.0f): Float = {
val epsilon = 0.01f
val result = 1.0f - values.fold(1.0f)((product, value) => product * (1.0f - (value - epsilon)))
result * scale
}
It seems that it makes the result of the fold negative, so the output of the noisyOr is the opposite of what it should be, right?
In "price of oil", both "price" and "of" qualify as potential properties based on grounding scores. "Of" probably shouldn't be tested at all. Right now there is only a check for canonical words, and "of" qualifies. What is the best way to rule it out? Stopword list, part of speech, lack of edges? isArg is defined as having incoming edges and no outgoing edges. isPred was defined as anything else, perhaps just for simplicity of implementation, but can those three other cases be categorized more closely and is one combination characteristic of stopwords?
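One cheap way to rule out words like "of" could be to combine the edge test with a stopword check. This is only an illustrative sketch; the stopword list and the edge flags are assumptions, not the current code, and a part-of-speech check could serve the same purpose:

```scala
sealed trait Role
case object Arg extends Role     // incoming edges but no outgoing edges
case object Pred extends Role    // everything else, as currently defined
case object Skipped extends Role // ruled out before grounding is attempted

// Illustrative stopword list; in practice this might come from a resource file.
val stopwords = Set("of", "the", "a", "an", "to", "in", "and")

def classify(word: String, hasIncoming: Boolean, hasOutgoing: Boolean): Role =
  if (stopwords.contains(word.toLowerCase)) Skipped
  else if (hasIncoming && !hasOutgoing) Arg
  else Pred
```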
The results of noisyOr are

    1.4744141101837158 > 1.42225980758667

even though all inputs on the right are equal to or greater than the inputs on the left. This gives us an incorrect answer. However,

    noisyOr(0.9, 0.874017596244812f * 0.5f) = 0.93697095 < 0.9439 = noisyOr(0.9, 1.0f * 0.5f)

which seems like the right way around. By letting some intermediate values creep above 1, it looks like we're getting unexpected behavior. Is there a more robust implementation?
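The flip is easy to reproduce, and clamping the inputs into [0, 1] before the fold restores monotonicity. The clamp is just one possible guard, an assumption rather than the project's chosen fix:

```scala
// noisyOr as in SRLCompositionalGrounder.
def noisyOr(values: Seq[Float], scale: Float = 1.0f): Float = {
  val epsilon = 0.01f
  val result = 1.0f - values.fold(1.0f)((product, value) => product * (1.0f - (value - epsilon)))
  result * scale
}

// With an input above 1, the running product turns negative and the ordering flips:
val bad  = noisyOr(Seq(1.8379604816436768f, 0.874017596244812f * 0.5f)) // ~1.4744141
val good = noisyOr(Seq(1.8379604816436768f, 1.0f * 0.5f))               // ~1.4222598 (smaller, despite larger inputs)

// Possible guard: clamp each input into [0, 1] so the product stays non-negative.
def noisyOrClamped(values: Seq[Float]): Float =
  noisyOr(values.map(v => math.max(0.0f, math.min(1.0f, v))))
```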
I was not able to perturb the grounding of Increasing to wm/property/price_or_cost by removing just one example from the ontology. I had to remove 6 examples to force it to ground to a different node: wm/property/coping_capacities.
As expected, removing so many examples has a big impact and only 604 tests pass (747 passed before the change).
Some details:
The most similar nodes to Increasing before the change were:

    wm/property/price_or_cost 0.6741894
    wm/property/coping_capacities 0.6254506

The similarities of the wm/property/price_or_cost examples with Increasing are:
1 - higher costs 0.73212624
2 - high costs 0.6722427
3 - social costs 0.648605
4 - operating costs 0.605973
5 - capital costs 0.5994495
6 - opportunity costs 0.5955624
7 - recurrent costs 0.5875349
8 - indirect costs 0.58243614
9 - additional costs 0.57253253
10- costs 0.57253253
11- input costs 0.56632775
12- cost 0.507115
13- hidden costs 0.5036369
14- expenses 0.4259854
15- price 0.3589471
After removing the 6 most similar examples (i.e. higher costs, high costs, social costs, operating costs, capital costs and opportunity costs):

    wm/property/coping_capacities 0.6254506
    wm/property/price_or_cost 0.61850816
I was not able to get the same outcome with any combination of fewer than 6 examples removed.
@kwalcock as far as I know, noisyOr works with probabilities, so all the input scores should be normalized between 0 and 1.
I'd like to do that, but there is code like
val embeddingGroundings = // This is a Seq rather than a Map.
for ((namer, embeddingScore) <- matchedEmbeddings)
yield {
val exampleScore = matchedExamples(namer)
// val comboScore = embeddingScore
// val comboScore = embeddingScore + (1 / (exampleScore + 1)) // Becky's simple version
val comboScore = embeddingScore + 1 / (log(exampleScore + 1) + 1)
// val comboScore = pow(embeddingScore.toDouble, exampleScore.toDouble)
OntologyNodeGrounding(namer, comboScore.toFloat)
}
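If the combined score has to behave like a probability for noisyOr, one option that preserves the existing ranking would be a monotone squash into [0, 1). This is only an illustrative assumption, not a vetted replacement for the formula above:

```scala
import scala.math.{exp, log}

// Assuming exampleScore >= 0, the term 1 / (log(exampleScore + 1) + 1) is in (0, 1],
// and embeddingScore <= 1, so comboScore can reach values near 2. A monotone
// squash keeps the ranking intact while returning something usable as a probability.
def squash(score: Double): Float = (1.0 - exp(-score)).toFloat

def comboScore(embeddingScore: Float, exampleScore: Float): Float =
  squash(embeddingScore + 1 / (log(exampleScore + 1) + 1))
```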
I pushed recent changes. Since composition was added, usually 11 and sometimes 15 tests that were previously passing have been failing; now it's 13. One hopes to make up for it by passing some of the tests that would fail in master if they weren't disabled. Enabling them, I get 744 passing and 160 failing. That's down from the 771 previously achieved. Can it be said that we overfit to data that was not really compositional, and that this is somehow better?
Thanks @kwalcock ! Before we declare success, would it be possible to: a) Send the tests that are failing with the predicted groundings and the expected ground truth? b) Point me to the code you changed?
Thanks!
There is a PR (#1116) now (which won't pass unless I disable some tests) that is ready for review. The most important changes are here, and it's probably not implemented exactly as you instructed. It may be worth going over it line by line.
Here are a couple with more on the way:
val text = "The impact of the drought has been exacerbated by high local cereal prices , excess livestock mortality , conflict and restricted humanitarian access in some areas ."
val effectGroundings = Seq("wm/concept/crisis_or_disaster/environmental/drought", "", "", "")
canonicalName: drought
Expected :"[]"
Actual :"[wm/property/preparedness]"
// I will check this one. Does it go on searching after finding an exact match?
val text = "As of the first quarter of 2011 the Djiboutian government remains genuinely worried that a potential Afar insurgency in the north could quickly spread to the south , especially in view of the fact that the Djiboutian National Army is weak and the population in Djibouti City is facing deteriorating economic conditions due to high unemployment and inflation , which surged to 3,8 per cent in 2010 ."
val causeGroundings = Seq("wm/concept/economy/unemployment", "", "", "")
canonicalName: high unemployment
Expected :"[]"
Actual :"[wm/property/quality]"
// May have gotten unemployment and then gone back for high
val text = "The brewing conflict had already had a serious impact in disrupting farming which had led to higher prices ."
val effectGroundings = Seq("", "wm/property/price_or_cost", "", "")
canonicalName: higher prices
Expected :"[wm/property/price_or_cost]"
Actual :"[]"
val text = "The brewing conflict had already had a serious impact in disrupting farming which had led to higher prices ."
val effectGroundings = Seq("", "wm/property/price_or_cost", "", "")
canonicalName: higher prices
Expected :"[]"
Actual :"[wm/process/consumption]"
// This seems to fill in the process rather than the property. Perhaps price_or_cost doesn't make the cutoff.
val text = "The outlook for 2020 continues to be bleak as foreign exchange reserves shrink , budget deficits increase and unemployment rates rise steeply due to the economic impacts of the pandemic ."
val effectGroundings = Seq("wm/concept/economy/unemployment", "", "", "") //todo: add 'rate' property?
canonicalName: unemployment
Expected :"[]"
Actual :"[wm/property/price_or_cost]"
val text = "The impact of research led productivity growth on poverty in Africa , Asia and Latin America ."
val causeGroundings = Seq("", "", "wm/process/research", "")
canonicalName: research
Expected :"[]"
Actual :"[wm/property/preparedness]"
val effectGroundings = Seq("wm/concept/poverty", "", "", "") //fixme: bad effect span
canonicalName: productivity growth poverty Asia
Expected :"[]"
Actual :"[wm/process/attempt]"
@kwalcock: can you please explain the format of this output?
It doesn't help that sentenceHelper.validPredicates is returning duplicates, but it's easy to patch.
In
val text = "The impact of the drought has been exacerbated by high local cereal prices , excess livestock mortality , conflict and restricted humanitarian access in some areas ."
val effectGroundings = Seq("wm/concept/crisis_or_disaster/environmental/drought", "", "", "")
canonicalName: drought
Expected :"[]"
Actual :"[wm/property/preparedness]"
When grounding the mention for "drought", an effect rather than cause, it must have gotten the first "wm/concept/crisis_or_disaster/environmental/drought" correct, but it went on to compose with "[wm/property/preparedness]" where the expected grounding had nothing more, "[]".
In this one the actual text being grounded is "impact of the drought" and impact probably results in wm/property/preparedness. I'm still checking.
In a meeting now. Will look at these soon!
I wonder whether at
https://github.com/clulab/eidos/blob/8758be8a5860fc3404f1d6f66b5095a161a6d87c/src/main/scala/org/clulab/wm/eidos/groundings/grounders/SRLCompositionalGrounder.scala#L285
the exact and inexact groundings should not be combined in a sequence but rather into a combined predicate grounding.
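For illustration only, the suggested merge might look roughly like this; every type and name below is a placeholder for sketching purposes, not the actual Eidos data model:

```scala
// Placeholder types; Eidos has its own predicate-grounding classes.
case class SlotGrounding(slot: String, node: String, score: Float)
case class ComposedGrounding(slots: Seq[SlotGrounding], total: Float)

// noisyOr as in SRLCompositionalGrounder.
def noisyOr(values: Seq[Float]): Float = {
  val epsilon = 0.01f
  1.0f - values.fold(1.0f)((product, value) => product * (1.0f - (value - epsilon)))
}

// Instead of leaving exact and inexact groundings as separate Seq entries that
// sort independently, keep the best grounding per slot and score the composition.
def compose(exact: Seq[SlotGrounding], inexact: Seq[SlotGrounding]): ComposedGrounding = {
  val bestPerSlot = (exact ++ inexact)
    .groupBy(_.slot)
    .values
    .map(_.maxBy(_.score)) // exact entries come first in the concatenation, so they win ties
    .toSeq
  ComposedGrounding(bestPerSlot, noisyOr(bestPerSlot.map(_.score)))
}
```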