Open kwalcock opened 2 years ago
The reason it doesn't stick with just drought is the noisyOr. Drought as an exact match has a local score of 1.0, but a noisyOr with epsilon = 0.01f of 0.99. When composed with other things, the 0.99 can be beaten. Impact is matching too well with preparedness. If is was some other word, it could be a better answer than simple drought.
THEME: wm/concept/crisis_or_disaster/environmental/drought (1.0) Theme properties: wm/property/preparedness (0.9787039) Total: 0.99479353
THEME: wm/concept/crisis_or_disaster/environmental/drought (1.0) Theme properties: wm/property/geographic_resolution/local (0.8740176) Total: 0.9942701
THEME: wm/concept/crisis_or_disaster/environmental/drought (1.0) Theme properties: wm/property/stability (0.9661405) Total: 0.9947307
val text = "The impact of the drought has been exacerbated by high local cereal prices , excess livestock mortality , conflict and restricted humanitarian access in some areas ." val effectGroundings = Seq("wm/concept/crisis_or_disaster/environmental/drought", "", "", "") canonicalName: drought Expected :"[]" Actual :"[wm/property/preparedness]" // I will check this one. Does it go on searching after finding an exact match?
I think the grounding above is not too bad, assuming the mention is "impact of the drought", especially considering your comments about NoisyOr.
val text = "As of the first quarter of 2011 the Djiboutian government remains genuinely worried that a potential Afar insurgency in the north could quickly spread to the south , especially in view of the fact that the Djiboutian National Army is weak and the population in Djibouti City is facing deteriorating economic conditions due to high unemployment and inflation , which surged to 3,8 per cent in 2010 ." val causeGroundings = Seq("wm/concept/economy/unemployment", "", "", "") canonicalName: high unemployment Expected :"[]" Actual :"[wm/property/quality]" // May have gotten unemployment and then gone back for high
The above I think is actually better than the ground truth!
val text = "The brewing conflict had already had a serious impact in disrupting farming which had led to higher prices ." val effectGroundings = Seq("", "wm/property/price_or_cost", "", "") canonicalName: higher prices Expected :"[wm/property/price_or_cost]" Actual :"[]"
I don't understand the output above. what is the actual output for the mention "higher prices"?
val text = "The brewing conflict had already had a serious impact in disrupting farming which had led to higher prices ." val effectGroundings = Seq("", "wm/property/price_or_cost", "", "") canonicalName: higher prices Expected :"[]" Actual :"[wm/process/consumption]" // This seems to fill in the process rather than the property. Perhaps price_or_cost doesn't make the cutoff.
@kwalcock: please see my question above about some of the outputs.
Also, looking at the new code for findInexactPredicateGroundings
, I don't see any grounding outputs that combine the 3 tuple elements, i.e., a property, a concept, and a process. This would be needed for something like "the cost of the transportation of oil". It looks to me like all the PredicateTuple
s generated have at most two elements populated. Did I miss something?
It might take ~1 hour to investigate each of these cases. You are correct about there only being two slots filled in. Mostly I forgot, but if there is a potential property and one position to the right an argument, two to the right a predicate, one to the left a predicate, two to the left an argument and of these two are potential properties themselves, what should the property be connected to and how many of the others should be included? It will help if isPred = !isArg is changed to !isArg && !stopword, because otherwise there are almost always both arguments and predicates.
I agree about !isArg && !stopword
.
The attachment of properties is tricky wo/ syntactic info, but I propose a similar heuristic as what you currently use:
In the unit tests there are two cases which call for three slots and one case where the property is attached to a vacant concept:
val text = "* Late onset of rains and long midseason dry spells led to localized household food production shortfalls ."
val effectGroundings = Seq("wm/concept/goods/food", "", "wm/process/production", "wm/property/unavailability")
val text = "Despite the large impact of the FFD program on growth in food consumption , results show that receipt of free food distribution causes a significant increase in perceived famine risk ."
val effectGroundings = Seq("wm/concept/crisis_or_disaster/famine", "", "wm/process/perceive", "wm/property/risk") //todo: add 'perceive' as a process? 'perception' exists now as a concept.
val text = "Attempts at stabilizing prices are rarely completely successful because they need to be combined with safety nets and other social protection measures to mitigate the impact of higher food prices and to help prevent violent conflicts ."
val effectGroundings = Seq("", "wm/property/price_or_cost", "wm/process/stabilization", "")
These can't be produced (yet).
The code for exact matches can presently only ever fill in two slots as well :-(
I can change the inexact matches to use the closest concept or process rather than the first one of the mention. Either can be a potential property, so it is unclear what do to with things that are both. I just picked the easiest option.
The stopwords are coming from https://github.com/clulab/eidos/blob/master/src/main/resources/org/clulab/wm/eidos/english/filtering/stops.txt and https://github.com/clulab/eidos/blob/master/src/main/resources/org/clulab/wm/eidos/english/filtering/transparent.txt and aren't good words to rule out. I'll try POS:
Tag != IN*
!= DT*
!= PRP*
But also add . and , and probably lots of other things. It's expensive to unnecessarily ground against things.
Now only 11 instead of 13 originally passing tests fail, which is good. Overall, though, 186 fail instead of the low of 141. 718 pass, down from 774. I will spot check for stupid mistakes, but otherwise it will probably have to be left at close to this.
It is implemented at:
This seems in retrospect to be ambiguous:
4. Check if properties in 2 attach to them (for each property look first to the right, then to left if none found; pick the first process/concept)
By look to the right I understood to look anywhere to the right all the way until the end of the mention, and to the left to look until the beginning of the mention. I've got a mention of 17 words here and there are some very long distance relationships. I wonder whether it should be one to the right, one to the left, two to the right, two to the left, etc. to find the closest one but with the right still at an advantage. There's probably not time to experiment with that.
To the end doesn't really make sense, because that would always include the first one (in the sentence) which would always be found and not need to be a default. On the other hand, a different one in the sentence might be closer so that the first one is wrong.
Taking the closest one, not going all the way to the end, performs worse, with 191 failed tests instead of 186. If it eventually looks like the properties are being over-generated, the search could be limited to the immediate vicinity and then be abandoned.
The code looks great @kwalcock ! I think we're very close to being done. Before that:
Thanks!
I'm working on it.
water transportation, which has an exact match on water:
{
"@type" : "Extraction",
"@id" : "_:Extraction_3",
"type" : "concept",
"subtype" : "entity",
"labels" : [ "Concept", "Entity" ],
"text" : "water transportation",
"rule" : "simple-np++Decrease_ported_syntax_1_verb",
"canonicalName" : "water transportation",
"groundings" : [ {
"@type" : "Groundings",
"name" : "wm_compositional",
"version" : "7df8e4a4237596bb40ceea9e72bb46821fc01827",
"versionDate" : "2021-10-19T19:46:27Z",
"values" : [ {
"@type" : "PredicateGrounding",
"theme" : [ {
"@type" : "Grounding",
"ontologyConcept" : "wm/concept/goods/water",
"value" : 1.0
} ],
"themeProcess" : [ {
"@type" : "Grounding",
"ontologyConcept" : "wm/process/transportation/",
"value" : 1.2696834802627563
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/process/transportation/commercial_transportation",
"value" : 1.1144535541534424
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/process/transportation/personal_transportation",
"value" : 1.1033762693405151
} ],
"value" : 1.0025968551635742,
"display" : "THEME: wm/concept/goods/water (1.0); THEME PROCESS: wm/process/transportation/ (1.2696835)"
}, {
"@type" : "PredicateGrounding",
"theme" : [ {
"@type" : "Grounding",
"ontologyConcept" : "wm/concept/goods/water",
"value" : 1.0
} ],
"value" : 0.9900000095367432,
"display" : "THEME: wm/concept/goods/water (1.0)"
} ]
}
transportation of oil, only inexact matches:
{
"@type" : "Extraction",
"@id" : "_:Extraction_3",
"type" : "concept",
"subtype" : "entity",
"labels" : [ "Concept", "Entity" ],
"text" : "transportation of oil",
"rule" : "simple-np++Decrease_ported_syntax_1_verb",
"canonicalName" : "transportation oil",
"groundings" : [ {
"@type" : "Groundings",
"name" : "wm_compositional",
"version" : "7df8e4a4237596bb40ceea9e72bb46821fc01827",
"versionDate" : "2021-10-19T19:46:27Z",
"values" : [ {
"@type" : "PredicateGrounding",
"theme" : [ {
"@type" : "Grounding",
"ontologyConcept" : "wm/concept/goods/fuel",
"value" : 1.8379604816436768
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/concept/environment/natural_resources/fossil_fuels",
"value" : 1.727161169052124
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/concept/goods/protein_foods",
"value" : 0.9827450513839722
} ],
"themeProcess" : [ {
"@type" : "Grounding",
"ontologyConcept" : "wm/process/transportation/",
"value" : 1.2696834802627563
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/process/transportation/commercial_transportation",
"value" : 1.1144535541534424
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/process/transportation/personal_transportation",
"value" : 1.1033762693405151
} ],
"value" : 0.7849923372268677,
"display" : "THEME: wm/concept/goods/fuel (1.8379605); THEME PROCESS: wm/process/transportation/ (1.2696835)"
} ]
}
cost of transportation of oil:
{
"@type" : "Extraction",
"@id" : "_:Extraction_3",
"type" : "concept",
"subtype" : "entity",
"labels" : [ "Concept", "Entity" ],
"text" : "cost of transportation of oil",
"rule" : "gazetteer++simple-np++Decrease_ported_syntax_1_verb",
"canonicalName" : "transportation oil",
"groundings" : [ {
"@type" : "Groundings",
"name" : "wm_compositional",
"version" : "7df8e4a4237596bb40ceea9e72bb46821fc01827",
"versionDate" : "2021-10-19T19:46:27Z",
"values" : [ {
"@type" : "PredicateGrounding",
"theme" : [ {
"@type" : "Grounding",
"ontologyConcept" : "wm/concept/goods/fuel",
"value" : 1.8379604816436768
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/concept/environment/natural_resources/fossil_fuels",
"value" : 1.727161169052124
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/concept/goods/protein_foods",
"value" : 0.9827450513839722
} ],
"themeProcess" : [ {
"@type" : "Grounding",
"ontologyConcept" : "wm/process/transportation/",
"value" : 1.2696834802627563
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/process/transportation/commercial_transportation",
"value" : 1.1144535541534424
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/process/transportation/personal_transportation",
"value" : 1.1033762693405151
} ],
"themeProcessProperties" : [ {
"@type" : "Grounding",
"ontologyConcept" : "wm/property/price_or_cost",
"value" : 1.0
} ],
"value" : 0.890346109867096,
"display" : "THEME: wm/concept/goods/fuel (1.8379605); THEME PROCESS: wm/process/transportation/ (1.2696835), Process properties: wm/property/price_or_cost (1.0)"
} ]
}
If I can read the JSON correctly, it looks to me like all these 3 examples are correct! Nice job @kwalcock !
canonicalName: drought
words: of the drought
[info] - should process "The impact of the drought has been exacerbated by high local cereal prices , excess livestock mortality , conflict and restricted humanitarian access in some areas ." effect themeProperty correctly *** FAILED ***
[info] "[wm/property/preparedness]" was not equal to "[]" (TestGrounding.scala:45)
[info] Analysis:
[info] "[wm/property/preparedness]" -> "[]"
It wanted
val effectGroundings = Seq("wm/concept/crisis_or_disaster/environmental/drought", "", "", "")
but apparently got
val effectGroundings = Seq("wm/concept/crisis_or_disaster/environmental/drought", "wm/property/preparedness", "", "")
Most of the errors are from a slot being filled in that wasn't before, particularly for properties and processes. Perhaps the tests paid less attention to those, or maybe the isPred = !isArg results in too many potential processes and maybe there are too many long-distance properties or the property match threshold is too low.
This is already a week overdue. I'm not sure there's time for analysis.
Examples:
[info] "[wm/property/stability]" was not equal to "[]" (TestGrounding.scala:47)
[info] "[wm/property/condition]" was not equal to "[]" (TestGrounding.scala:45)
[info] "[wm/property/preparedness]" was not equal to "[]" (TestGrounding.scala:45)
[info] "[wm/process/population/death]" was not equal to "[]" (TestGrounding.scala:46)
[info] "[wm/process/combining]" was not equal to "[]" (TestGrounding.scala:46)
[info] "[wm/property/violent]" was not equal to "[]" (TestGrounding.scala:47)
[info] "[wm/property/quality]" was not equal to "[]" (TestGrounding.scala:45)
[info] "[wm/property/aftermath]" was not equal to "[]" (TestGrounding.scala:47)
[info] "[wm/property/stability]" was not equal to "[]" (TestGrounding.scala:45)
[info] "[wm/property/stability]" was not equal to "[]" (TestGrounding.scala:45)
[info] "[wm/property/preparedness]" was not equal to "[]" (TestGrounding.scala:45)
[info] "[wm/property/price_or_cost]" was not equal to "[]" (TestGrounding.scala:45)
[info] "[wm/property/preparedness]" was not equal to "[]" (TestGrounding.scala:47)
etc.
Fewer are changes:
[info] "wm/process/[farming/herding]" was not equal to "wm/process/[population/migrate/]" (TestGrounding.scala:46)
[info] "wm/concept/[entity/muslim_communities]" was not equal to "wm/concept/[agriculture/]" (TestGrounding.scala:44)
[info] "wm/concept/[entity/hamas]" was not equal to "wm/concept/[plan/]" (TestGrounding.scala:44)
[info] "wm/concept/[entity/muslim_communities]" was not equal to "wm/concept/[government/]" (TestGrounding.scala:44)
[info] "...ocess/communication/[campaign]" was not equal to "...ocess/communication/[informing]" (TestGrounding.scala:46)
etc.
Agreed that we can't spend too much time on this at this point.
But the above example for "of the drought" worries me. Which tokens were used for the preparedness
grounding? It seems "drought" was used for drought
. The other two "of" and "the" are stop words. So, then what was used?
It is grounding to "impact of the drought". "Impact" doesn't get printed because it is special, probably transparent.
Then I think it is fine!
Should we merge this PR?
I would need to disable failing tests before it can go into master. Right now it is more important to go into kwalcock/wmexchanger which has all the updates for the new DART protocols. That hadn't gone to master yet because it was only just put into use. It is headed there for the code freeze.
Thanks! Let me know when it's done so we can email MITRE.
The documents were regrounded and uploaded. I slacked Ben and Robyn.
Beautiful!! Thanks!
@MihaiSurdeanu
Regarding this:
Do NOT use them if they were used as triggers for increase/decrease attributes, AND they are tagged as JJ* or VBN.
There is code intended to do very much the same at these places which use a set of words which to exclude:
The conditions are slightly different: the POS is not checked. It is also not used consequently throughout the code. It only rules out predicates right now. I wonder whether I should update the conditions or reuse them.
While I'm at it, the predicates are calculated in SentenceHelper and that definition might be reused. It is slightly different: https://github.com/clulab/eidos/blob/d9fa808f725f23af0b2037b39655ba1dc519a25d/src/main/scala/org/clulab/wm/eidos/groundings/grounders/srl/SentenceHelper.scala#L18-L34
Yep, I think the attachment strings should be excluded from grounding. Do these include increase/decrease words? What type are they, TriggeredAttachment?
TriggeredAttachment is a superclass of things like Decrease, Increase, Quantification, Property, Pos(itive)Change, NegChange, Heding, Negation. This is code that Becky wrote, so probably well thought-out. It could be narrowed down to some of the subclasses.
I'm not liking that the present code rules out words rather than words at positions. If a word is repeated in a sentence, removing based only text will take it out everywhere even if it is used differently.
I agree with everything you said. But having the same word repeated in the same phrase is pretty unlikely…
On Thu, Mar 10, 2022 at 20:21 Keith Alcock @.***> wrote:
TriggeredAttachment is a superclass of things like Decrease, Increase, Quantification, Property, Pos(itive)Change, NegChange, Heding, Negation. This is code that Becky wrote, so probably well thought-out. It could be narrowed down to some of the subclasses.
I'm not liking that the present code rules out words rather than words at positions. If a word is repeated in a sentence, removing based only text will take it out everywhere even if it is used differently.
— Reply to this email directly, view it on GitHub https://github.com/clulab/eidos/issues/1115#issuecomment-1064733688, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAI75TQVC3VTJ7X4W4ZINT3U7K32LANCNFSM5PK663SQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
You are receiving this because you were mentioned.Message ID: @.***>
Exact matches are only run on the ontology node names. The ontology node examples have the approximate matches. We've been recently using lemmas to compare against the unlemmatized (raw) node names. The means that "higher temperatures" would be changed to "higher temperature" before comparison and not match the node wm/concept/environment/higher_temperatures. I don't think we can ask the ontology creators to use lemmas, but it is easy for us to test both words and lemmas and take the best. I have made that change.
Less easy is dealing with "higher". It is one of those words ruled out because it is an attribute or something and therefore would not be used in an exact match. I think that creators of the ontology might put specific texts in the node names that they want to match no matter what. I'm waiting to invalidate the attributes and parts of speech until after the exact match calculation has completed.
Another example seems to be armed conflict.
African governments works better in this version. It is a node. Also "cash crops". There are some times in which the match is a distraction, however. So, I've made it so that at least one exactly matching word needs to be valid. In the sentence
"exceptional prolonged and persistent agro-pastoral drought sequence Highlights 1"
it changed the match from prolonged to drought. wm/concept/time/prolonged is strange. It seems more like a property.
Thanks @kwalcock ! I agree with the changes you made. I like the idea of having >= 1 valid words in the exact match!
I wonder whether at
https://github.com/clulab/eidos/blob/8758be8a5860fc3404f1d6f66b5095a161a6d87c/src/main/scala/org/clulab/wm/eidos/groundings/grounders/SRLCompositionalGrounder.scala#L285
the exact and inexact groundings should not be combined in a sequence but rather into a combined predicate grounding.