Comp grounding unit tests

zupon commented 3 years ago

This PR adds unit tests for compositional grounding. The tests check

That the grounding for each slot (theme, theme property, process, process property) is correct, based on the gold groundings we have right now
That the grounding for each slot is from the right branch
- theme -> wm/concept/
- theme property & process property -> wm/property/
- process -> wm/process/)
That the groundings we know are incorrect are NOT grounded to (using the current groundings from the evaluation sheet)

@ZhengTang1120 If you want, you can add your tests directly to this branch, or you can merge your branch into mine, or we can make a separate PR for yours. Whatever is easiest.

zupon commented 2 years ago

@kwalcock I tidied up the branch and now I think it could be ready to merge, pending your code review (and the various checks). Feel free to review at your leisure, make any changes you see fit, and also tag me for any changes I should make. Thanks!

Here are the passing/failing tests before (in master) and after (these changes):

Before:

683/904 passing (75.5%)
221/904 failing (24.5%)

After:

764/904 passing (84.5%)
140/904 failing (15.5%)

MihaiSurdeanu commented 2 years ago

Nice. Good job!

On Wed, Nov 10, 2021 at 2:12 PM zupon @.***> wrote:

@kwalcock https://github.com/kwalcock I tidied up the branch and now I think it could be ready to merge, pending your code review (and the various checks). Feel free to review at your leisure, make any changes you see fit, and also tag me for any changes I should make. Thanks!

Here are the passing/failing tests before (in master) and after (these changes):

Before:

683/904 passing (75.5%)

221/904 failing (24.5%)

After:

764/904 passing (84.5%)

140/904 failing (15.5%)

— You are receiving this because your review was requested. Reply to this email directly, view it on GitHub https://github.com/clulab/eidos/pull/1070#issuecomment-965609457, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAI75TTGVVGPP7QJBUF55NDULKYYNANCNFSM5DEPBGJQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

kwalcock commented 2 years ago

The failing tests are these below and I'm checking into them. If they look familiar, please send a hint.

[error] Failed tests:
[error]     org.clulab.wm.eidos.text.englishGrounding.TestSRLGrounder
[error]     org.clulab.wm.eidos.serialization.jsonld.TestJLDDeserializer
[error]     org.clulab.wm.eidos.document.TestSentenceClassifier

[info] TestSentenceClassifier:
[info] SentenceClassifier on the small evaluation dataset
[info] - should have an accuracy above 0.69 and an f1 above 0.77 *** FAILED ***
[info]   false was not true (TestSentenceClassifier.scala:112)

[info] JLDDeserializer
[info] - should deserialize compositional groundings from jsonld *** FAILED ***
[info]   0 was not equal to 2 (TestJLDDeserializer.scala:701)

[info] TestSRLGrounder:
[info] SRLCompositionalGrounder
[info] - should not overflow the stack when there is a loop in the semantic roles *** FAILED ***
[info]   java.util.NoSuchElementException: head of empty list
[info]   at scala.collection.immutable.Nil$.head(List.scala:431)
[info]   at scala.collection.immutable.Nil$.head(List.scala:428)
[info]   at org.clulab.wm.eidos.groundings.grounders.SRLCompositionalGrounder.groundSentenceSpan(SRLCompositionalGrounder.scala:201)
[info]   at org.clulab.wm.eidos.groundings.grounders.SRLCompositionalGrounder.groundSentenceSpan(SRLCompositionalGrounder.scala:179)
[info]   at org.clulab.wm.eidos.groundings.grounders.SRLCompositionalGrounder.groundEidosMention(SRLCompositionalGrounder.scala:165)
[info]   at org.clulab.wm.eidos.groundings.OntologyHandler$$anonfun$3.apply(OntologyHandler.scala:41)
[info]   at org.clulab.wm.eidos.groundings.OntologyHandler$$anonfun$3.apply(OntologyHandler.scala:39)
[info]   at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
[info]   at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
[info]   at scala.collection.immutable.List.foreach(List.scala:392)
[info]   ...

zupon commented 2 years ago

I don't have any hints for those failing tests. The one I might be best suited to investigate is TestSRLGrounder to see what's going on with the semantic roles. I'll let you know if I find anything (no promises)!

zupon commented 2 years ago

I think I found a solution to the TestSRLGrounder issue. There was a test case where it had neither a Property grounding nor any Concept/Process groundings. The algorithm did not have a case for that, so it broke. I added some functionality to return an empty grounding in that situation. The TestSRLGrounder now passes, but I want to try a general sbt test before pushing it in case it breaks anything new.

kwalcock commented 2 years ago

Sounds like something I would never have found. Thanks!

zupon commented 2 years ago

It seems to run out of Java heap space before the tests finish running. I will just push what I have now, though, and hope for the best.

kwalcock commented 2 years ago

@zupon, that was the end of my first pass through the code. It looks like all the comments appear above. I'll pause and let you catch up.

kwalcock commented 2 years ago

BTW I've started to address most things, so no need for you to code a lot. There were plenty of uncertainties, though, which could use some feedback.

kwalcock commented 2 years ago

One problem with the JLDDeserializer was a hard-coded grounding that is no longer valid. That one is fixed.

kwalcock commented 2 years ago

For TestSentenceClassifier, the classification of one sentence has changed from a 0 to 1 so that the prediction no longer matches the label. It is the sentence

this loud, boisterous event, is put on by snow plow drivers, in order to have some fun and test their skills, by racing their plows through a rigorous obstacle course, trying to make a good time without murdering too many strategically placed mailboxes    0

I don't know if it is at all important. I can lower the expected accuracy from 0.69 to 0.68 if it isn't.

zupon commented 2 years ago

I pushed a new update that just removes that unimplemented function. After the checks finish (and hopefully they still pass) will this be ready to merge?

kwalcock commented 2 years ago

@zupon, I have quite a few in number, although minor in effect, changes lined up. I don't think you've seen some of the code review boxes above in green. There are some issues that need clarification so that I know which way to tweak. I should also check performance before it goes to master.

zupon commented 2 years ago

Ah yes. I did/do not see the code review boxes above. This has happened before, but I don't remember how it got solved. I'll snoop around and try some things and let you know if I still can't figure it out.

kwalcock commented 2 years ago

If I look at the page from a different browser without logging in, I don't see the review comments, either. I'll poke around as well.

kwalcock commented 2 years ago

Do I understand correctly that we're allowing things like this with nothing in the concept slot? It's just a property.

PredicateTuple(emptyOntologyGrounding, themePropertyOpt.get, emptyOntologyGrounding, emptyOntologyGrounding, Interval(index).toSet)

zupon commented 2 years ago

My understanding is yes. If we have something like "Drought increased prices.", we allow a grounding that just includes the themeProperty price_or_cost and nothing in the concept slot.

I believe this is different from how things used to be, where we needed to have a theme no matter what.

kwalcock commented 2 years ago

@zupon, are those other questions understandable? Don't need code, just clarification.

zupon commented 2 years ago

They are understandable! I added a quick comment to the other code review boxes that I hadn't addressed yet. The only question I don't really know the answer to is your question about what to do with the one sentence in TestSentenceClassifier.

kwalcock commented 2 years ago

The last change resulted in

[info] - should NOT process cause theme incorrectly *** FAILED ***
[info]   "wm/concept/crisis_or_disaster/conflict/discontent" was equal to "wm/concept/crisis_or_disaster/conflict/discontent" (TestGrounding.scala:50)

[info] - should process "For Belg rain dependent areas , the food security situation for farmers and agro-pastoralists will likely deteriorate as household food stocks start depleting , while the rain will improve the pasture and water conditions ." effect theme correctly *** FAILED ***
[info]   "wm/concept/[goods/water]" was not equal to "wm/concept/[environment/natural_resources/pasture]" (TestGrounding.scala:39)
[info]   Analysis:
[info]   "wm/concept/[goods/water]" -> "wm/concept/[environment/natural_resources/pasture]"

zupon commented 2 years ago

Thanks for catching that! I've updated the new failing tests to failingTest so the checks should (hopefully) pass now. New stats:

771/904 pass (85.3%)
133/904 fail (14.7%)

kwalcock commented 2 years ago

"Increasing tensions and violence have raised fears that a civil war or regional fragmentation could be looming" used to ground to wm/concept/crisis_or_disaster/conflict/tension. It no longer does because "tension" does not match "tensions" if only whole words match. This is unfortunate. The lemma might match.

"Increasing tensions and violence have raised fears that a civil war or regional fragmentation could be looming" no longer comes up empty. It matches wm/property/price_or_cost. This might be a side effect of the above mismatch. After finding no concept, it tries harder.

"Attempts at stabilizing prices are rarely completely successful because they need to be combined with safety nets and other social protection measures to mitigate the impact of higher food prices and to help prevent violent conflicts." no longer matches "wm/concept/safety_net" because "safety net" doesn't match on word boundaries. The lemma might match.

"Attempts at stabilizing prices are rarely completely successful because they need to be combined with safety nets and other social protection measures to mitigate the impact of higher food prices and to help prevent violent conflicts." now finds wm/property/right_to_use. That may be a side effect as well.

"The underlying push was that creating a window for the aggrieved to release some fume of anger would invariably reduce the concentration of conflict ." now gets the undesired wm/concept/crisis_or_disaster/conflict/discontent. I'm checking.

kwalcock commented 2 years ago

Changing to lemma does fix the first 4 of 5 problems.

zupon commented 2 years ago

Nice! Let me know if there are other areas I can provide feedback for (or just missed the first time around). I can also adjust the tests in TestGrounding if/when those are the ones that are failing.

MihaiSurdeanu commented 2 years ago

Thank you both. This is great.

kwalcock commented 2 years ago

That last test is passing now after I merged the test changes into my branch. It had to do with the empty/nonEmpty.

kwalcock commented 2 years ago

The timings look like this, with the annotation just there for control:

Branch	Annotation (ms)	Grounding (ms)
master	320657	15546
this one	316011	32722
pending	312266	13522

kwalcock commented 2 years ago

It looks to me like this is ready. I'm prepared to press the button if the branch owner thinks it's OK. It's certainly possible that the documentation effort, for example, exposes some rough spots to smooth out.

kwalcock commented 2 years ago

@zupon, is it OK to merge this?

zupon commented 2 years ago

This is good to go! Sorry for the delay. I was travelling since Friday and hadn't had a chance to get back to this.

clulab / eidos

Comp grounding unit tests #1070