clulab / eidos

Machine reading system for World Modelers
Apache License 2.0
36 stars 24 forks source link

Srl ner #1042

Closed BeckySharp closed 3 years ago

BeckySharp commented 3 years ago

@zupon I would like you to try this branch on the sentences from Robyn and see it it addresses the problems you saw.

@kwalcock somewhere about 20 if-else statements ago this code was too complex, I think I will try to refactor it dramatically, but IDK what the ETA on that will be.

kwalcock commented 3 years ago

I have some changes to SRLCompositionalGrounder.scala pending. I'll try to get at least the branch pushed soon so that you know what they are.

BeckySharp commented 3 years ago

awesome! that would be great! I'm in the process of rethinking the algorithm fundamentally

kwalcock commented 3 years ago

There were fewer than I thought that made it into a commit after all the debugging statements were removed, so merging might not be too difficult after all. It's branch kwalcock/reground. The ensureSRLs() changed because the JsonLD only records edges. If there are none, there's no record of the graph and the program tries to recalculate the empty graph unnecessarily. Elsewhere, the results are still occasionally inconsistent, so the validPredicates are being sorted. That seemed not to be enough and I have at least two files that failed a test. There is some sorting later involving the scores that might be insufficient because of ties, which do exist. If the topN is applied and it splits a tied group, then the order within the tie is important. I'm not sure that's the problem, though. Just FYI.

kwalcock commented 3 years ago

This may be because of the rename of the ontology file that I mentioned this morning. I will test on a different branch and update master if needed and then report back here.

[info] TestSentenceClassifier:
[info] org.clulab.wm.eidos.text.englishGrounding.TestGrounding *** ABORTED ***
[info]   java.lang.NullPointerException:
[info]   at java.util.Properties$LineReader.readLine(Properties.java:434)
[info]   at java.util.Properties.load0(Properties.java:353)
[info]   at java.util.Properties.load(Properties.java:341)
[info]   at org.clulab.wm.eidoscommon.utils.PropertiesBuilder$$anonfun$fromResource$1.apply(PropertiesBuilder.scala:57)
[info]   at org.clulab.wm.eidoscommon.utils.PropertiesBuilder$$anonfun$fromResource$1.apply(PropertiesBuilder.scala:56)
[info]   at org.clulab.wm.eidoscommon.utils.Closer$.liftedTree1$1(Closer.scala:20)
[info]   at org.clulab.wm.eidoscommon.utils.Closer$.autoClose(Closer.scala:19)
[info]   at org.clulab.wm.eidoscommon.utils.Closer$AutoCloser.autoClose(Closer.scala:60)
[info]   at org.clulab.wm.eidoscommon.utils.PropertiesBuilder$.fromResource(PropertiesBuilder.scala:56)
[info]   at org.clulab.wm.eidos.groundings.DomainHandler$$anonfun$1.apply(DomainHandler.scala:38)
[info]   ...
kwalcock commented 3 years ago

That does seem to be the cause. PR forthcoming.

kwalcock commented 3 years ago

Tests are passing. @zupon, if the error above is what you're getting, it may help to merge with the latest update to master as was done here.

BeckySharp commented 3 years ago

@kwalcock , wonderful beautiful @kwalcock -- first, please look at my refactor of the SRL compositional grounding code. The diff will be very hard to look at, so if you can look at them side by side (old and new) that would be ideal to marvel at how pretty the new code is!!! I mean, it -- it's soooo pretty. Ok, prettiness aside (but boy, is it pretty!), do we have tests that will check that I didn't change the behavior toooo much (a little was on purpose, and some edges cases I may not care about bc some things I was uncertain about how to handle in the first place)...?

ps -- @MihaiSurdeanu idk yet if it works, but this code is soooo pretty now! (the parts I refactored, look especially at lines 178-275 that removed like 35 nested if-else pairs in a recursive function) :)

OH! and @marcovzla helped me with the graph traversal part so it's likely even right! :D

kwalcock commented 3 years ago

It worked great for 25 files, but then stack overflowed because of infinite recursion. I'll try to track down the culprit. A key line in the loop was

    val children: Array[GraphNode] = getThemes(idx, s, backoff).map(GraphNode(_, s, backoff, topN, threshold))

in SRLCompositionalGrounder.scala line 357.

kwalcock commented 3 years ago

Here's one problematic file. I was using ReExtractCdrMetaFromDirectory to process it. Remove .txt from the end.

005e214e0bde30d3ba7e3ab0f7ee1b7a.json.txt

BeckySharp commented 3 years ago

@kwalcock noted, i'll run it on that. Also .. the failing test is for the serialization -- here is a diff of the jsonlds, i am at a loss to understand how they could be diff... the SRLs and roots and all are serialized now right? maybe I need a sort somewhere? image

kwalcock commented 3 years ago

I saw failures in TestGrounding, TestGrounderStability, TestSRLGrounder. I'll look at the diff One resulted in

[info] TestSRLGrounder:
[info] SRLCompositionalGrounder
[info] org.clulab.wm.eidos.text.englishGrounding.TestSRLGrounder *** ABORTED ***
[info]   java.lang.StackOverflowError:
[info]   at org.clulab.wm.eidos.groundings.grounders.SRLCompositionalGrounder$$anonfun$getArguments$2.apply(SRLCompositionalGrounder.scala:293)
[info]   at org.clulab.wm.eidos.groundings.grounders.SRLCompositionalGrounder$$anonfun$getArguments$2.apply(SRLCompositionalGrounder.scala:293)
[info]   at scala.collection.TraversableLike$$anonfun$filterImpl$1.apply(TraversableLike.scala:248)
[info]   at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
[info]   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
[info]   at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
[info]   at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
[info]   at scala.collection.mutable.ArrayOps$ofRef.filter(ArrayOps.scala:186)
[info]   at org.clulab.wm.eidos.groundings.grounders.SRLCompositionalGrounder.getArguments(SRLCompositionalGrounder.scala:293)
[info]   at org.clulab.wm.eidos.groundings.grounders.SRLCompositionalGrounder.org$clulab$wm$eidos$groundings$grounders$SRLCompositionalGrounder$$getThemes(SRLCompositionalGrounder.scala:279)
BeckySharp commented 3 years ago

@kwalcock this has not been forgotten i promise

kwalcock commented 3 years ago

Please see #1047 for a merging from master with conflicts resolved.

BeckySharp commented 3 years ago

Please see #1047 for a merging from master with conflicts resolved.

thanks for the reminder!

BeckySharp commented 3 years ago

@kwalcock sorry, got caught in meetings but just pushed, I was thinking of making a test from the document you gave me -- thoughts?

kwalcock commented 3 years ago

It's probably a large document for a test. Did you happen to notice that the problem was with a particular sentence? I hadn't done that yet. Or do you think some kind of problematic structure should be generated to specifically target a sensitive operation? I'd be content to add a single sentence for now.

BeckySharp commented 3 years ago

It's not too big, but yay for tests, I found a bug :) fixing now

BeckySharp commented 3 years ago

OK, failing tests are: [error] org.clulab.wm.eidos.text.englishGrounding.TestGrounding [error] org.clulab.wm.eidos.document.TestDocumentAttachment [error] org.clulab.wm.eidos.document.TestSentenceClassifier

looking now

BeckySharp commented 3 years ago

@kwalcock i fixed the one test file (and corresponding tests) -- the srl roots are no longer what we want for our predicates (possibly they never were).

The other two seem to have something to do with the Document attachment for relevance (and the sentence classifier). I didn't adjust that part of the code, is it possible that some change in processors (this branch upped the procVersion) is causing a problem there?

fwiw -- we're really not using the sentence classifier ....

kwalcock commented 3 years ago

Some of the tests had just worked in master. It is the case that the ontology has been updated just recently. It could be that the new version has kicked in. It's possible to get the previous version by changing the line in build.sbt from "master-SNAPSHOT" to "2.1" or something like that (see https://jitpack.io/docs/). If that works, a separate step can be used to make the tests pass with "2.2". That's just a guess. You might have to compare files to see if it's really the case.

BeckySharp commented 3 years ago

but the sentence classifier isn’t related to grounding, right? i fixed the tests that broke bc of the new ontology version (resulted in improved tests really)

@kwalcock editing this message -- I went to look and all the current failing tests are (on the surface) not about grounding.

On Thursday, July 15, 2021, Keith Alcock @.***> wrote:

External Email

Some of the tests had just worked in master. It is the case that the ontology has been updated just recently. It could be that the new version has kicked in. It's possible to get the previous version by changing the line in build.sbt from "master-SNAPSHOT" to "2.1" or something like that (see https://jitpack.io/docs/). If that works, a separate step can be used to make the tests pass with "2.2". That's just a guess. You might have to compare files to see if it's really the case.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/clulab/eidos/pull/1042#issuecomment-881154759, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJCPCMFASK66WTNNIPEITDTX6TTNANCNFSM5ABJDOOA .

BeckySharp commented 3 years ago

I think i found the problem!! I think the sentence classifier was disabled because I wasn't using the flat grounding (I was only using compositional in the config). I am going to push a permanent fix

kwalcock commented 3 years ago

🥇

BeckySharp commented 3 years ago

@kwalcock please give this a whirl on a few k docs and hopefully it will (a) work, (b) not crash, (c) not be slower, and (d) be better groundings -- PLUS it's soooo pretty 🥰

kwalcock commented 3 years ago

Word on the street is that it is beautiful code. How can anyone know that the groundings are better?

BeckySharp commented 3 years ago

perhaps @zupon can run it on some documents he previously used and we can compare? thoughts @MihaiSurdeanu ?

MihaiSurdeanu commented 3 years ago

Agreed!

On Tue, Jul 20, 2021 at 09:13 Becky Sharp @.***> wrote:

perhaps @zupon https://github.com/zupon can run it on some documents he previously used and we can compare? thoughts @MihaiSurdeanu https://github.com/MihaiSurdeanu ?

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/clulab/eidos/pull/1042#issuecomment-883103001, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAI75TXXNESY2NXZ4OT3OULTYUH2FANCNFSM5ABJDOOA .

kwalcock commented 3 years ago

So far there hasn't been a problem with any crash after more than 60K documents. One test measured it being slower at 61 seconds when the previous version got 34 seconds, but it's close enough. Whether it's working or better, I don't know.

MihaiSurdeanu commented 3 years ago

Nice!

On Wed, Jul 21, 2021 at 18:23 Keith Alcock @.***> wrote:

So far there hasn't been a problem with any crash after more than 60K documents. One test measured it being slower at 61 seconds when the previous version got 34 seconds, but it's close enough. Whether it's working or better, I don't know.

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/clulab/eidos/pull/1042#issuecomment-884277629, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAI75TVOEU52G2U6O4ELUULTY3RA7ANCNFSM5ABJDOOA .

BeckySharp commented 3 years ago

So far there hasn't been a problem with any crash after more than 60K documents. One test measured it being slower at 61 seconds when the previous version got 34 seconds, but it's close enough. Whether it's working or better, I don't know.

Send me a jsonld or 2?

kwalcock commented 3 years ago

Not until after midnight. You're still on vacation.

kwalcock commented 3 years ago

jsonld files went out via email.

kwalcock commented 3 years ago

Perfect! @zupon ran into this earlier today.

kwalcock commented 3 years ago

Is it OK to merge this after the tests pass?

BeckySharp commented 3 years ago

Is it OK to merge this after the tests pass?

yes, thanks! if they fail I'll try to fix them asap

BeckySharp commented 3 years ago

Is it OK to merge this after the tests pass?

yes, thanks! if they fail I'll try to fix them asap

yep, some failed. on it.

BeckySharp commented 3 years ago

ok @kwalcock I'm befuddled. the TestSRLGrounder and TestGrounding failed, according to jenkins (our helpful butler). BUT when I look at the console log from jenkins, I can't find the failed tests. When I grep for a string I'd expect to see (e.g., "SRLCompositionalGrounder" from the behavior of... block) it's not there. Further, when I run them locally, they pass. Git says that I don't have anything in my configs that is diff... help?

kwalcock commented 3 years ago

Did you click on "Full Log" at the top of the console output? A change in the ontology, like the one committed today, could break tests. Sometimes that dependency is slow to update locally. Can you run that app ShowOntologyVersion? I would expect 872c52924b3a73668f3b2801e3a5fdfa6fbde951 for the most recent changes.

BeckySharp commented 3 years ago

Did you click on "Full Log" at the top of the console output? A change in the ontology, like the one committed today, could break tests. Sometimes that dependency is slow to update locally. Can you run that app ShowOntologyVersion? I would expect 872c52924b3a73668f3b2801e3a5fdfa6fbde951 for the most recent changes.

thanks -- looking now!

kwalcock commented 3 years ago

Thanks. Someday something should happen to those tests. Not today, though.