clulab / eidos

Machine reading system for World Modelers
Apache License 2.0

Update stanford-corenlp #974

Open kwalcock opened 3 years ago

kwalcock commented 3 years ago

We're thinking of changing stanford-corenlp from 3.9.2 to 4.2.0. The former wanted lucene 4.10.3 and the latter wants lucene 7.5.0. For the record, timenorm would like 6.6.6.

Andrew,

Regarding

Keith: creates an Eidos branch that uses the processors branch that uses corenlp 4.2.0

Andrew: starts looking at the unit tests that fail

“nmod” and “nmod_*” become “obl” and “obl_*”

“dobj” becomes “obj”
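For the odin rules to keep up with the renamed labels listed above, a mechanical first pass over the rule text could look like the sketch below. `LabelRewriter` is a hypothetical helper, not part of eidos or processors, and it assumes rules refer to labels in the `>label` / `<label` form with optional suffixes such as `nmod_of`. In Universal Dependencies v2, which CoreNLP 4.x uses, an `nmod` attached to a noun stays `nmod`, so a blanket rewrite will over-apply; each change still needs to be checked against the failing tests.

```scala
import scala.util.matching.Regex

// Hypothetical helper (not part of eidos or processors): rewrites dependency
// labels in odin rule text from the CoreNLP 3.9.2 names to the 4.2.0 names.
// Simplification: every nmod becomes obl, although UD v2 keeps nmod for
// nominal modifiers of nouns.
object LabelRewriter {
  // "nmod" or "nmod_<suffix>", e.g. "nmod_of"; \b stops matches inside longer words.
  private val nmod: Regex = """\bnmod(_\w+)?\b""".r
  // "dobj" only as a whole word.
  private val dobj: Regex = """\bdobj\b""".r

  def rewrite(rule: String): String = {
    // Keep any suffix: nmod_of -> obl_of, bare nmod -> obl.
    val obl = nmod.replaceAllIn(rule, m =>
      Regex.quoteReplacement("obl" + Option(m.group(1)).getOrElse("")))
    dobj.replaceAllIn(obl, "obj")
  }
}
```

For example, `LabelRewriter.rewrite("effect: Entity = >dobj | >nmod_of")` turns both labels into their 4.2.0 forms while leaving labels such as `nsubj` alone.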

A number of caveats will follow, but making and compiling the branches is fairly straightforward. There is already a branch of processors called updateStanford, which you can get with "git checkout updateStanford". It then needs to be published locally with something like "sbt publishLocal". The result is processors 8.2.7-SNAPSHOT in the ~/.ivy2/local directory on your local drive.

Then there is an eidos branch, also called updateStanford. In an up-to-date eidos directory, "git checkout updateStanford" should take care of that. It is configured to use the snapshot version of processors. It should build and try to run.
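Before building the eidos branch it can help to confirm that `sbt publishLocal` actually left the snapshot in the local Ivy repository. The sketch below is a hypothetical check, not part of either project; the module name `processors-main` and the Scala binary version `2.12` are assumptions and should be adjusted to whatever the processors build publishes.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical pre-flight check (not part of eidos or processors).
// sbt's local Ivy repository lays modules out as
// ~/.ivy2/local/<organization>/<module>_<scalaBinaryVersion>/<version>/...
object SnapshotCheck {
  def ivyLocalDir(home: String, org: String, module: String,
                  scalaBinary: String, version: String): Path =
    Paths.get(home, ".ivy2", "local", org, s"${module}_$scalaBinary", version)

  def main(args: Array[String]): Unit = {
    // Assumed module name and Scala binary version.
    val dir = ivyLocalDir(System.getProperty("user.home"), "org.clulab",
      "processors-main", "2.12", "8.2.7-SNAPSHOT")
    if (Files.isDirectory(dir)) println(s"Found $dir")
    else println(s"Missing $dir; run sbt publishLocal in processors first")
  }
}
```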

It seems like the major concern with the tags was keeping our odin rules up to date. If those tags have been used in Actions and Finders, some code might need to change as well. There is a TagSet class that was meant to handle differences like that.
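The idea behind a TagSet-style class is that Actions and Finders ask it about a label instead of hard-coding "dobj" or "nmod_". The sketch below illustrates that idea only; eidos's actual TagSet is not shown in this thread, and `DependencyTagSet`, `CoreNlp392Tags`, and `CoreNlp420Tags` are hypothetical names.

```scala
// Hypothetical sketch; eidos's real TagSet class may look quite different.
trait DependencyTagSet {
  def directObject: String
  def obliquePrefix: String

  def isDirectObject(label: String): Boolean = label == directObject

  // Matches the bare label ("nmod" / "obl") and suffixed forms ("nmod_of" / "obl_of").
  def isOblique(label: String): Boolean =
    label == obliquePrefix || label.startsWith(obliquePrefix + "_")
}

// Labels produced with stanford-corenlp 3.9.2.
object CoreNlp392Tags extends DependencyTagSet {
  val directObject = "dobj"
  val obliquePrefix = "nmod"
}

// Labels produced with stanford-corenlp 4.2.0.
object CoreNlp420Tags extends DependencyTagSet {
  val directObject = "obj"
  val obliquePrefix = "obl"
}
```

With this shape, switching CoreNLP versions means swapping one object rather than editing every Action that inspects a label.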

Caveats

It will not pass all the tests, and not only because of our own code. stanford-corenlp 3.9.2 depended on lucene 4.10.3. Our geonorm wants 6.6.6, and that "evicted" the earlier version. Some touching up of the assembly process took care of that.

stanford-corenlp 4.2.0 depends on lucene 7.5.0, or at least prefers to.  This overrides geonorm's preference of 6.6.6.  That may or may not be a problem directly, but the version change does result in a crash elsewhere that will prevent a full evaluation of our rules.
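Both evictions described above follow a "highest requested version wins" pattern. The sketch below is a simplified model of that behavior, not sbt's resolver: real resolution also deals with version ranges, qualifiers, and conflict-manager settings, all of which are ignored here.

```scala
// Simplified model of "latest wins" eviction (not sbt's actual resolver).
object Eviction {
  // Compare dotted numeric versions numerically, so 4.10.3 > 4.9.0
  // (a plain string comparison would get that wrong).
  private def parts(v: String): List[Int] = v.split('.').toList.map(_.toInt)

  def compareVersions(a: String, b: String): Int = {
    val (pa, pb) = (parts(a), parts(b))
    val n = math.max(pa.length, pb.length)
    val diffs = (0 until n).map { i =>
      pa.lift(i).getOrElse(0) compare pb.lift(i).getOrElse(0)
    }
    diffs.find(_ != 0).getOrElse(0)
  }

  // requested: who asks for lucene -> which version they ask for.
  // Returns the version that is kept and the versions that are evicted.
  def resolve(requested: Map[String, String]): (String, Set[String]) = {
    val versions = requested.values.toSet
    val winner = versions.reduce((a, b) => if (compareVersions(a, b) >= 0) a else b)
    (winner, versions - winner)
  }
}
```

Under this model, `Map("stanford-corenlp 3.9.2" -> "4.10.3", "geonorm" -> "6.6.6")` keeps 6.6.6, while `Map("stanford-corenlp 4.2.0" -> "7.5.0", "geonorm" -> "6.6.6")` keeps 7.5.0 and evicts geonorm's preference.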

I don't remember the details of geonorm's dependency on lucene; maybe that's a deal breaker.

I think the crash is related to yet another version of lucene being included under a different name, because some project had a license issue and there was a fork, etc. There is some resource file in a top-level directory that conflicts and might need to be patched.
kwalcock commented 3 years ago

The errors were addressed by adding this dependency to eidos:

```scala
    // This matches the version of lucene that stanford-corenlp 4.2.0 uses.
    "org.apache.lucene"           % "lucene-backward-codecs"   % "7.5.0",
```

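One way to double-check that the lucene modules line up on the version named in the build comment above is a small consistency check over the resolved modules. `LuceneCheck` is hypothetical; the list of resolved modules would have to come from something like a dependency report, and here it is simply passed in by hand.

```scala
// Hypothetical consistency check (not part of eidos): report every
// org.apache.lucene module whose resolved version differs from the expected one.
object LuceneCheck {
  final case class Module(org: String, name: String, version: String)

  // The version stanford-corenlp 4.2.0 uses, per the build comment.
  val expectedLucene = "7.5.0"

  def mismatches(resolved: Seq[Module], expected: String = expectedLucene): Seq[Module] =
    resolved.filter(m => m.org == "org.apache.lucene" && m.version != expected)
}
```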
The current list of failing tests is:

[info] Run completed in 28 minutes, 44 seconds.
[info] Total number of tests run: 596
[info] Suites: completed 78, aborted 0
[info] Tests: succeeded 366, failed 230, canceled 7, ignored 249, pending 0
[info] *** 230 TESTS FAILED ***
[error] Failed tests:
[error]         org.clulab.wm.eidos.text.english.raps.TestRaps
[error]         org.clulab.wm.eidos.text.english.eval6.TestDoc5
[error]         org.clulab.wm.eidos.serialization.jsonld.TestJLDSerializer
[error]         org.clulab.wm.eidos.text.english.raps.TestRaps1
[error]         org.clulab.wm.eidos.text.english.eval6.TestDoc8
[error]         org.clulab.wm.eidos.text.english.cag.TestCagP1
[error]         org.clulab.wm.eidos.text.english.cag.TestExtraText
[error]         org.clulab.wm.eidos.text.english.cag.TestCagP0
[error]         org.clulab.wm.eidos.text.english.eval6.TestDoc2
[error]         org.clulab.wm.eidos.text.english.cag.TestCagP4
[error]         org.clulab.wm.eidos.system.TestCrLf
[error]         org.clulab.wm.eidos.system.TestHedging
[error]         org.clulab.wm.eidos.text.english.eval6.TestDoc3
[error]         org.clulab.wm.eidos.text.english.eval6.TestDoc6
[error]         org.clulab.wm.eidos.text.english.cag.TestCagP3
[error]         org.clulab.wm.eidos.system.TestNegation
[error]         org.clulab.wm.eidos.rule.TestJointAdjectives
[error]         org.clulab.wm.eidos.text.english.eval6.TestDoc1
[error]         org.clulab.wm.eidos.text.english.eval6.TestDoc4
[error]         org.clulab.wm.eidos.text.english.cag.TestCagP6
[error]         org.clulab.wm.eidos.system.TestEidosMention
[error]         org.clulab.wm.eidos.text.english.cag.TestCagP2
[error]         org.clulab.wm.eidos.utils.TestMentionUtils
[error]         org.clulab.wm.eidos.system.TestEidosActions
[error]         org.clulab.wm.eidos.system.TestFiltering
[error]         org.clulab.wm.eidos.text.english.eval6.TestDoc7
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful
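The counts in the summary lines above add up as follows: succeeded plus failed gives the reported run total (366 + 230 = 596), while the canceled and ignored tests are not part of it. The hypothetical helper below parses a line in that format so the numbers can be tracked from run to run; it relies only on the `word number` pairs visible in this log.

```scala
// Hypothetical helper: parse a ScalaTest summary line such as
// "Tests: succeeded 366, failed 230, canceled 7, ignored 249, pending 0".
object TestSummary {
  // A word immediately followed by a space and a number.
  private val Count = """(\w+) (\d+)""".r

  def parse(line: String): Map[String, Int] =
    Count.findAllMatchIn(line).map(m => m.group(1) -> m.group(2).toInt).toMap

  // In this log, succeeded + failed matches "Total number of tests run".
  def runTotal(counts: Map[String, Int]): Int =
    counts.getOrElse("succeeded", 0) + counts.getOrElse("failed", 0)

  // Fraction of the tests that ran and failed.
  def failureRate(counts: Map[String, Int]): Double =
    counts.getOrElse("failed", 0).toDouble / runTotal(counts)
}
```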

Some of these I may need to address, but for now I'm throwing it over the fence to @zupon. The branches needed in both processors and eidos are named updateStanford.