clulab / processors

Natural Language Processors
https://clulab.github.io/processors/

See what balaur would look like when merged with master #775

Open kwalcock opened 6 months ago

kwalcock commented 6 months ago

@MihaiSurdeanu, this was published to artifactory as 9.0.0-RC3. I haven't tried it out yet, but it is available. It can go to maven if I merge into master, change a true to false, and re-release. There's no turning back then.

MihaiSurdeanu commented 6 months ago

Thanks @kwalcock ! Before pressing the button, can you please test it with habitus and sec? @enoriega: can you please test this release in SKEMA? Thank you both!

enoriega commented 6 months ago

@MihaiSurdeanu @kwalcock SKEMA doesn't work yet until pdf2txt is updated to use RC3:

[error]     * org.clulab:processors-main_2.12:9.0.0-RC3 (semver-spec) is selected over 9.0.0-RC2
[error]         +- org.clulab:skema_text_reading_2.12:0.1.0-SNAPSHOT  (depends on 9.0.0-RC3)
[error]         +- org.clulab:processors-corenlp_2.12:9.0.0-RC3       (depends on 9.0.0-RC3)
[error]         +- org.clulab:pdf2txt_2.12:1.2.0-RC2                  (depends on 9.0.0-RC2)

kwalcock commented 6 months ago

@enoriega, acknowledged. I'm adjusting eidos similarly because habitus uses both processors and eidos.

kwalcock commented 6 months ago

It looks like I probably compiled with Java 11 when the intention was to stay with Java 8. I will fix that. In Eidos I've removed Spanish, Portuguese, and semantic role labelling, and I still need to remove the separate calls to tagPartsOfSpeech, etc. 404 tests are passing and 223 are failing. That's quite a lot.
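
For reference, one common way to pin the bytecode level in sbt is sketched below. This is only an illustration of the Java 8 intention stated above, not the actual processors build; where the settings belong (root project, ThisBuild, or a shared settings file) is an assumption left to the reader.

```scala
// build.sbt fragment (sketch, not the actual processors configuration)

// Ask javac to accept Java 8 source and emit Java 8 class files,
// even when sbt itself runs on JDK 11.
javacOptions ++= Seq("-source", "1.8", "-target", "1.8")

// Scala 2.12 accepts this target flag; it makes the intended level explicit.
scalacOptions += "-target:jvm-1.8"
```

These flags control the class-file version only. They do not prevent code from calling APIs that exist only in newer JDKs, so building on a Java 8 JDK remains the stricter check.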

kwalcock commented 6 months ago

All the compatibility experiments I did seem to have resulted in a mix of Java 8 and Java 11 class files being deployed in RC3. This resulted in complaints when using Java 8. Because the jars were internally inconsistent, I deleted RC3 from artifactory. RC4 is being released now. You may have local copies of RC3 that should be avoided. Please update build.sbt files to use -RC4. I'll continue to try it out with eidos and pdf2txt.
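
A jar can be checked for this kind of inconsistency by reading the header of each class file: the first four bytes are the magic number 0xCAFEBABE, followed by a two-byte minor and a two-byte major version, where major 52 means Java 8 and 55 means Java 11. The sketch below is a hypothetical helper (the name ClassVersionCheck and its methods are made up for illustration and are not part of processors) that counts class files per major version.

```scala
import java.io.{DataInputStream, InputStream}
import java.util.zip.ZipFile
import scala.collection.JavaConverters._

// Hypothetical helper, not part of processors.
object ClassVersionCheck {

  // Reads the class-file header: u4 magic, u2 minor, u2 major.
  // Returns None if the stream does not start with 0xCAFEBABE.
  // Throws EOFException if the stream is shorter than the header.
  def majorVersion(in: InputStream): Option[Int] = {
    val data = new DataInputStream(in)
    if (data.readInt() != 0xCAFEBABE) None
    else {
      data.readUnsignedShort() // minor version, not needed here
      Some(data.readUnsignedShort())
    }
  }

  // Counts class files per major version (52 = Java 8, 55 = Java 11).
  def versionsInJar(path: String): Map[Int, Int] = {
    val zip = new ZipFile(path)
    try {
      zip.entries().asScala.toList
        .filter(e => !e.isDirectory && e.getName.endsWith(".class"))
        .flatMap { e =>
          val in = zip.getInputStream(e)
          try majorVersion(in).toList finally in.close()
        }
        .groupBy(identity)
        .map { case (major, hits) => major -> hits.size }
    } finally zip.close()
  }

  // Prints the counts for each jar given on the command line.
  def main(args: Array[String]): Unit =
    args.foreach { jar =>
      val counts = versionsInJar(jar)
      val status = if (counts.size > 1) "MIXED" else "ok"
      println(s"$jar: $counts [$status]")
    }
}
```

Running it against the jars in a local ivy or coursier cache would show whether a stale copy mixes major versions. A jar that legitimately bundles classes for several Java levels would also be reported as MIXED, so the result is a hint rather than proof.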

kwalcock commented 6 months ago

@enoriega, there is a 1.2.0-RC4 version of pdf2txt that is configured to use the 9.0.0-RC4 version of processors. They might be tested with reach.
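
A downstream build.sbt that keeps the two in step might look like the sketch below. The artifact names are taken from the eviction report earlier in this thread; the val names are made up for illustration, and the resolver for the lab's artifactory, which release candidates need, is deliberately left out because its settings are not given here.

```scala
// build.sbt fragment (sketch): keep processors and pdf2txt on matching
// release candidates so that sbt does not report a version conflict.
val processorsVersion = "9.0.0-RC4" // hypothetical val name
val pdf2txtVersion    = "1.2.0-RC4" // hypothetical val name

libraryDependencies ++= Seq(
  "org.clulab" %% "processors-main" % processorsVersion,
  "org.clulab" %% "pdf2txt"         % pdf2txtVersion
)
```

Other processors modules a project depends on, such as processors-corenlp in the report above, would need to move to the same version.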

kwalcock commented 6 months ago

Eidos has around 130 failing tests that I'll be looking into.

kwalcock commented 6 months ago

Habitus has been made to run (optionally) with the balaur release. The project doesn't have any unit tests expecting any particular grammatical prowess, but it has achieved compatibility. At the same time it uses eidos and pdf2txt, so lots of things had to line up. Eidos is where the grammar matters and I will still look at the problems there. SEC will come soon.

MihaiSurdeanu commented 6 months ago

Thank you @kwalcock !! This is a lot of work.

enoriega commented 6 months ago

@kwalcock I can resolve dependencies correctly in SKEMA after this change. I can't compile yet because the Odin API changed. Was it refactored into a separate namespace?

[error] /Users/enoriega/github/skema/skema/text_reading/scala/src/main/scala/org/ml4ai/skema/text_reading/serializer/SkemaJSONSerializer.scala:270:18: value id is not a member of org.clulab.odin.Mention
[error]       } yield (m.id, edgeUJson)
[error]                  ^
[error] /Users/enoriega/github/skema/skema/text_reading/scala/src/main/scala/org/ml4ai/skema/text_reading/serializer/SkemaJSONSerializer.scala:351:36: org.clulab.odin.serialization.json.TextBoundMentionOps.type does not take parameters
[error]         "id" -> TextBoundMentionOps(tb).id,
[error]                                    ^
[error] /Users/enoriega/github/skema/skema/text_reading/scala/src/main/scala/org/ml4ai/skema/text_reading/serializer/SkemaJSONSerializer.scala:374:35: org.clulab.odin.serialization.json.RelationMentionOps.type does not take parameters
[error]         "id" -> RelationMentionOps(rm).id,
[error]                                   ^
[error] /Users/enoriega/github/skema/skema/text_reading/scala/src/main/scala/org/ml4ai/skema/text_reading/serializer/SkemaJSONSerializer.scala:396:32: org.clulab.odin.serialization.json.EventMentionOps.type does not take parameters
[error]         "id" -> EventMentionOps(em).id,
[error]                                ^
[error] /Users/enoriega/github/skema/skema/text_reading/scala/src/main/scala/org/ml4ai/skema/text_reading/serializer/SkemaJSONSerializer.scala:419:24: value equivalenceHash is not a member of org.clulab.odin.Mention
[error]         hs = mns.map(_.equivalenceHash)
[error]                        ^
[error] /Users/enoriega/github/skema/skema/text_reading/scala/src/main/scala/org/ml4ai/skema/text_reading/serializer/SkemaJSONSerializer.scala:420:19: type mismatch;
[error]  found   : Any
[error]  required: Int
[error]       } yield mix(bh, unorderedHash(hs))
[error]                   ^
[error] /Users/enoriega/github/skema/skema/text_reading/scala/src/main/scala/org/ml4ai/skema/text_reading/serializer/SkemaJSONSerializer.scala:420:37: type mismatch;
[error]  found   : Any
[error]  required: TraversableOnce[Any]
[error]       } yield mix(bh, unorderedHash(hs))
[error]                                     ^
[error] /Users/enoriega/github/skema/skema/text_reading/scala/src/main/scala/org/ml4ai/skema/text_reading/serializer/SkemaJSONSerializer.scala:447:43: org.clulab.odin.serialization.json.TextBoundMentionOps.type does not take parameters
[error]       val h8 = mix(h7, TextBoundMentionOps(cm.trigger).equivalenceHash)
[error]                                           ^
[error] 8 errors found
[error] (Compile / compileIncremental) Compilation failed
[error] Total time: 19 s, completed Jan 19, 2024 3:14:43 PM

kwalcock commented 6 months ago

@enoriega, these probably have to do with changes in the serialization that are actually already in 8.5.4. Could you push what you have so that I can patch up the remaining ones? I don't remember offhand the updated syntax.

enoriega commented 6 months ago

@kwalcock I haven't changed this code in a while so this should be good to use. Thanks!

enoriega commented 6 months ago

@MihaiSurdeanu @kwalcock Does this branch evict all of the dynet dependencies? If so, we should consider updating reach so that it can run easily on Apple silicon.

A fresh clone of reach in the master branch fails on my M2 machine when trying to load fatdynet with the following error:

[error] [launcher] error during sbt launcher: java.lang.UnsatisfiedLinkError: /Users/enoriega/Library/Caches/JNA/temp/jna5178281256539034113.tmp: dlopen(/Users/enoriega/Library/Caches/JNA/temp/jna5178281256539034113.tmp, 0x0001): tried: '/Users/enoriega/Library/Caches/JNA/temp/jna5178281256539034113.tmp' (fat file, but missing compatible architecture (have 'i386,x86_64', need 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/Users/enoriega/Library/Caches/JNA/temp/jna5178281256539034113.tmp' (no such file), '/Users/enoriega/Library/Caches/JNA/temp/jna5178281256539034113.tmp' (fat file, but missing compatible architecture (have 'i386,x86_64', need 'arm64'))

kwalcock commented 6 months ago

@enoriega, I get the same error, only I'm pretty sure that it comes from sbt. It appears when I start sbt, long before anything dynet-related happens. It seems that sbt version 1.4.0, which reach specifies, is not compatible with M1/M2. If I change project/build.properties to use 1.7.2 like processors, then sbt will start and the tests run, all except something that is not compiling.

enoriega commented 6 months ago

Thanks. I will update like that and figure out the rest @kwalcock

kwalcock commented 6 months ago

@enoriega, and yes, processors version 9+ will probably not have dynet. Some parses will be different, though, so we're checking for unexpected consequences.