Closed: kwalcock closed this issue 3 years ago
Find out when/why Arizona output was removed.
Find out what causes this error and whether it is responsible for any slowdown:
scala.MatchError: KDtrigger(org.clulab.reach.mentions.BioTextBoundMention@5ebec77a) (of class org.clulab.reach.mentions.KDtrigger)
I believe the Arizona output was removed as part of #667 since it required certain parts of assembly.
But I believe @myedibleenso added it as part of the ASKE hackathon. I am trying to trace it down.
> But I believe @myedibleenso added it as part of the ASKE hackathon
I can't speak to the hackathon you mentioned as I'm not involved in that project. I was asked to add assembly back to support work on causal ordering of events/relations. Perhaps that was around the same time?
There has been a PR open since August 26 that adds back assembly along with compatibility changes related to universal dependencies:
It looks like subsequent merges resulted in some conflicts.
That PR does not include the old ReachCLI capabilities, as no one was interested at the time in having those back in the project.
In order to add them back, I would take the following approach:
Remove the final sieve (including the preceding andThen):
That classifier has not been retrained and the feature extraction process is what made assembly expensive to run (see https://github.com/clulab/reach/issues/299).
If you're not interested in causal precedence, all but the first sieve in applySieves can be disabled.
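As a rough illustration of the two options above, here is a minimal sketch of function composition with andThen. The State and Sieve types and the three sieve names are hypothetical stand-ins, not the actual contents of applySieves; the point is only that dropping a sieve means dropping it together with the andThen that precedes it.

```scala
// Hypothetical stand-ins: these are NOT the real Reach assembly types or sieves.
object SieveSketch {
  // A "sieve" here is just a function that adds its name to a running state.
  type State = Vector[String]
  type Sieve = State => State

  val firstSieve: Sieve = state => state :+ "first"
  val middleSieve: Sieve = state => state :+ "middle"
  val finalSieve: Sieve = state => state :+ "final" // stands in for the expensive classifier sieve

  // Full pipeline: every sieve chained with andThen.
  val allSieves: Sieve = firstSieve andThen middleSieve andThen finalSieve

  // Option 1: remove the final sieve along with its preceding andThen.
  val withoutFinal: Sieve = firstSieve andThen middleSieve

  // Option 2: if causal precedence is not needed, keep only the first sieve.
  val firstOnly: Sieve = firstSieve
}
```

Calling `SieveSketch.withoutFinal(Vector.empty)` runs only the first two stand-in sieves, in order.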
Thanks Gus!
This is my fault. I dropped the ball on this... @enoriega, I will solve this either today or tomorrow, and then hopefully you'll be able to generate the Arizona format again.
Not your fault, @MihaiSurdeanu. There was likely a good reason for waiting to merge at the time this was opened, but we've since forgotten what it was. Please let me know if you run into issues.
@kwalcock: I implemented all the changes recommended by @myedibleenso in his branch: myedibleenso/assembly
However, compilation fails with this error:
[error] sbt.librarymanagement.ResolveException: Error downloading org.scalatest:scalatest_2.12:2.2.4
[error] Not found
[error] Not found
[error] not found: /Users/mihais/.ivy2/local/org.scalatest/scalatest_2.12/2.2.4/ivys/ivy.xml
[error] not found: https://repo1.maven.org/maven2/org/scalatest/scalatest_2.12/2.2.4/scalatest_2.12-2.2.4.pom
[error] not found: http://artifactory.cs.arizona.edu:8081/artifactory/sbt-release/org/scalatest/scalatest_2.12/2.2.4/scalatest_2.12-2.2.4.pom
[error] at lmcoursier.CoursierDependencyResolution.unresolvedWarningOrThrow(CoursierDependencyResolution.scala:258)
[error] at lmcoursier.CoursierDependencyResolution.$anonfun$update$38(CoursierDependencyResolution.scala:227)
[error] at scala.util.Either$LeftProjection.map(Either.scala:573)
[error] at lmcoursier.CoursierDependencyResolution.update(CoursierDependencyResolution.scala:227)
[error] at sbt.librarymanagement.DependencyResolution.update(DependencyResolution.scala:60)
[error] at sbt.internal.LibraryManagement$.resolve$1(LibraryManagement.scala:53)
[error] at sbt.internal.LibraryManagement$.$anonfun$cachedUpdate$12(LibraryManagement.scala:103)
[error] at sbt.util.Tracked$.$anonfun$lastOutput$1(Tracked.scala:73)
[error] at sbt.internal.LibraryManagement$.$anonfun$cachedUpdate$20(LibraryManagement.scala:116)
[error] at scala.util.control.Exception$Catch.apply(Exception.scala:228)
[error] at sbt.internal.LibraryManagement$.$anonfun$cachedUpdate$11(LibraryManagement.scala:116)
[error] at sbt.internal.LibraryManagement$.$anonfun$cachedUpdate$11$adapted(LibraryManagement.scala:97)
[error] at sbt.util.Tracked$.$anonfun$inputChangedW$1(Tracked.scala:219)
[error] at sbt.internal.LibraryManagement$.cachedUpdate(LibraryManagement.scala:130)
[error] at sbt.Classpaths$.$anonfun$updateTask0$5(Defaults.scala:3440)
[error] at scala.Function1.$anonfun$compose$1(Function1.scala:49)
[error] at sbt.internal.util.$tilde$greater.$anonfun$$u2219$1(TypeFunctions.scala:62)
[error] at sbt.std.Transform$$anon$4.work(Transform.scala:68)
[error] at sbt.Execute.$anonfun$submit$2(Execute.scala:282)
[error] at sbt.internal.util.ErrorHandling$.wideConvert(ErrorHandling.scala:23)
[error] at sbt.Execute.work(Execute.scala:291)
[error] at sbt.Execute.$anonfun$submit$1(Execute.scala:282)
[error] at sbt.ConcurrentRestrictions$$anon$4.$anonfun$submitValid$1(ConcurrentRestrictions.scala:265)
[error] at sbt.CompletionService$$anon$2.call(CompletionService.scala:64)
[error] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[error] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error] at java.lang.Thread.run(Thread.java:748)
[error] (causalAssembly / update) sbt.librarymanagement.ResolveException: Error downloading org.scalatest:scalatest_2.12:2.2.4
[error] Not found
[error] Not found
[error] not found: /Users/mihais/.ivy2/local/org.scalatest/scalatest_2.12/2.2.4/ivys/ivy.xml
[error] not found: https://repo1.maven.org/maven2/org/scalatest/scalatest_2.12/2.2.4/scalatest_2.12-2.2.4.pom
[error] not found: http://artifactory.cs.arizona.edu:8081/artifactory/sbt-release/org/scalatest/scalatest_2.12/2.2.4/scalatest_2.12-2.2.4.pom
Can you please look into it?
Acknowledged
@MihaiSurdeanu, all the other mentions of scalatest were for version 3.0.1. After changing assembly/build.sbt to that version of scalatest, the library problem went away. Some other problems did appear, though. Perhaps they are related to what you were working on.
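For reference, the change described above might look like the fragment below. This is a sketch only: the exact form of the existing declaration in assembly/build.sbt is not shown in this thread, so the `%%` operator and the `Test` scope are assumptions to adjust against the real file.

```scala
// assembly/build.sbt (fragment)
// Assumption: scalatest is declared with %% and test scope; match the existing declaration.
// The resolver log above shows that 2.2.4 could not be found for the _2.12 suffix,
// while the other subprojects already use 3.0.1.
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.1" % Test
```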
@myedibleenso: one of the compilation errors comes from the fact that the old writeJSON had an Assembler as parameter:
But (see L69) this method calls another writeJSON without the Assembler. Am I missing something, or is that Assembler not used in the writeJSON?
Very strange. According to git I am to blame for that, but I can see no reason behind the change. I think it's safe to remove, as it is unused. It must've been in anticipation of some output format. Sorry about that!
Thanks @myedibleenso ! Fixed that and added the missing Arizona and CMU outputters. I think we're almost there. But I ran into one more error:
equivalenceHash takes a parameter, but here it is called without one. Do you know what value should be the default?
Thanks!
@enoriega, do you have the format you need now?
@kwalcock Not yet, but almost there!
This issue has been addressed successfully. Thank you everybody.
On December 26, 2020 at 12:31:17 PM, Enrique Noriega (enoriega@email.arizona.edu) wrote:
Hi,
About the Arizona output: Good question. It looks like it’s been disabled/removed from the current state of the code. The output formats I showed are the only ones that seem to be currently supported:
https://github.com/clulab/reach/blob/5777d66448f1ccb737e69bedae5f8f8073e562b7/src/main/scala/org/clulab/reach/ReachCLI.scala#L158-L189
Additionally, I did a string search on the repo and the only reference to the Arizona output format is in the web interface, last updated more than three years ago:
https://github.com/clulab/reach/blob/5777d66448f1ccb737e69bedae5f8f8073e562b7/export/src/main/resources/org/clulab/reach/export/server/static/fileprocessorwebui.html
I can roll back to an older version of REACH that still has the Arizona output enabled or reach out to Keith to see if he knows what happened to it, depending on how valuable this output format is.
Regarding speed, it is indeed slow. It has been running for about one day and has finished processing 1.4k papers; by my count it would take more than a month to finish if I don't speed up or scale up. So, I'm disabling context and polarity to see how many more papers are finished by tomorrow. I also noticed there are a lot of error messages, but after glancing at the output log, the stack trace doesn't point to dynet. Instead, it seems to be related to new trigger subtypes that aren't covered in pattern-matching statements:
scala.MatchError: KDtrigger(org.clulab.reach.mentions.BioTextBoundMention@5ebec77a) (of class org.clulab.reach.mentions.KDtrigger)
I suspect those are easy to fix.
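To illustrate the kind of failure described above, here is a minimal sketch of how a match written before a new subtype existed throws scala.MatchError, and how a fallback case avoids it. The Trigger trait, BindingTrigger, and this KDtrigger are hypothetical stand-ins, not the real Reach classes, and the actual fix in Reach may well handle the new subtype explicitly instead.

```scala
object MatchErrorSketch {
  // Hypothetical stand-ins for trigger subtypes; not the real Reach classes.
  // The trait is not sealed, so the compiler cannot warn about missing cases.
  trait Trigger { def text: String }
  case class BindingTrigger(text: String) extends Trigger
  case class KDtrigger(text: String) extends Trigger // subtype added later

  // Written before KDtrigger existed: throws scala.MatchError when given one.
  def labelStrict(t: Trigger): String = t match {
    case BindingTrigger(_) => "binding"
  }

  // Same match with a fallback case: unknown subtypes no longer throw.
  def labelSafe(t: Trigger): String = t match {
    case BindingTrigger(_) => "binding"
    case _                 => "unhandled"
  }
}
```

Sealing the trait would be another option, since the compiler then reports non-exhaustive matches at build time rather than at run time.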
I’ll send you an update tomorrow.
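As a quick check on the "more than a month" estimate, the arithmetic can be sketched as follows, using the 101,289 documents and the roughly 1.4k papers per day reported in this thread. It assumes the rate stays constant, which is a simplification.

```scala
object ThroughputSketch {
  val totalDocs = 101289     // documents located in the index
  val docsPerDay = 1400.0    // roughly 1.4k papers finished in about one day

  // Estimated days for the full run at a constant rate (a bit over 72 days).
  val daysNeeded: Double = totalDocs / docsPerDay
}
```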
On Dec 25, 2020, at 11:03 PM, Mihai Surdeanu surdeanu@gmail.com wrote:
Thanks Enrique!
Also, Feliz Navidad!
On December 25, 2020 at 2:22:32 PM, Enrique Noriega (enoriega@email.arizona.edu) wrote:
Hi Mihai & Clay,
Merry Christmas!
Just wanted to give you a brief update on this:
I located 101,289 documents in the 2019 index and started running Reach on them.
I enabled all the output formats, which are: Fries, text, serial-json and indexcard.
Indexcard seems to be outdated according to the code; I can disable it if necessary. I also left polarity and context enabled. I don't know how long it will take, because I think polarity might be a bit slow when the deep learning engine kicks in, so if polarity is not going to be relevant, I can disable it.
I am using Amy and enabled all cores (20) and 150 GB of RAM. If somebody needs it, let me know and I will scale down the resources.
On Dec 22, 2020, at 3:15 PM, Mihai Surdeanu msurdeanu@email.arizona.edu wrote:
Great, thanks!
On Tue, Dec 22, 2020 at 11:55 Enrique Noriega enoriega@email.arizona.edu wrote:
Thanks. I'll get back to you with the results soon.
On Tue, Dec 22, 2020 at 12:34 PM Mihai Surdeanu msurdeanu@email.arizona.edu wrote:
Hi Enrique,
I propose the following process for a first run:
1. Add a new query in NxmlSearcher here: https://github.com/clulab/reach/blob/master/main/src/main/scala/org/clulab/reach/indexer/NxmlSearcher.scala#L707 The new query should search for "Interleukin 6" OR "IL-6" (not sure if there are any other variants for this protein).
2. Run NxmlSearcher on the Dec 2019 index here: /data/nlp/corpora/pmc_openaccess/pmc_dec2019_index Please let us know how many docs this query finds.
3. Run Reach on the docs from step 2, and save all supported outputs.
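The query string for step 1 could be assembled along these lines. This is a sketch under the assumption that NxmlSearcher accepts Lucene classic query-parser syntax, where double quotes mark a phrase and OR joins alternatives; the helper name phraseQuery is hypothetical and not part of NxmlSearcher.

```scala
object QuerySketch {
  // Hypothetical helper: wrap each term in double quotes so multi-word names
  // are matched as phrases, then join the alternatives with OR.
  def phraseQuery(terms: Seq[String]): String =
    terms.map(t => "\"" + t + "\"").mkString(" OR ")

  // Variants named in step 1; more can be appended if other names turn up.
  val il6Query: String = phraseQuery(Seq("Interleukin 6", "IL-6"))
}
```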
We'll then share this output with our collaborators.
Thank you! Mihai
On Tue, Dec 15, 2020 at 12:27 PM Mihai Surdeanu msurdeanu@email.arizona.edu wrote:
Hi Enrique and Clayton,
Can we meet soon(ish) to talk about this frailty use case?
====================================
positives: Interleukin 6, IL-6, cytomegalovirus, HCMV, CMV
processes: inflammation
negative ones: IL-1beta, CRP
NOTES: Key points:
- Interleukin1 Beta as negative control
- Role of CRP?
- IL-6 and frailty relation
- Inflammation and Aging include in search
- Association of CMV in inflammation and frailty