clulab / reach

Reach Biomedical Information Extraction

keywords for frailty #715

Closed kwalcock closed 3 years ago

kwalcock commented 3 years ago

On December 26, 2020 at 12:31:17 PM, Enrique Noriega (enoriega@email.arizona.edu) wrote:

Hi,

About the Arizona output: Good question. It looks like it’s been disabled/removed from the current state of the code. The output formats I showed are the only ones that seem to be currently supported:

https://github.com/clulab/reach/blob/5777d66448f1ccb737e69bedae5f8f8073e562b7/src/main/scala/org/clulab/reach/ReachCLI.scala#L158-L189

Additionally, I did a string search on the repo and the only reference to the Arizona output format is in the web interface, last updated more than three years ago:

https://github.com/clulab/reach/blob/5777d66448f1ccb737e69bedae5f8f8073e562b7/export/src/main/resources/org/clulab/reach/export/server/static/fileprocessorwebui.html

I can roll back to an older version of REACH that still has the Arizona output enabled, or reach out to Keith to see if he knows what happened to it, depending on how important this output format is.

Regarding speed, it is indeed slow. It has been running for about one day and has finished processing 1.4k papers; by my count, it would take more than a month to finish if I don’t speed up or scale up. So, I’m disabling context and polarity to see how many more papers are finished by tomorrow. I also noticed a lot of error messages, but after glancing at the output log, the stack trace doesn’t point to dynet; instead, it seems to be related to new trigger subtypes that aren’t covered by pattern-matching statements:

scala.MatchError: KDtrigger(org.clulab.reach.mentions.BioTextBoundMention@5ebec77a) (of class org.clulab.reach.mentions.KDtrigger)

I suspect those are easy to fix.

I’ll send you an update tomorrow.
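As a rough check of the runtime estimate above, here is a minimal sketch. It assumes the rounded rate of 1.4k papers per day stays constant and applies to the 101,289 documents reported in the earlier message; the object and method names are made up for illustration.

```scala
object RuntimeEstimate {
  // Figures taken from the thread; the rate is the rounded "1.4k papers" per day.
  val totalDocs: Int = 101289
  val docsPerDay: Double = 1400.0

  // Days needed at a constant rate, rounded up to whole days.
  def daysNeeded(total: Int, perDay: Double): Int =
    math.ceil(total / perDay).toInt

  def main(args: Array[String]): Unit =
    println(daysNeeded(totalDocs, docsPerDay)) // prints 73
}
```

At that rate the run needs roughly 72.3 days, which is consistent with "more than a month".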

On Dec 25, 2020, at 11:03 PM, Mihai Surdeanu surdeanu@gmail.com wrote:


Thanks Enrique!

Also, Feliz Navidad!

On December 25, 2020 at 2:22:32 PM, Enrique Noriega (enoriega@email.arizona.edu) wrote:

Hi Mihai & Clay,

Merry Christmas!

Just wanted to give you a brief update on this:

I located 101,289 documents in the 2019 index and started running Reach on them.

I enabled all the output formats, which are: Fries, text, serial-json, and indexcard.

Indexcard seems to be outdated according to the code; I can disable it if necessary. I also left polarity and context enabled. I don’t know how long it will take, because I think polarity might be a bit slow when the deep learning engine kicks in, so if polarity will not be needed later, I can disable it.

I am using Amy with all cores (20) enabled and 150 GB of RAM. If somebody needs the machine, let me know and I will scale down the resources.

On Dec 22, 2020, at 3:15 PM, Mihai Surdeanu msurdeanu@email.arizona.edu wrote:

Great, thanks!

On Tue, Dec 22, 2020 at 11:55 Enrique Noriega enoriega@email.arizona.edu wrote:

Thanks. I'll get back to you with the results soon.

On Tue, Dec 22, 2020 at 12:34 PM Mihai Surdeanu msurdeanu@email.arizona.edu wrote:

Hi Enrique,

I propose the following process for a first run:

  1. Add a new query in NxmlSearcher (https://github.com/clulab/reach/blob/master/main/src/main/scala/org/clulab/reach/indexer/NxmlSearcher.scala#L707). The new query should search for "Interleukin 6" OR "IL-6" (I'm not sure whether there are other variants for this protein).

  2. Run NxmlSearcher on the Dec 2019 index at /data/nlp/corpora/pmc_openaccess/pmc_dec2019_index, and please let us know how many docs this query finds.

  3. Run Reach on the docs from step 2, and save all outputs supported.

  4. We'll then share this output with our collaborators.
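The query in step 1 could be sketched as follows. This only builds a query string in Lucene's classic query syntax (quoted phrases joined with OR); how NxmlSearcher actually consumes such a string is not shown in this thread, and the object and method names here are hypothetical.

```scala
object FrailtyQuery {
  // Variants named in the message; extend the list if other spellings exist.
  val il6Variants: Seq[String] = Seq("Interleukin 6", "IL-6")

  // Wrap a term in double quotes so it is treated as a phrase;
  // embedded double quotes are escaped with a backslash.
  def quote(term: String): String =
    "\"" + term.replace("\"", "\\\"") + "\""

  // Join the quoted variants with OR and parenthesize the disjunction.
  def anyOf(terms: Seq[String]): String =
    terms.map(quote).mkString("(", " OR ", ")")

  def main(args: Array[String]): Unit =
    println(anyOf(il6Variants)) // prints ("Interleukin 6" OR "IL-6")
}
```

Quoting each variant keeps the hyphen in IL-6 from being read as a query operator. Whether the phrase matches still depends on the analyzer the index was built with.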

Thank you! Mihai

On Tue, Dec 15, 2020 at 12:27 PM Mihai Surdeanu msurdeanu@email.arizona.edu wrote:

Hi Enrique and Clayton,

Can we meet soon(ish) to talk about this frailty use case?

====================================

positives: Interleukin 6, IL-6; cytomegalovirus, HCMV, CMV

processes: inflammation

negative ones: IL-1beta, CRP

NOTES: Key points:

Interleukin1 Beta as negative control

Role of CRP?

IL-6 and frailty relation

Inflammation and Aging include in search

Association of CMV in inflammation and frailty

kwalcock commented 3 years ago

Find out when/why Arizona output was removed.

kwalcock commented 3 years ago

Find out what causes this error and whether it is responsible for any slowdown:

scala.MatchError: KDtrigger(org.clulab.reach.mentions.BioTextBoundMention@5ebec77a) (of class org.clulab.reach.mentions.KDtrigger)
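For reference, here is a minimal sketch of how this kind of error arises in Scala: a pattern match written before a new subtype existed has no case for it and throws scala.MatchError at runtime. The types below are hypothetical stand-ins, not Reach's actual classes, and the failing match in Reach has not been located here.

```scala
object MatchErrorSketch {
  // Hypothetical stand-ins for trigger types; these are not Reach's classes.
  trait Trigger { def text: String }
  case class KnownTrigger(text: String) extends Trigger
  case class KDtrigger(text: String) extends Trigger // the newly added subtype

  // Written before KDtrigger existed: no case covers it, so it throws scala.MatchError.
  def labelStrict(t: Trigger): String = t match {
    case KnownTrigger(text) => s"known:$text"
  }

  // Same match with a fallback case, so unknown subtypes no longer throw.
  def labelSafe(t: Trigger): String = t match {
    case KnownTrigger(text) => s"known:$text"
    case other              => s"unhandled:${other.text}"
  }
}
```

If the base trait were sealed, the compiler would warn about the missing case at compile time, which is one way to catch new subtypes early.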

bgyori commented 3 years ago

I believe the Arizona output was removed as part of #667 since it required certain parts of assembly.

MihaiSurdeanu commented 3 years ago

But I believe @myedibleenso added it as part of the ASKE hackathon. I am trying to trace it down.

myedibleenso commented 3 years ago

> But I believe @myedibleenso added it as part of the ASKE hackathon

I can't speak to the hackathon you mentioned as I'm not involved in that project. I was asked to add assembly back to support work on causal ordering of events/relations. Perhaps that was around the same time?

There has been a PR open since August 26 that adds back assembly along with compatibility changes related to universal dependencies:

It looks like subsequent merges resulted in some conflicts.

That PR does not include the old ReachCLI capabilities, as at that time no one was interested in having those back in the project.

In order to add them back, I would take the following approach:

  1. Resolve the conflicting files

  2. Reincorporate or revert ReachCLI with assembly

  3. Remove the final sieve (including the preceding andThen):

That classifier has not been retrained and the feature extraction process is what made assembly expensive to run (see https://github.com/clulab/reach/issues/299).

If you're not interested in causal precedence, all but the first sieve in applySieves can be disabled.
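To illustrate what removing a sieve and its preceding andThen means, here is a rough sketch using plain Scala function composition. The Sieve type and the three sieves are hypothetical stand-ins; Reach's actual sieve signatures and applySieves are not reproduced here. Only Function1.andThen is standard Scala.

```scala
object SieveSketch {
  // Hypothetical stand-in: a sieve maps a set of candidate links to a refined set.
  type Sieve = Set[String] => Set[String]

  val cheapRules: Sieve = links => links + "rule-based"
  val moreRules: Sieve = links => links + "more-rules"
  val classifier: Sieve = links => links + "classifier" // the expensive final stage

  // Full chain, final stage included.
  val full: Sieve = cheapRules andThen moreRules andThen classifier

  // Final stage removed together with the andThen that preceded it.
  val withoutFinal: Sieve = cheapRules andThen moreRules

  // Only the first sieve, for when causal precedence is not needed.
  val firstOnly: Sieve = cheapRules
}
```

Because each stage is an ordinary function, dropping a stage does not require changing the stages that remain.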

MihaiSurdeanu commented 3 years ago

Thanks Gus!

This is my fault. I dropped the ball on this... @enoriega, I will solve this either today or tomorrow, and then hopefully you'll be able to generate the arizona format again.

myedibleenso commented 3 years ago

Not your fault, @MihaiSurdeanu. There was likely a good reason for waiting to merge at the time this was opened, but we've since forgotten what it was. Please let me know if you run into issues.

MihaiSurdeanu commented 3 years ago

@kwalcock: I implemented all the changes recommended by @myedibleenso in his branch, myedibleenso/assembly. However, compilation fails with this error:

[error] sbt.librarymanagement.ResolveException: Error downloading org.scalatest:scalatest_2.12:2.2.4
[error]   Not found
[error]   Not found
[error]   not found: /Users/mihais/.ivy2/local/org.scalatest/scalatest_2.12/2.2.4/ivys/ivy.xml
[error]   not found: https://repo1.maven.org/maven2/org/scalatest/scalatest_2.12/2.2.4/scalatest_2.12-2.2.4.pom
[error]   not found: http://artifactory.cs.arizona.edu:8081/artifactory/sbt-release/org/scalatest/scalatest_2.12/2.2.4/scalatest_2.12-2.2.4.pom
[error]     at lmcoursier.CoursierDependencyResolution.unresolvedWarningOrThrow(CoursierDependencyResolution.scala:258)
[error]     at lmcoursier.CoursierDependencyResolution.$anonfun$update$38(CoursierDependencyResolution.scala:227)
[error]     at scala.util.Either$LeftProjection.map(Either.scala:573)
[error]     at lmcoursier.CoursierDependencyResolution.update(CoursierDependencyResolution.scala:227)
[error]     at sbt.librarymanagement.DependencyResolution.update(DependencyResolution.scala:60)
[error]     at sbt.internal.LibraryManagement$.resolve$1(LibraryManagement.scala:53)
[error]     at sbt.internal.LibraryManagement$.$anonfun$cachedUpdate$12(LibraryManagement.scala:103)
[error]     at sbt.util.Tracked$.$anonfun$lastOutput$1(Tracked.scala:73)
[error]     at sbt.internal.LibraryManagement$.$anonfun$cachedUpdate$20(LibraryManagement.scala:116)
[error]     at scala.util.control.Exception$Catch.apply(Exception.scala:228)
[error]     at sbt.internal.LibraryManagement$.$anonfun$cachedUpdate$11(LibraryManagement.scala:116)
[error]     at sbt.internal.LibraryManagement$.$anonfun$cachedUpdate$11$adapted(LibraryManagement.scala:97)
[error]     at sbt.util.Tracked$.$anonfun$inputChangedW$1(Tracked.scala:219)
[error]     at sbt.internal.LibraryManagement$.cachedUpdate(LibraryManagement.scala:130)
[error]     at sbt.Classpaths$.$anonfun$updateTask0$5(Defaults.scala:3440)
[error]     at scala.Function1.$anonfun$compose$1(Function1.scala:49)
[error]     at sbt.internal.util.$tilde$greater.$anonfun$$u2219$1(TypeFunctions.scala:62)
[error]     at sbt.std.Transform$$anon$4.work(Transform.scala:68)
[error]     at sbt.Execute.$anonfun$submit$2(Execute.scala:282)
[error]     at sbt.internal.util.ErrorHandling$.wideConvert(ErrorHandling.scala:23)
[error]     at sbt.Execute.work(Execute.scala:291)
[error]     at sbt.Execute.$anonfun$submit$1(Execute.scala:282)
[error]     at sbt.ConcurrentRestrictions$$anon$4.$anonfun$submitValid$1(ConcurrentRestrictions.scala:265)
[error]     at sbt.CompletionService$$anon$2.call(CompletionService.scala:64)
[error]     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error]     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[error]     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error]     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error]     at java.lang.Thread.run(Thread.java:748)
[error] (causalAssembly / update) sbt.librarymanagement.ResolveException: Error downloading org.scalatest:scalatest_2.12:2.2.4
[error]   Not found
[error]   Not found
[error]   not found: /Users/mihais/.ivy2/local/org.scalatest/scalatest_2.12/2.2.4/ivys/ivy.xml
[error]   not found: https://repo1.maven.org/maven2/org/scalatest/scalatest_2.12/2.2.4/scalatest_2.12-2.2.4.pom
[error]   not found: http://artifactory.cs.arizona.edu:8081/artifactory/sbt-release/org/scalatest/scalatest_2.12/2.2.4/scalatest_2.12-2.2.4.pom

Can you please look into it?

kwalcock commented 3 years ago

Acknowledged

kwalcock commented 3 years ago

@MihaiSurdeanu, all the other mentions of scalatest were for version 3.0.1. After changing assembly/build.sbt to that version of scalatest, the library problem went away. Some other problems did appear, though. Perhaps they are related to what you were working on.
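The version change described above would look roughly like this in sbt. This is a sketch only: the rest of assembly/build.sbt is not shown in this thread, and the Test scope is an assumption.

```scala
// assembly/build.sbt (fragment; the surrounding settings are not shown in the thread)
// Assumption: scalatest is only needed for tests, hence the Test scope.
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.1" % Test
```

The `%%` operator appends the Scala binary version to the artifact name, which is why the log above looks for scalatest_2.12 and fails to find a 2.2.4 build of it.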

MihaiSurdeanu commented 3 years ago

@myedibleenso: one of the compilation errors comes from the fact that the old writeJSON had an Assembler as a parameter:

https://github.com/clulab/reach/blob/13c06ef793f9c2346a799d299e02d7b63c6fd9c7/export/src/main/scala/org/clulab/reach/export/JsonOutputter.scala#L61-L69

But (see L69) this method calls another writeJSON without the Assembler. Am I missing something, or is that Assembler not used in the writeJSON?

myedibleenso commented 3 years ago

Very strange. According to git I am to blame for that, but I can see no reason behind the change. I think it's safe to remove, as it is unused. It must've been in anticipation of some output format. Sorry about that!

MihaiSurdeanu commented 3 years ago

Thanks, @myedibleenso! I fixed that and added the missing Arizona and CMU outputters. I think we're almost there. But I ran into one more error:

https://github.com/clulab/reach/blob/myedibleenso/assembly/export/src/main/scala/org/clulab/reach/export/cmu/CMUExporter.scala#L104

equivalenceHash takes a parameter, but here it is called without one. Do you know what value should be the default? Thanks!

kwalcock commented 3 years ago

@enoriega, do you have the format you need now?

enoriega commented 3 years ago

@kwalcock Not yet, but almost there!

enoriega commented 3 years ago

This issue has been addressed successfully. Thank you everybody.