clulab / reach

Reach Biomedical Information Extraction

BioNLPProcessor.annotate fails when withRuleNER is set to false #781

Open kwalcock opened 1 year ago

kwalcock commented 1 year ago

This issue was originally filed under processors, but I believe BioNLPProcessor has since become part of reach. The original issue is at https://github.com/clulab/processors/issues/255. Part of the discussion is pasted below:

@MihaiSurdeanu, it seems that BioNLPProcessor.annotate now fails when withRuleNER is set to false, even if withCRFNER is set to true:

java.util.NoSuchElementException: None.get
  at scala.None$.get(Option.scala:347)
  at scala.None$.get(Option.scala:345)
  at org.clulab.processors.clu.bio.BioNERPostProcessor.process(BioNERPostProcessor.scala:23)
  at org.clulab.processors.bionlp.BioNLPProcessor$$anonfun$recognizeNamedEntities$1.apply(BioNLPProcessor.scala:59)
  at org.clulab.processors.bionlp.BioNLPProcessor$$anonfun$recognizeNamedEntities$1.apply(BioNLPProcessor.scala:58)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  at org.clulab.processors.bionlp.BioNLPProcessor.recognizeNamedEntities(BioNLPProcessor.scala:58)
  at org.clulab.processors.Processor$class.annotate(Processor.scala:89)
  at org.clulab.processors.shallownlp.ShallowNLPProcessor.annotate(ShallowNLPProcessor.scala:29)
  at org.clulab.processors.Processor$class.annotate(Processor.scala:59)
  at org.clulab.processors.shallownlp.ShallowNLPProcessor.annotate(ShallowNLPProcessor.scala:29)

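This is due to BioNERPostProcessor attempting to access the .entities attribute of each sentence it processes. Judging by the None.get at the top of the trace, that attribute is an Option that is still None when the post-processor runs.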

One option is to change recognizeNamedEntities:

  override def recognizeNamedEntities(doc: Document): Unit = {
    hybridNER.recognizeNamedEntities(doc, namedEntitySanityCheck(doc))

    for {
      sentence <- doc.sentences
      // only post-process if we have entities
      if sentence.entities.nonEmpty
    } { nerPostProcessor.process(sentence) }
  }

I think it would be better to handle it in BioNERPostProcessor.process instead, though.
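
For illustration, here is a minimal sketch of what handling it inside the post-processor could look like. It is not the actual BioNERPostProcessor code: SketchSentence and SketchPostProcessor are hypothetical stand-ins, the sketch assumes that entities is an Option that stays None when no NER component has filled it in, and the label clean-up is only a placeholder for the real post-processing rules.

```scala
// Hypothetical stand-in for the real Sentence class; only the fields needed here.
class SketchSentence(val words: Array[String]) {
  // Assumption: entity labels are optional and remain None until some NER step sets them.
  var entities: Option[Array[String]] = None
}

// Hypothetical stand-in for the post-processor, with the guard inside process().
object SketchPostProcessor {
  // Returns true if post-processing ran, false if it was skipped.
  def process(sentence: SketchSentence): Boolean =
    sentence.entities match {
      case None =>
        // No labels to post-process: skip instead of calling .get on None.
        false
      case Some(labels) =>
        // Placeholder for the real rules: replace empty labels with "O".
        for (i <- labels.indices if labels(i).isEmpty) labels(i) = "O"
        true
    }
}
```

With the guard in process, every caller is protected, so recognizeNamedEntities would not need its own nonEmpty filter.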

MihaiSurdeanu commented 1 year ago

I agree with the proposed solution, and also that it should be addressed in BioNLPProcessor. Thanks!