clulab / reach

Reach Biomedical Information Extraction
Other
97 stars 39 forks source link

AzFailsafeKBML does not seem to be threadsafe #732

Closed kwalcock closed 3 years ago

kwalcock commented 3 years ago

It may be that I misunderstand, but it appears that ReachCLI processes papers in parallel. It seems that a single instance of AzFailsafeKBML with its memoryKB is used to generate names for all unknown texts. The code for resolve() does not seem to be threadsafe.

  override def resolve (text:String): Resolutions = {
    val resolutions = memoryKB.lookupNoSpecies(text)
    if (resolutions.isDefined)                           // text has been resolved
      resolutions
    else {                                               // else no existing entries for this text
      val refId = mkRefId(text)                          // so create a new reference ID
      memoryKB.addEntries(text, DefaultNamespace, refId) // create new KB entries for this text
      memoryKB.lookupNoSpecies(text)                     // and return results from repeating lookup
    }
  }

If two copies of the same text arrive at nearly the same time, they can be given different refIds. Only one of these will be stored in the memoryKB and two different values will be output. If we're unlucky, the data structure used might get corrupted. I don't see that anything is done in InMemoryKB related to synchronization. It looks to me like resolve needs to be synchronized.

MihaiSurdeanu commented 3 years ago

Nice catch! That block needs to be synchronized.

kwalcock commented 3 years ago

Fixed with #733.