clulab / habitus

context-related issues #142

Open maxaalexeeva opened 2 years ago

maxaalexeeva commented 2 years ago

These are two related issues.

  case class ProcessToLemmas(process: String, lemmas: Set[String]) {
    def this(processAndLemmas: (String, Set[String])) = this(processAndLemmas._1, processAndLemmas._2)
  }

  // These are prioritized highest to lowest because there can be multiple matches.
  val processToLemmasMap = Seq(
    "planting"         -> Set("plant", "sow", "cultivate", "cultivation", "grow"),
    "harvest"          -> Set("harvest", "yield"),
    "credit"           -> Set("credit", "finance", "value"),
    "irrigation"       -> Set("irrigation", "irrigate"),
    "weeds"            -> Set("weed"),
    "natural_disaster" -> Set("flood", "bird", "attack")
  ).map(new ProcessToLemmas(_)) // just for convenience

  def getProcess(mention: Mention): String = {
    val lemmas = mention.sentenceObj.lemmas.get
    val process = processToLemmasMap
        .find { processToLemmas =>
          lemmas.exists(processToLemmas.lemmas)
        }
        .map(_.process)
        .getOrElse("UNK")

    process
  }
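
To make the effect of the priority order concrete, here is a minimal, self-contained sketch. It assumes a hypothetical `getProcessFromLemmas` that takes the lemmas directly instead of a `Mention`, and the example sentences are invented. A sentence that mentions both harvesting and a plant is labeled "planting" simply because that entry comes first.

```scala
object FirstMatchDemo {
  // Same table as above, highest priority first.
  val processToLemmas: Seq[(String, Set[String])] = Seq(
    "planting"         -> Set("plant", "sow", "cultivate", "cultivation", "grow"),
    "harvest"          -> Set("harvest", "yield"),
    "credit"           -> Set("credit", "finance", "value"),
    "irrigation"       -> Set("irrigation", "irrigate"),
    "weeds"            -> Set("weed"),
    "natural_disaster" -> Set("flood", "bird", "attack")
  )

  // Hypothetical stand-in for getProcess: returns the first process
  // whose lemma set contains any lemma of the sentence.
  def getProcessFromLemmas(lemmas: Seq[String]): String =
    processToLemmas
        .find { case (_, processLemmas) => lemmas.exists(processLemmas) }
        .map { case (process, _) => process }
        .getOrElse("UNK")

  def main(args: Array[String]): Unit = {
    // Invented sentences, lemmatized by hand.
    val harvestOnly = Seq("farmer", "harvest", "rice", "in", "November")
    val mixed       = Seq("farmer", "harvest", "the", "rice", "plant", "in", "November")
    val unrelated   = Seq("the", "meeting", "be", "postpone")

    println(getProcessFromLemmas(harvestOnly)) // harvest
    println(getProcessFromLemmas(mixed))       // planting, because it is listed first
    println(getProcessFromLemmas(unrelated))   // UNK
  }
}
```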

maxaalexeeva commented 2 years ago

@IkeKobby fyi

MihaiSurdeanu commented 2 years ago

maxaalexeeva commented 2 years ago

@MihaiSurdeanu Allegra needed it at some point to distinguish between events/sentences related to harvesting, planting, etc. Not sure how crucial it is now.

On planting areas: to handle the ambiguous phrases, we could extract a generic "area" in the grammar and then have an action that checks whether any meaningful keywords, e.g., "sow" or "cultivate", appear anywhere in the sentence.
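
A rough sketch of what that check could look like, not tied to the actual Odin API: `AreaMention`, `labelArea`, the "PlantingArea" label, and the keyword list are all hypothetical names for illustration, and a real action would operate on `Mention` objects instead.

```scala
object AreaContextSketch {
  // Hypothetical, simplified stand-in for an extracted mention.
  case class AreaMention(text: String, sentenceLemmas: Seq[String], label: String = "Area")

  // Illustrative keyword list; only the two keywords named above are included.
  val plantingKeywords: Set[String] = Set("sow", "cultivate")

  // Relabel a generic area as a planting area only when the sentence
  // contains at least one planting keyword; otherwise leave it unchanged.
  def labelArea(mention: AreaMention): AreaMention =
    if (mention.sentenceLemmas.exists(plantingKeywords)) mention.copy(label = "PlantingArea")
    else mention

  def main(args: Array[String]): Unit = {
    // Invented sentences, lemmatized by hand.
    val sown  = AreaMention("12 ha", Seq("farmer", "sow", "12", "ha", "of", "rice"))
    val other = AreaMention("12 ha", Seq("the", "village", "cover", "12", "ha"))

    println(labelArea(sown).label)  // PlantingArea
    println(labelArea(other).label) // Area
  }
}
```

The point of the design is that the grammar stays generic and the disambiguation lives in one place, so the keyword list can change without touching the rules.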

ok

kwalcock commented 2 years ago

I can imagine wanting some other algorithm entirely, but for getProcess() it should be easy enough to count how many matches there are for each key and use the key of the maximum value. The result would be dependent on the ordering only in the case of ties.

import scala.collection.immutable.ListMap

  val processToLemmas = ListMap(
    "planting" -> Seq("plant", "sow", "cover", "cultivate", "grow"),
    "harvesting" -> Seq("harvest", "yield"),
    "credit" -> Seq("credit", "finance", "value", "correspond"),
    "natural_disaster" -> Seq("flood", "bird", "attack")
  )

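  def getProcess(mention: Mention): String = {
    val lemmas = mention.sentenceObj.lemmas.get
    // For each process, count how many of its lemmas occur in the sentence.
    val (process, count) = processToLemmas
        .map { case (name, processLemmas) => name -> processLemmas.count(lemmas.contains) }
        .maxBy(_._2) // on ties, the earlier entry wins

    // If nothing matched, every count is 0, so report "UNK" instead of the first key.
    if (count > 0) process else "UNK"
  }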
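
For trying the counting idea outside the project, here is a self-contained sketch. It assumes a hypothetical `getProcessFromLemmas` that takes the lemmas directly instead of a `Mention`, and it falls back to "UNK" when nothing matches, which is an assumption carried over from the find-based version. The example sentences are invented.

```scala
import scala.collection.immutable.ListMap

object CountMatchDemo {
  // Insertion order only matters for breaking ties.
  val processToLemmas: ListMap[String, Seq[String]] = ListMap(
    "planting" -> Seq("plant", "sow", "cover", "cultivate", "grow"),
    "harvesting" -> Seq("harvest", "yield"),
    "credit" -> Seq("credit", "finance", "value", "correspond"),
    "natural_disaster" -> Seq("flood", "bird", "attack")
  )

  // Hypothetical stand-in for getProcess: picks the process with the
  // most lemmas present in the sentence, or "UNK" if none are present.
  def getProcessFromLemmas(lemmas: Seq[String]): String = {
    val lemmaSet = lemmas.toSet
    val counts = processToLemmas.toSeq.map { case (process, processLemmas) =>
      process -> processLemmas.count(lemmaSet)
    }
    // maxBy keeps the first maximum it sees, so ties go to the earlier entry.
    val (best, count) = counts.maxBy(_._2)
    if (count > 0) best else "UNK"
  }

  def main(args: Array[String]): Unit = {
    // Two harvesting lemmas outvote one planting lemma.
    println(getProcessFromLemmas(Seq("farmer", "harvest", "plant", "yield"))) // harvesting
    // One match each: the tie goes to the earlier key.
    println(getProcessFromLemmas(Seq("farmer", "harvest", "plant")))          // planting
    // No matches at all.
    println(getProcessFromLemmas(Seq("the", "meeting", "be", "postpone")))    // UNK
  }
}
```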

maxaalexeeva commented 2 years ago

@kwalcock I like that! Thank you!

kwalcock commented 2 years ago

There are two versions of getProcess above, so I guess the algorithm is in flux. The top one uses a helper class that I decided was not necessary, and I also forgot about ListMap. The code could look more like the one below.

import scala.collection.immutable.ListMap

  // These are prioritized highest to lowest.
  val processToLemmas = ListMap(
    "planting"         -> Set("plant", "sow", "cultivate", "cultivation", "grow"),
    "harvesting"       -> Set("harvest", "yield"),
    "credit"           -> Set("credit", "finance", "value"),
    "irrigation"       -> Set("irrigation", "irrigate"),
    "weeds"            -> Set("weed"),
    "natural_disaster" -> Set("flood", "bird", "attack")
  )

  def getProcess(mention: Mention): String = {
    val sentenceLemmas = mention.sentenceObj.lemmas.get
    val process = processToLemmas
        .find { case (_, processLemmas) =>
          sentenceLemmas.exists(processLemmas)
        }
        .map { case (process, _) => process }
        .getOrElse("UNK")

    process
  }
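
The reason ListMap matters here is that it iterates in insertion order, so `find` really does see the highest-priority entry first; a general immutable `Map` makes no such promise. A small sketch to check the order (the object name is made up for illustration):

```scala
import scala.collection.immutable.ListMap

object ListMapOrderDemo {
  // Same table as above, highest priority first.
  val processToLemmas: ListMap[String, Set[String]] = ListMap(
    "planting"         -> Set("plant", "sow", "cultivate", "cultivation", "grow"),
    "harvesting"       -> Set("harvest", "yield"),
    "credit"           -> Set("credit", "finance", "value"),
    "irrigation"       -> Set("irrigation", "irrigate"),
    "weeds"            -> Set("weed"),
    "natural_disaster" -> Set("flood", "bird", "attack")
  )

  def main(args: Array[String]): Unit = {
    // Keys come back in the order they were written.
    println(processToLemmas.keys.mkString(", "))

    // With lemmas from two processes, find returns the earlier entry.
    val lemmas = Seq("flood", "weed")
    val first = processToLemmas.find { case (_, processLemmas) => lemmas.exists(processLemmas) }
    println(first.map(_._1).getOrElse("UNK")) // weeds
  }
}
```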

maxaalexeeva commented 2 years ago

@kwalcock thanks!