facebook / duckling

Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.

Time/EN ruleIntersect is probably too aggressive with years #603

Open stroxler opened 3 years ago

stroxler commented 3 years ago

While debugging a separate issue, I discovered that Time/EN happily resolves "9:15 815" to 9:15 on January 1, 815 (i.e. a very specific time of day during the Byzantine Empire), marked as non-latent. Similarly, "9:15 35" resolves to 9:15 on January 1, 2035.

It seems unlikely to me that anyone inputting this construct really wants a year after a bare time of day; it's more likely that the number is unrelated, e.g. "9:15 35 cars drove by".

I think the most obvious option is to break ruleIntersect into two rules: one that accepts only non-latent second tokens, and another that accepts latent years but sanity-checks that the grain of the first token isn't too fine for a bare year to plausibly follow it.

ruleIntersect

ruleIntersect :: Rule
ruleIntersect = Rule
  { name = "intersect"
  , pattern =
    [ Predicate $ isGrainFinerThan TG.Year
    , Predicate $ or . sequence [isNotLatent, isGrainOfTime TG.Year]
    ]
  , prod = \tokens -> case tokens of
      (Token Time td1:Token Time td2:_)
        | not (TTime.latent td1) || not (TTime.latent td2) ->
        Token Time . notLatent <$> intersect td1 td2
      _ -> Nothing
  }
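
To make the proposed split concrete, here is a minimal, self-contained sketch of the two acceptance checks. It does not use Duckling's real types: `Grain`, `TimeTok`, and the three functions are hypothetical stand-ins, and the `Day` threshold for "not overly specific" is an assumption picked for illustration, not something decided in this issue.

```haskell
module Main where

-- Simplified stand-in for Duckling's time grains (hypothetical, not the real type).
-- The derived Ord instance orders grains from finest to coarsest.
data Grain = Second | Minute | Hour | Day | Week | Month | Quarter | Year
  deriving (Eq, Ord, Show)

-- Simplified stand-in for a time token: just its grain and latency flag.
data TimeTok = TimeTok
  { grain  :: Grain
  , latent :: Bool
  } deriving (Eq, Show)

-- Rule 1: first token is finer than a year, second token is non-latent.
intersectNonLatent :: TimeTok -> TimeTok -> Bool
intersectNonLatent t1 t2 = grain t1 < Year && not (latent t2)

-- Rule 2: second token is a latent year. Accept only when the first token
-- is non-latent and no finer than a day (assumed threshold).
intersectLatentYear :: TimeTok -> TimeTok -> Bool
intersectLatentYear t1 t2 =
  grain t2 == Year && latent t2
    && not (latent t1)
    && grain t1 >= Day && grain t1 < Year

-- A pair of tokens is intersected if either rule accepts it.
accepts :: TimeTok -> TimeTok -> Bool
accepts t1 t2 = intersectNonLatent t1 t2 || intersectLatentYear t1 t2

main :: IO ()
main = do
  let nineFifteen = TimeTok Minute False -- "9:15"
      year815     = TimeTok Year True    -- "815" read as a latent year
      march       = TimeTok Month False  -- "March"
  print (accepts nineFifteen year815) -- False: a minute-grain time followed by a latent year
  print (accepts march year815)       -- True: a month followed by a latent year
```

In this toy model "9:15 815" is no longer intersected, while a coarser first token such as a month still combines with a latent year. Where exactly the grain cutoff should sit in the real rule is an open question.
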

prashantskit commented 1 year ago

Is there any resolution to this issue?

stroxler commented 1 year ago

I don't believe so. Have you hit this problem?

I might be able to take a stab at it, although it's been a while since I touched Duckling.