clulab / processors

Natural Language Processors
https://clulab.github.io/processors/
Apache License 2.0
418 stars 101 forks source link

[numeric mentions] cannot get date for Thanksgiving Day #677

Closed myedibleenso closed 10 months ago

myedibleenso commented 1 year ago

Somewhat timely, but I'm running into an issue with the NER for Thanksgiving Day:

WARNING: toDateMentionHoliday conversion failed! Recovering and continuing...
java.lang.RuntimeException: ERROR: cannot get date for Thanksgiving Day!
    at org.clulab.numeric.mentions.package$.getHoliday(package.scala:897)
    at org.clulab.numeric.mentions.package$.toDateMentionHoliday(package.scala:672)
    at org.clulab.numeric.actions.NumericActions.$anonfun$mkDateMentionHoliday$1(NumericActions.scala:187)
    at org.clulab.numeric.actions.NumericActions.$anonfun$convert$1(NumericActions.scala:19)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at org.clulab.numeric.actions.NumericActions.convert(NumericActions.scala:17)
    at org.clulab.numeric.actions.NumericActions.mkDateMentionHoliday(NumericActions.scala:187)
    at sun.reflect.GeneratedMethodAccessor103.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at scala.reflect.runtime.JavaMirrors$JavaMirror$JavaVanillaMethodMirror2.jinvokeraw(JavaMirrors.scala:417)
    at scala.reflect.runtime.JavaMirrors$JavaMirror$JavaMethodMirror.jinvoke(JavaMirrors.scala:373)
    at scala.reflect.runtime.JavaMirrors$JavaMirror$JavaVanillaMethodMirror.apply(JavaMirrors.scala:389)
    at org.clulab.odin.impl.ActionMirror.$anonfun$reflect$1(ActionMirror.scala:23)
    at org.clulab.odin.impl.TokenExtractor.findAllIn(Extractor.scala:45)
    at org.clulab.odin.impl.Extractor.$anonfun$findAllIn$1(Extractor.scala:20)
    at org.clulab.odin.impl.Extractor.$anonfun$findAllIn$1$adapted(Extractor.scala:19)
    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
    at scala.collection.immutable.Range.foreach(Range.scala:158)
    at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
    at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
    at org.clulab.odin.impl.Extractor.findAllIn(Extractor.scala:19)
    at org.clulab.odin.impl.Extractor.findAllIn$(Extractor.scala:18)
    at org.clulab.odin.impl.TokenExtractor.findAllIn(Extractor.scala:33)
    at org.clulab.odin.ExtractorEngine.$anonfun$extractFrom$2(ExtractorEngine.scala:45)
    at scala.collection.TraversableLike$WithFilter.$anonfun$flatMap$2(TraversableLike.scala:858)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at scala.collection.TraversableLike$WithFilter.flatMap(TraversableLike.scala:857)
    at org.clulab.odin.ExtractorEngine.extract$1(ExtractorEngine.scala:43)
    at org.clulab.odin.ExtractorEngine.loop$1(ExtractorEngine.scala:34)
    at org.clulab.odin.ExtractorEngine.extractFrom(ExtractorEngine.scala:56)
    at org.clulab.odin.ExtractorEngine.extractFrom(ExtractorEngine.scala:24)
    at org.clulab.numeric.NumericEntityRecognizer.extractFrom(NumericEntityRecognizer.scala:46)
    at org.clulab.processors.clu.CluProcessor.recognizeNamedEntities(CluProcessor.scala:672)
    at org.clulab.processors.clu.CluProcessor.$anonfun$annotate$1(CluProcessor.scala:230)
    at org.clulab.utils.BeforeAndAfter.perform(BeforeAndAfter.scala:10)
    at org.clulab.utils.BeforeAndAfter.perform$(BeforeAndAfter.scala:7)
    at org.clulab.processors.clu.GivenConstEmbeddingsAttachment.perform(CluProcessor.scala:1012)
    at org.clulab.processors.clu.CluProcessor.annotate(CluProcessor.scala:226)
    at org.clulab.processors.Processor.annotate(Processor.scala:65)
    at org.clulab.processors.Processor.annotate$(Processor.scala:62)
    at org.clulab.processors.clu.CluProcessor.annotate(CluProcessor.scala:222)
    at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
    at scala.util.Success.$anonfun$map$1(Try.scala:255)
    at scala.util.Success.map(Try.scala:213)
    at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
    at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
    at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
    at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
MihaiSurdeanu commented 1 year ago

We use jollyday, which should normalize Thanksgiving Day... Their lexicons indicate that this is covered: https://jollyday.sourceforge.net/names_holiday_default.html

Something is happening as a result of this line: https://github.com/clulab/processors/blob/master/main/src/main/scala/org/clulab/numeric/HolidayNormalizer.scala#L25

@kwalcock : can you please take a look when you can? Maybe we are behind in version numbers?

kwalcock commented 1 year ago

It looks like that was incorporated as far back as processors 8.4.8. I'll check further.

kwalcock commented 1 year ago

The problem seems to be that there are two Thanksgiving Days. The second is presumably for Canada. The code only gives a norm if there is a definitive value.

2017-11-23 (Thanksgiving Day)
2017-11-07 (Thanksgiving Day)
MihaiSurdeanu commented 1 year ago

I wonder if we can default to the US in case of ambiguities.

On Fri, Nov 18, 2022 at 16:51 Keith Alcock @.***> wrote:

The problem seems to be that there are two Thanksgiving Days. The second is presumably for Canada. The code only gives a norm if there is a definitive value.

2017-11-23 (Thanksgiving Day) 2017-11-07 (Thanksgiving Day)

— Reply to this email directly, view it on GitHub https://github.com/clulab/processors/issues/677#issuecomment-1320667749, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAI75TQSBBH36PHZP4OSX6TWJAJBDANCNFSM6AAAAAASEYO4RY . You are receiving this because you commented.Message ID: @.***>

myedibleenso commented 1 year ago

Are there other ambiguous cases? I wouldn't be surprised (ex Independence Day). Maybe just avoid normalizing if ambiguous or return all assignments?

myedibleenso commented 1 year ago

Was this addressed in https://github.com/clulab/processors/pull/689?

myedibleenso commented 1 year ago

Looks like @kwalcock also updated the artifact registry to use SSL 🎊 ! This will make life easier for those us wrestling with firewalls.

When do you all plan to release 8.5.3?

kwalcock commented 1 year ago

@myedibleenso, I do not believe #689 was related and we never decided what should be done in ambiguous cases. I would guess ~1 week for new release, but https://artifactory.clulab.org can already be used if you edit the resolver in sbt. Maven might not yet work since it can follow old transitive dependencies.

kwalcock commented 10 months ago

It turns out not to have been Canada, but Montana instead. At some point in time Thanksgiving was celebrated on a different day there. Canada doesn't come into play because of ManagerParameters.create(HolidayCalendar.UNITED_STATES). We don't know anything about the state, so I'm going to try to change the logic so that any state-specific values are not used unless necessary.

kwalcock commented 10 months ago

This should be taken care of with #771.