clulab / processors

Natural Language Processors
https://clulab.github.io/processors/
Apache License 2.0

These sentences cause processors to crash #812

Open kwalcock opened 1 day ago

kwalcock commented 1 day ago

The problem was noted with version 8.5.4, and the sentences below crash the currently deployed webapp. They all involve numbers: a rule finds a mention, but the subsequent processing of that mention fails. There may be exception handling in place that should cause the mention to be discarded, but it does not seem to be working.
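
For illustration, the intended "discard the mention on failure" behavior could look roughly like the sketch below. It is not the library's code: `RawMention`, `toNumber`, and `convertAll` are hypothetical names, and the only assumption taken from the report is that a conversion step can throw on spans that mix numbers with URLs. Every step that can throw would need the same guard. The second trace below passes through `setLabelsAndNorms`, which may mean the normalization step runs outside the recovery, though the trace alone does not confirm that.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for a span matched by a rule; not the library's Mention type.
final case class RawMention(text: String)

object SafeConversion {
  // Hypothetical converter: throws NumberFormatException when the span
  // is not a plain number, e.g. when a URL sits between two footnote numbers.
  def toNumber(m: RawMention): Double = m.text.trim.toDouble

  // Convert every mention; log and drop the ones whose conversion throws,
  // so that one bad span cannot abort processing of the whole document.
  def convertAll(mentions: Seq[RawMention]): Seq[Double] =
    mentions.flatMap { m =>
      Try(toNumber(m)) match {
        case Success(n) => Some(n)
        case Failure(e) =>
          Console.err.println(s"WARNING: conversion failed for [${m.text}]: ${e.getMessage}")
          None
      }
    }
}
```

Called with a mix of clean and malformed spans, `SafeConversion.convertAll` returns only the values that parsed and writes a warning for each one it dropped.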

pdf 75 Some of the major sectoral plans include the national energy policy, transport policy, forest and wildlife policy, national environmental sanitation strategy etc. 76 https://www.thegasconsortium.com/documents/GMP-Final-Jun16.pdf 77 http://www.energycom.gov.gh/files/Renewable-Energy-Masterplan-February-2019.pdf 78 http://energycom.gov.gh/files/SE4ALL-GHANA%20ACTION%20PLAN.pdf 79 http://www.energycom.gov.gh/files/Ghana%20Integrated%20Power%20System%20Master%20Plan%20_Volume%202.pdf Page | 80 In 2017, the Ministry of Energy started a comprehensive review of the National Energy Policy, which is still ongoing.
16 http://www.energycom.gov.gh/files/Renewable-Energy-Masterplan-February-2019.pdf 17 http://energycom.gov.gh/files/SE4ALL-GHANA%20ACTION%20PLAN.pdf Page | 14 Forestry National Forest and Wildlife Policy Stumpage Fees Surcharge on timber as part of the timber harvesting regulation regime.
They are: * One-district one factory, * Integrated Aluminium Industry * Planting for food and jobs, * One village one dam * Aquaculture for food and jobs * One district one warehouse Table 3 shows the linkages between nationally determined contributions, national climate change policy, medium-term development frameworks, government flagships projects and SDGs 18 https://www.mwh.gov.gh/wp-content/uploads/2018/05/SECTOR-MEDIUM-TERM-DEVELOPMENT-PLAN-2018-2021.pdf 19 http://nadmo.gov.gh/images/NADMO_documents/2015_documents/GHANA%20PLAN%20OF%20ACTION%20ON%20DRRCCA%202011-2015.pdf 20 Essebey G.O, Nutsukpo D, Karbo N, and Zougmore R. 2015.
Over-aged vehicle tax Tax incentive based on polluter pays principle Vehicles of all technology categories Fiscal Reduce import of poorly performing engine 252 https://www.sdfghana.org/ 253 http://www.odekro.org/Images/Uploads/Ghana%20Infrastructure%20Investment%20Fund%20Act,%202014.pdf Page | 252 6.3.2 Skills Development for Technology Development, Transfer and Adoption Skills development is a crucial element for speedy technology adoption at all levels.
35 http://mofa.gov.gh/site/publications/policies-plans/316-national-agriculture-investment-plan-ifj 36 https://www.gipcghana.com/invest-in-ghana/sectors/75-forestry/313-investing-in-ghana-s-forestry-sector.html 37 https://www.gipcghana.com/invest-in-ghana/sectors/75-forestry/313-investing-in-ghana-s-forestry-sector.html 38 https://www.fcghana.org/userfiles/files/MLNR/FDMP%20Final%20(2).
42 https://eiti.org/es/implementing_country/4 43 https://resourcegovernance.org/sites/default/files/Minerals%20and%20Mining%20Act%20703%20Ghana.pdf 44 https://www.lexadin.nl/wlg/legis/nofr/oeur/arch/gha/490.pdf 45http://www.petrocom.gov.gh/L&C_folder/Pet_register/laws/PETROLEUM%20(EXPLORATION%20AND%20PRODUCTION)%20ACT,%202016%20 (ACT%20919).

Here is an example of the resulting output, which contains two stack traces:

WARNING: toNumberRangeMention conversion failed! Recovering and continuing...
java.lang.RuntimeException: ERROR: could not find argument number1 in mention [76 https://www.thegasconsortium.com/documents/GMP-Final-Jun16.pdf 77 http://www.energycom.gov.gh/files/Renewable-Energy-Masterplan-February-2019.pdf 78 http://energycom.gov.gh/files/SE4ALL-GHANA%20ACTION%20PLAN.pdf 79]!
    at org.clulab.numeric.mentions.package$.toNumberRangeMention(package.scala:33)
    at org.clulab.numeric.actions.NumericActions.$anonfun$mkNumberRangeMention$1(NumericActions.scala:52)
    at org.clulab.numeric.actions.NumericActions.$anonfun$convert$1(NumericActions.scala:21)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at org.clulab.numeric.actions.NumericActions.convert(NumericActions.scala:19)
    at org.clulab.numeric.actions.NumericActions.mkNumberRangeMention(NumericActions.scala:52)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at scala.reflect.runtime.JavaMirrors$JavaMirror$JavaVanillaMethodMirror2.jinvokeraw(JavaMirrors.scala:424)
    at scala.reflect.runtime.JavaMirrors$JavaMirror$JavaMethodMirror.jinvoke(JavaMirrors.scala:380)
    at scala.reflect.runtime.JavaMirrors$JavaMirror$JavaVanillaMethodMirror.apply(JavaMirrors.scala:396)
    at org.clulab.odin.impl.ActionMirror.$anonfun$reflect$1(ActionMirror.scala:24)
    at org.clulab.odin.impl.TokenExtractor.findAllIn(Extractor.scala:45)
    at org.clulab.odin.impl.Extractor.$anonfun$findAllIn$1(Extractor.scala:20)
    at org.clulab.odin.impl.Extractor.$anonfun$findAllIn$1$adapted(Extractor.scala:19)
    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
    at scala.collection.immutable.Range.foreach(Range.scala:158)
    at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
    at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
    at org.clulab.odin.impl.Extractor.findAllIn(Extractor.scala:19)
    at org.clulab.odin.impl.Extractor.findAllIn$(Extractor.scala:18)
    at org.clulab.odin.impl.TokenExtractor.findAllIn(Extractor.scala:33)
    at org.clulab.odin.ExtractorEngine.$anonfun$extractFrom$2(ExtractorEngine.scala:46)
    at scala.collection.TraversableLike$WithFilter.$anonfun$flatMap$2(TraversableLike.scala:966)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at scala.collection.TraversableLike$WithFilter.flatMap(TraversableLike.scala:965)
    at org.clulab.odin.ExtractorEngine.extract$1(ExtractorEngine.scala:44)
    at org.clulab.odin.ExtractorEngine.loop$1(ExtractorEngine.scala:35)
    at org.clulab.odin.ExtractorEngine.extractFrom(ExtractorEngine.scala:57)
    at org.clulab.odin.ExtractorEngine.extractFrom(ExtractorEngine.scala:25)
    at org.clulab.numeric.NumericEntityRecognizer.extractFrom(NumericEntityRecognizer.scala:48)
    at org.clulab.processors.clu.CluProcessor.recognizeNamedEntities(CluProcessor.scala:667)
    at org.clulab.processors.clu.CluProcessor.$anonfun$annotate$1(CluProcessor.scala:240)
    at org.clulab.utils.BeforeAndAfter.perform(BeforeAndAfter.scala:10)
    at org.clulab.utils.BeforeAndAfter.perform$(BeforeAndAfter.scala:7)
    at org.clulab.processors.clu.GivenConstEmbeddingsAttachment.perform(CluProcessor.scala:1005)
    at org.clulab.processors.clu.CluProcessor.annotate(CluProcessor.scala:236)
    at org.clulab.processors.Processor.annotate(Processor.scala:128)
    at org.clulab.processors.Processor.annotate$(Processor.scala:125)
    at org.clulab.processors.clu.CluProcessor.annotate(CluProcessor.scala:232)
    at org.clulab.heuristics.backend.nlp.apps.AddSanitizedApp$.$anonfun$new$1(AddSanitizedApp.scala:153)
    at org.clulab.heuristics.backend.nlp.apps.AddSanitizedApp$.$anonfun$new$1$adapted(AddSanitizedApp.scala:148)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.clulab.heuristics.backend.nlp.apps.AddSanitizedApp$.delayedEndpoint$org$clulab$heuristics$backend$nlp$apps$AddSanitizedApp$1(AddSanitizedApp.scala:148)
    at org.clulab.heuristics.backend.nlp.apps.AddSanitizedApp$delayedInit$body.apply(AddSanitizedApp.scala:114)
    at scala.Function0.apply$mcV$sp(Function0.scala:39)
    at scala.Function0.apply$mcV$sp$(Function0.scala:39)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
    at scala.App.$anonfun$main$1$adapted(App.scala:80)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at scala.App.main(App.scala:80)
    at scala.App.main$(App.scala:78)
    at org.clulab.heuristics.backend.nlp.apps.AddSanitizedApp$.main(AddSanitizedApp.scala:114)
    at org.clulab.heuristics.backend.nlp.apps.AddSanitizedApp.main(AddSanitizedApp.scala)
java.lang.RuntimeException: ERROR: could not parse the number [WrappedArray(78, http://energycom.gov.gh/files/SE4ALL-GHANA%20ACTION%20PLAN.pdf, 79)] in the percentage 78 http://energycom.gov.gh/files/SE4ALL-GHANA%20ACTION%20PLAN.pdf 79 http://www.energycom.gov.gh/files/Ghana%20Integrated%20Power%20System%20Master%20Plan%20_Volume%202.pdf!
    at org.clulab.numeric.mentions.PercentageMention.neNorm(PercentageMention.scala:23)
    at org.clulab.numeric.package$.addLabelsAndNorms(package.scala:127)
    at org.clulab.numeric.package$.$anonfun$setLabelsAndNorms$4(package.scala:97)
    at org.clulab.numeric.package$.$anonfun$setLabelsAndNorms$4$adapted(package.scala:95)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at org.clulab.numeric.package$.setLabelsAndNorms(package.scala:95)
    at org.clulab.processors.clu.CluProcessor.recognizeNamedEntities(CluProcessor.scala:669)
    at org.clulab.processors.clu.CluProcessor.$anonfun$annotate$1(CluProcessor.scala:240)
    at org.clulab.utils.BeforeAndAfter.perform(BeforeAndAfter.scala:10)
    at org.clulab.utils.BeforeAndAfter.perform$(BeforeAndAfter.scala:7)
    at org.clulab.processors.clu.GivenConstEmbeddingsAttachment.perform(CluProcessor.scala:1005)
    at org.clulab.processors.clu.CluProcessor.annotate(CluProcessor.scala:236)
    at org.clulab.processors.Processor.annotate(Processor.scala:128)
    at org.clulab.processors.Processor.annotate$(Processor.scala:125)
    at org.clulab.processors.clu.CluProcessor.annotate(CluProcessor.scala:232)
    at org.clulab.heuristics.backend.nlp.apps.AddSanitizedApp$.$anonfun$new$1(AddSanitizedApp.scala:153)
    at org.clulab.heuristics.backend.nlp.apps.AddSanitizedApp$.$anonfun$new$1$adapted(AddSanitizedApp.scala:148)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.clulab.heuristics.backend.nlp.apps.AddSanitizedApp$.delayedEndpoint$org$clulab$heuristics$backend$nlp$apps$AddSanitizedApp$1(AddSanitizedApp.scala:148)
    at org.clulab.heuristics.backend.nlp.apps.AddSanitizedApp$delayedInit$body.apply(AddSanitizedApp.scala:114)
    at scala.Function0.apply$mcV$sp(Function0.scala:39)
    at scala.Function0.apply$mcV$sp$(Function0.scala:39)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
    at scala.App.$anonfun$main$1$adapted(App.scala:80)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at scala.App.main(App.scala:80)
    at scala.App.main$(App.scala:78)
    at org.clulab.heuristics.backend.nlp.apps.AddSanitizedApp$.main(AddSanitizedApp.scala:114)
    at org.clulab.heuristics.backend.nlp.apps.AddSanitizedApp.main(AddSanitizedApp.scala)
MihaiSurdeanu commented 10 hours ago

Thanks @kwalcock ! I'll take a look.