clulab / processors

Natural Language Processors
https://clulab.github.io/processors/
Apache License 2.0

Example Scala code for v5.8.1 keeps throwing OutOfMemoryError #48

Closed ChetanBhasin closed 8 years ago

ChetanBhasin commented 8 years ago

The example code from the README file for Scala keeps throwing OutOfMemoryError with default SBT parameters.

I remember that I have previously used this library without changing any JVM parameters, and it worked just fine. This is the first time, and with the current version, that I have noticed something like this.

Any idea why this might be happening?

ChetanBhasin commented 8 years ago

More information on the above: the error occurs with the provided model for both CoreNLPParser and FastNLPParser.

MihaiSurdeanu commented 8 years ago

@marcovzla Please remind me why we commented out the javaOptions in build.sbt? I know we had a good reason, but I don't recall what it was.

marcovzla commented 8 years ago

We had issues with Travis, so we stopped forking the JVM. See https://github.com/travis-ci/travis-ci/issues/3775

We are now using .sbtopts to set the memory, but that file is ignored on Windows. Could that be the issue?
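For context, `.sbtopts` is a plain-text file at the project root that the sbt launcher script reads, one option per line, with `-J`-prefixed options passed through to the JVM. A minimal version that raises the heap might look like this (the 4G/1G values are illustrative, not necessarily the project's actual settings):

```
-J-Xmx4G
-J-Xms1G
```

Since the launcher script is what reads this file, launchers that skip it (such as the standard sbt batch file on Windows) silently fall back to default JVM memory settings.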

ChetanBhasin commented 8 years ago

@MihaiSurdeanu Does this mean that such behavior is expected under default parameters and that one should provide more memory?

This is strange because the examples in the odin-examples repository work just fine under default parameters, even while using the same version of the library.

myedibleenso commented 8 years ago

@ChetanBhasin, odin-examples is actually using an outdated version of processors (5.7.0):

https://github.com/clulab/odin-examples/blob/d9f39a1501bfa27d63675282dfd38f7ad60501c6/build.sbt#L8-L9

That was before we started using Travis...

ChetanBhasin commented 8 years ago

@myedibleenso I just tried updating the version in odin-examples to 5.8.1 without making any changes to the code, and it still works fine.

Jumping into the REPL and trying out the example code, however, fails again.

Here is the original input and output from the console launched using sbt console:

scala> import edu.arizona.sista.processors.corenlp._
import edu.arizona.sista.processors.corenlp._

scala> val proc = new CoreNLPProcessor(withDiscourse = true)
proc: edu.arizona.sista.processors.corenlp.CoreNLPProcessor = edu.arizona.sista.processors.corenlp.CoreNLPProcessor@53814411

scala> val doc = proc.annotate("How many times have you been to China before?")
Adding annotator tokenize
Adding annotator ssplit
Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [1.1 sec].
Adding annotator lemma
Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [5.1 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [2.6 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [3.6 sec].
sutime.binder.1.
Initializing JollyDayHoliday for sutime with classpath:edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Mar 22, 2016 2:06:51 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Ignoring inactive rule: null
Mar 22, 2016 2:06:51 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Ignoring inactive rule: temporal-composite-8:ranges
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.8 sec].
macro=true
featureCountThresh=10
featureFactory=edu.arizona.sista.processors.corenlp.chunker.ChunkingFeatureFactory
Adding annotator dcoref
java.lang.OutOfMemoryError: GC overhead limit exceeded
  at java.util.AbstractList.iterator(AbstractList.java:288)
  at java.util.AbstractList.hashCode(AbstractList.java:540)
  at java.util.HashMap.hash(HashMap.java:338)
  at java.util.HashMap.readObject(HashMap.java:1397)
  at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:497)
  at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
  at edu.stanford.nlp.io.IOUtils.readObjectFromURLOrClasspathOrFileSystem(IOUtils.java:313)
  at edu.stanford.nlp.dcoref.Dictionaries.loadGenderNumber(Dictionaries.java:389)
  at edu.stanford.nlp.dcoref.Dictionaries.<init>(Dictionaries.java:553)
  at edu.stanford.nlp.dcoref.Dictionaries.<init>(Dictionaries.java:462)
  at edu.stanford.nlp.dcoref.SieveCoreferenceSystem.<init>(SieveCoreferenceSystem.java:283)
  at edu.stanford.nlp.pipeline.DeterministicCorefAnnotator.<init>(DeterministicCorefAnnotator.java:52)
  at edu.stanford.nlp.pipeline.AnnotatorImplementations.coref(AnnotatorImplementations.java:181)
  at edu.stanford.nlp.pipeline.AnnotatorFactories$12.create(AnnotatorFactories.java:485)
  at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:85)
  at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:289)
  at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:126)
  at edu.arizona.sista.processors.corenlp.CoreNLPProcessor.mkCoref(CoreNLPProcessor.scala:56)
  at edu.arizona.sista.processors.corenlp.CoreNLPProcessor.coref$lzycompute(CoreNLPProcessor.scala:31)
  at edu.arizona.sista.processors.corenlp.CoreNLPProcessor.coref(CoreNLPProcessor.scala:31)
  at edu.arizona.sista.processors.corenlp.CoreNLPProcessor.resolveCoreference(CoreNLPProcessor.scala:148)
  at edu.arizona.sista.processors.Processor$class.annotate(Processor.scala:108)
  at edu.arizona.sista.processors.shallownlp.ShallowNLPProcessor.annotate(ShallowNLPProcessor.scala:25)
  at edu.arizona.sista.processors.Processor$class.annotate(Processor.scala:88)
  at edu.arizona.sista.processors.shallownlp.ShallowNLPProcessor.annotate(ShallowNLPProcessor.scala:25)
  ... 1 elided
ChetanBhasin commented 8 years ago

Please note that the above code works fine if memory is increased (i.e., SBT launched with sbt -J-Xmx4G -J-Xms4G).
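Besides passing flags on the command line, sbt's javaOptions setting (the one mentioned as commented out in build.sbt earlier in this thread) is another place to raise the heap, but it only takes effect when sbt forks a separate JVM, which is exactly what the Travis issue forced the project to stop doing. As a sketch for one's own build.sbt (values illustrative):

```scala
// build.sbt — only applies when sbt forks a separate JVM for run/test
fork := true
javaOptions ++= Seq("-Xmx4G", "-Xms1G")
```

Without fork := true, these options are ignored because the code runs inside sbt's own JVM, whose heap was fixed at launch.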

MihaiSurdeanu commented 8 years ago

Is this happening because of what @marcovzla said: are you on Windows? We now configure sbt through .sbtopts, which seems to be ignored on Windows.

ChetanBhasin commented 8 years ago

@MihaiSurdeanu No, I'm running this on OS X. I'll try it out on CentOS now.

ChetanBhasin commented 8 years ago

@MihaiSurdeanu

Update: I'm running into the same error on CentOS.
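One quick way to tell whether the memory options are actually reaching the JVM on a given platform is to print the max heap from inside sbt console before annotating anything (a diagnostic sketch, not project code):

```scala
// Reports the maximum heap the current JVM was started with.
// If .sbtopts (or -J-Xmx...) was honored, this reflects that value;
// otherwise it shows the JVM's default heap ceiling.
val maxHeapMb = Runtime.getRuntime.maxMemory / (1024 * 1024)
println(s"Max heap: $maxHeapMb MB")
```

If this prints the default (often around 1 GB) rather than the value set in .sbtopts, the file is not being picked up by the launcher on that platform.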

MihaiSurdeanu commented 8 years ago

Hmm. @myedibleenso, @marcovzla: any idea what causes this? Can you replicate it?

ChetanBhasin commented 8 years ago

If this cannot be reproduced, we can just close the issue, and I will update you if I find the cause in the future.

Meanwhile, I can simply override the memory limit using flags on my end and have the applications run.

myedibleenso commented 8 years ago

Hi, @ChetanBhasin. Thanks for pointing this out, and I'm sorry for such a late response.

The odin-examples project appears to be missing an .sbtopts file, so the error you reported may have been due to changes in processors (e.g., changes to a bionlp dependency such as bioresources) that ended up requiring more memory than previous versions.

I'm going to close this issue and open one in odin-examples for adding .sbtopts.

ChetanBhasin commented 8 years ago

@myedibleenso I see, thanks. We did get it to work by explicitly passing the options. Thanks for looking into this.