FraBle / python-sutime

Python wrapper for Stanford CoreNLP's SUTime
GNU General Public License v3.0

Sutime.parse creates fatal error when called as a function from Flask #6

Closed. Jyoti1009 closed this issue 7 years ago

Jyoti1009 commented 7 years ago

I have written a script that parses a sentence and works perfectly fine on the command line. However, when I call the same function inside a Flask app, it crashes on the sutime.parse() line. Here is the screenshot: [image]

Kindly help me solve this!

FraBle commented 7 years ago

Can you give some more context (/flask code)?

Jyoti1009 commented 7 years ago

Here is the flask code:

# Route request
@app.route("/")
def default():
    if 'sentence' in request.args and request.args.get('sentence', '') != "" and request.args.get('getdate', '') == "True":
        text = request.args.get('sentence', '')
        return jsonify({"result": getdate.getDate(text)})

And here is the getdate.getDate(text) function (it comes after the lines of code that load the jar files):

# Function in getdate.py
def getDate(test_case):
    date_obj = "today"
    test_case = re.sub(r"[,-.;@#?!&$]+\ *", " ", test_case)
    result = sutime.parse(test_case)
    return result

Jyoti1009 commented 7 years ago

@FraBle Hi again. I have been trying to diagnose the problem, and my guess is that the JRE is crashing because it runs out of memory. That would mean Flask somehow affects how much memory the JVM is allocated, and that this is less than it needs, since the same code runs fine outside Flask. I tried several ways to increase the JVM's memory size, but none of them helped. Could you suggest a resolution that might work here? I am really stuck at this point. Thanks in advance.

FraBle commented 7 years ago

The following code is running fine for me:

import os
import re
from flask import Flask, request
from flask.json import jsonify
from sutime import SUTime

app = Flask(__name__)
_jar_files = os.path.join(os.path.dirname(__file__), 'jars')
SUTIME = SUTime(jars=_jar_files, mark_time_ranges=True)

def get_date(text):
    text = re.sub(r'[,-.;@#?!&$]+\ *', ' ', text)
    result = SUTIME.parse(text)
    return result

@app.route("/")
def default():
    if request.args.get('sentence', '') and request.args.get('getdate', '') == 'True':
        text = request.args.get('sentence', '')
        return jsonify({'result': get_date(text)})
    else:
        return jsonify({'error': 'necessary parameters missing'})

if __name__ == '__main__':
    app.run()

Can you give me more context on your environment and system? I'm running the test with Python 2.7.13 and Java 1.8.0_121 on a MacBook Pro (15-inch, 2016) with 16 GB RAM and macOS Sierra.
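
A side note on the regex in get_date, as a minimal standard-library sketch: inside the character class, `,-.` is a range from ',' to '.', which covers exactly ',', '-' and '.'. Hyphens are therefore replaced too, so an ISO-style date loses its separators before SUTime sees it. Whether that changes SUTime's result is not something established here. `EXPLICIT` and `GENTLER` are hypothetical names for illustration, and `GENTLER` is an assumed alternative, not something python-sutime requires.

```python
import re

# Pattern as used in the thread: ',-.' is a character range (',', '-', '.').
ORIGINAL = re.compile(r'[,-.;@#?!&$]+\ *')
# Equivalent pattern with the hyphen escaped so the intent is explicit.
EXPLICIT = re.compile(r'[,\-.;@#?!&$]+ *')
# Hypothetical variant that leaves '-' and '.' alone (assumption, not from the thread).
GENTLER = re.compile(r'[,;@#?!&$]+ *')


def clean(text, pattern=ORIGINAL):
    """Replace each run of matched punctuation (plus trailing spaces) with one space."""
    return pattern.sub(' ', text)


print(clean('due 2017-03-01, noon'))           # hyphens and the comma are replaced
print(clean('due 2017-03-01, noon', GENTLER))  # the date separators survive
```

If the date separators matter for a given input, passing `GENTLER` (or a similar pattern) is one option to try.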

FraBle commented 7 years ago

The following code is kinda ugly but simulates memory restrictions:

import os
import re
import threading
import socket
from flask import Flask, request
from flask.json import jsonify
import jpype
from sutime import SUTime

socket.setdefaulttimeout(15)

def create_classpath(path):
    jars = []
    for top, dirs, files in os.walk(path):
        for file_name in files:
            if file_name.endswith('.jar'):
                jars.append(os.path.join(top, file_name))
    return os.pathsep.join(jars)

app = Flask(__name__)
LOCK = threading.Lock()
JAR_FILES = os.path.join(os.path.dirname(__file__), 'jars')
CLASSPATH = create_classpath(JAR_FILES)
MINIMUM_HEAP_SIZE='128m'
MAXIMUM_HEAP_SIZE='512m'

def start_jvm(minimum_heap_size, maximum_heap_size):
    jvm_options = [
        '-Xms{minimum_heap_size}'.format(minimum_heap_size=minimum_heap_size),
        '-Xmx{maximum_heap_size}'.format(maximum_heap_size=maximum_heap_size),
        '-Djava.class.path={classpath}'.format(
            classpath=CLASSPATH)
    ]
    if not jpype.isJVMStarted():
        print('starting JVM')
        print(jpype.getDefaultJVMPath())
        print(jvm_options)
        jpype.startJVM(
            jpype.getDefaultJVMPath(),
            *jvm_options
        )

def start_sutime(minimum_heap_size, maximum_heap_size):
    start_jvm(minimum_heap_size, maximum_heap_size)
    try:
        if (threading.activeCount() > 1 and
                not jpype.isThreadAttachedToJVM()):
            jpype.attachThreadToJVM()
        LOCK.acquire()
    finally:
        LOCK.release()

start_sutime(MINIMUM_HEAP_SIZE, MAXIMUM_HEAP_SIZE)
SUTIME = SUTime(jvm_started=True, mark_time_ranges=True)

def get_date(text):
    text = re.sub(r'[,-.;@#?!&$]+\ *', ' ', text)
    result = SUTIME.parse(text)
    return result

@app.route("/")
def default():
    if request.args.get('sentence', '') and request.args.get('getdate', '') == 'True':
        text = request.args.get('sentence', '')
        return jsonify({'result': get_date(text)})
    else:
        return jsonify({'error': 'necessary parameters missing'})

if __name__ == '__main__':
    app.run()

Which results in the following output and stack trace:

starting JVM
/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/jre/lib/jli/libjli.dylib
['-Xms128m', '-Xmx512m', '-Djava.class.path=jars/ejml-0.23.jar:jars/gson-2.7.jar:jars/javax.json-api-1.0.jar:jars/jaxb-api-2.2.7.jar:jars/joda-time-2.9.jar:jars/jollyday-0.4.7.jar:jars/slf4j-api-1.7.12.jar:jars/slf4j-simple-1.7.21.jar:jars/stanford-corenlp-3.6.0-models.jar:jars/stanford-corenlp-3.6.0.jar:jars/stanford-corenlp-sutime-python-1.0.0.jar:jars/xalan-2.7.0.jar:jars/xercesImpl-2.8.0.jar:jars/xml-apis-1.3.03.jar:jars/xom-1.2.10.jar']
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Registering annotator sutime with class edu.stanford.nlp.time.TimeAnnotator
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.9 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
sutime.includeRange=false
Unknown property: |sutime.includeRange|
sutime.markTimeRanges=true
Unknown property: |sutime.markTimeRanges|
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... Unknown property: |sutime.includeRange|
Unknown property: |sutime.markTimeRanges|
done [1.6 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... Unknown property: |sutime.includeRange|
Unknown property: |sutime.markTimeRanges|
done [2.0 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... Traceback (most recent call last):
  File "app.py", line 57, in <module>
    SUTIME = SUTime(jvm_started=True, mark_time_ranges=True)
  File "/Users/frank/.virtualenvs/sutime-flask/lib/python2.7/site-packages/sutime/sutime.py", line 59, in __init__
    self.mark_time_ranges, self.include_range)
  File "/Users/frank/.virtualenvs/sutime-flask/lib/python2.7/site-packages/jpype/_jclass.py", line 86, in _javaInit
    *args)
jpype._jexception.OutOfMemoryErrorPyRaisable: java.lang.OutOfMemoryError: GC overhead limit exceeded

According to your screenshot, you're running 64-bit Ubuntu 16.04 with OpenJDK. Could you try running it with Oracle Java 8?

Jyoti1009 commented 7 years ago

Sure, let me try and get back! Thanks a lot for the response.

Jyoti1009 commented 7 years ago

@FraBle I found the source of the problem: it is Flask's debug mode. If I turn debug mode on, the app throws the error I reported above. Otherwise everything works fine. I am turning debug mode off for now so I can continue with my task. Let me know if you would like to investigate further or if I should close the issue.
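
For context, a hedged sketch of why debug mode could matter: with the reloader active, the Werkzeug development server runs the script in a watcher process and again in a child process that actually serves requests, and it sets the environment variable `WERKZEUG_RUN_MAIN` to `true` in that child. Module-level setup such as loading SUTime would then run in both processes. That this is what caused the crash here is an assumption, not something confirmed in this thread. `is_reloader_parent` is a hypothetical helper name.

```python
import os


def is_reloader_parent(debug, use_reloader=True, environ=None):
    """Return True in the watcher process that only supervises the reloader child.

    Assumption: the app is started with the Werkzeug development server, which
    sets WERKZEUG_RUN_MAIN='true' in the child process that serves requests.
    """
    environ = os.environ if environ is None else environ
    reloader_active = bool(debug and use_reloader)
    return reloader_active and environ.get('WERKZEUG_RUN_MAIN') != 'true'


# Example: skip heavy setup (e.g. starting the JVM) in the watcher process.
if is_reloader_parent(debug=True, environ={}):
    print('watcher process: skipping heavy setup')
```

Another option that may be worth trying is keeping the debugger but switching the reloader off with `app.run(debug=True, use_reloader=False)`.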

FraBle commented 7 years ago

It might be related to this JPype issue: https://github.com/originell/jpype/issues/211
Restarts also don't seem to be supported by the JVM: https://github.com/originell/jpype/issues/84#issuecomment-157680233
It could also be the socket.setdefaulttimeout(15) (see https://github.com/xlcnd/isbnlib/issues/43#issuecomment-226468157 ), but removing it didn't change anything :(
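
Since the JVM apparently can't be restarted inside a process, one pattern that might help is creating the expensive object exactly once, on first use, instead of at import time. The following is only a sketch under that assumption: `LazyResource` and `fake_sutime_factory` are hypothetical names, and the factory stands in for something like `SUTime(jars=..., mark_time_ranges=True)`.

```python
import threading


class LazyResource(object):
    """Build a value once, on first access, even with concurrent callers."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._value = None
        self._ready = False

    def get(self):
        # Fast path: skip the lock once the value exists.
        if not self._ready:
            with self._lock:
                # Re-check inside the lock so only one caller runs the factory.
                if not self._ready:
                    self._value = self._factory()
                    self._ready = True
        return self._value


calls = []


def fake_sutime_factory():
    """Stand-in for the real SUTime constructor (assumption for illustration)."""
    calls.append(1)
    return object()


SUTIME = LazyResource(fake_sutime_factory)
```

In a route, `SUTIME.get().parse(text)` would then trigger the setup only in the process that handles the first request.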

Jyoti1009 commented 7 years ago

In that case, I will keep an eye on the issues above. Thanks for all the help. Closing the issue here now. :)

FraBle commented 7 years ago

You're welcome! :)