chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0

ValueError: No JSON object could be decoded #114

Closed harsham05 closed 8 years ago

harsham05 commented 8 years ago
parsed = parser.from_file("2015.new.tsv")
tika.py: Warn: Tika server returned status: 500
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/hmanjunatha/anaconda2/lib/python2.7/site-packages/tika/parser.py", line 28, in from_file
    return _parse(jsonOutput)
  File "/Users/hmanjunatha/anaconda2/lib/python2.7/site-packages/tika/parser.py", line 47, in _parse
    realJson = json.loads(jsonOutput[1])
  File "/Users/hmanjunatha/anaconda2/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/Users/hmanjunatha/anaconda2/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/hmanjunatha/anaconda2/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

@chrismattmann I might have a fix for this. Will send PR soon.
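The PR itself is not reproduced in this thread. The traceback shows the client calling `json.loads(jsonOutput[1])` even though the server had just returned status 500, so the body was an error page rather than JSON. Below is a minimal sketch of one way a client could fail with a clearer message. It assumes the response arrives as an integer HTTP status plus the raw body text (the traceback suggests, but does not confirm, that `jsonOutput` holds such a pair). `parse_rmeta_response` and `TikaServerError` are hypothetical names, not part of tika-python's API.

```python
import json


class TikaServerError(Exception):
    """Hypothetical error type for non-200 or non-JSON server responses."""


def parse_rmeta_response(status, body):
    """Decode a Tika /rmeta response only after checking the HTTP status.

    Assumes `status` is the integer HTTP status and `body` is the raw
    response text (hypothetical calling convention for illustration).
    """
    if status != 200:
        # A 500 carries an error page, not JSON, so json.loads would
        # raise "ValueError: No JSON object could be decoded".
        raise TikaServerError(
            "Tika server returned status %d; check tika-server.log" % status
        )
    try:
        return json.loads(body)
    except ValueError as exc:
        # On Python 3, json.JSONDecodeError is a subclass of ValueError.
        raise TikaServerError("Response was not valid JSON: %s" % exc)


if __name__ == "__main__":
    try:
        parse_rmeta_response(500, "<html>Server Error</html>")
    except TikaServerError as err:
        print(err)  # prints: Tika server returned status 500; check tika-server.log
```

Checking the status first turns an opaque JSON decoding error into a message that points at the server log, which is where the real cause turned out to be.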

harsham05 commented 8 years ago

Looking at tika-server.log, I suspect the file was too big: the server ran out of Java heap space.

Jun 22, 2016 6:38:40 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: /rmeta/text
java.lang.RuntimeException: org.apache.cxf.interceptor.Fault: Java heap space
        at org.apache.cxf.interceptor.AbstractFaultChainInitiatorObserver.onMessage(AbstractFaultChainInitiatorObserver.java:116)
        at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:371)
        at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
        at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
        at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
        at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:70)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:370)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
        at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
        at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.cxf.interceptor.Fault: Java heap space
        at org.apache.cxf.service.invoker.AbstractInvoker.createFault(AbstractInvoker.java:163)
        at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:129)
        at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:200)
        at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:99)
        at org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)
        at org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)
        at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
        ... 21 more
Caused by: java.lang.OutOfMemoryError: Java heap space
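The root cause above is `java.lang.OutOfMemoryError: Java heap space` inside tika-server. If you start tika-server yourself, one common JVM-level mitigation is to raise the maximum heap with the standard `-Xmx` flag. The thread does not confirm that a larger heap is enough for this particular file. The sketch below only builds the command line. The helper name `tika_server_command`, the jar path, and the default of `4g` are illustrative assumptions.

```python
import re
import shlex

# JVM heap sizes are a number with an optional k/m/g suffix, e.g. 512m or 4g.
_HEAP_SIZE = re.compile(r"^\d+[kKmMgG]?$")


def tika_server_command(jar_path, max_heap="4g"):
    """Return an argv list that starts tika-server with a larger max heap.

    Hypothetical helper; `jar_path` and `max_heap` are assumptions.
    """
    if not _HEAP_SIZE.match(max_heap):
        raise ValueError("max_heap must look like 512m or 4g, got %r" % max_heap)
    # -Xmx sets the JVM's maximum heap size; -jar runs the server jar.
    return ["java", "-Xmx" + max_heap, "-jar", jar_path]


if __name__ == "__main__":
    cmd = tika_server_command("tika-server.jar")  # hypothetical local jar path
    print(" ".join(shlex.quote(part) for part in cmd))
    # prints: java -Xmx4g -jar tika-server.jar
    # Passing `cmd` to subprocess.Popen would start the server in the background.
```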
chrismattmann commented 8 years ago

@harsham05 if the file was too big, then yeah, that looks like a tika-server issue.

chrismattmann commented 8 years ago

Can you file this issue against tika-server?

harsham05 commented 8 years ago

Sure thing @chrismattmann https://issues.apache.org/jira/browse/TIKA-2017

zixuzhang commented 7 years ago

Maybe the tika-server.jar is wrong.