Closed Balachandar-R closed 6 years ago
hi @Balachandar-R thanks for your report. What is the proxy URL you are trying to access? I see that you are getting a 403 - does your proxy require credentials?
Hi @chrismattmann ,
Thanks for your quick response,
The proxy URL is set as follows:
export http_proxy="http://172.27.66.50:9400"
export https_proxy="http://172.27.66.50:9400"
Thanks, Balachandar
Does it require credentials?
No. The Tika server started, port 9998 is open, and netstat -na | grep 9998 shows it in LISTEN state (127.0.0.1:9998).
The request reaches the proxy server, but we never get a response back.
can you add a URL parameter, as in parser.from_file('/path/to/file', 'http://172.27.66.50:9400'), and try that? (all methods at the interface level take an optional parameter for a different Tika server to contact). @Balachandar-R
@chrismattmann
I get the following error when I try it with the URL parameter.
p = parser.from_file('/home/yell/sentence_success.txt','http://172.27.66.50:9400')
2017-09-25 03:47:55,617 [MainThread ] [WARNI] Tika server returned status: 504
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/tika/parser.py", line 37, in from_file
    return _parse(jsonOutput)
  File "/usr/local/lib/python2.7/dist-packages/tika/parser.py", line 69, in _parse
    realJson = json.loads(jsonOutput[1])
  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
But without proxy it works fine.
Do you have any other suggestions?
Thanks Balachandar-R
mmm looks like you are getting an HTTP 504 error, which corresponds to:
10.5.5 504 Gateway Timeout. The server, while acting as a gateway or proxy, did not receive a
timely response from the upstream server specified by the URI (e.g. HTTP, FTP, LDAP) or some
other auxiliary server (e.g. DNS) it needed to access in attempting to complete the request.
(proxy configuration issue?)
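The ValueError at the end of the traceback looks like a knock-on effect of the 504: the body that came back is not JSON, so json.loads fails. A minimal sketch of checking the status first, assuming the server call returns a (status, body) pair as the jsonOutput[1] indexing in the traceback suggests. parse_tika_response is a hypothetical helper, not part of tika-python:

```python
import json


def parse_tika_response(response):
    """Decode a (status, body) pair, failing clearly on non-200 statuses.

    Hypothetical helper: the (status, body) shape is an assumption based on
    the jsonOutput[1] indexing visible in the traceback.
    """
    status, body = response
    if status != 200:
        # A proxy error page (403, 504, ...) is usually HTML, not JSON,
        # so report the status instead of trying to decode the body.
        raise RuntimeError("Tika server returned status %s" % status)
    return json.loads(body)
```

With a check like this, a proxy failure surfaces as "Tika server returned status 504" rather than as a JSON decoding error.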
Hi @chrismattmann ,
This issue was resolved by setting no_proxy="proxy-address" at the code level. Thanks for your quick replies, @chrismattmann
One more question about the Tika server: is there any restriction on the total number of files for extraction? For some specific Excel files, Tika failed to extract the content with a 403 status.
Any comments on the above?
Thanks Balachandar-R
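The exact no_proxy value used in the fix above is not shown in the thread. A minimal sketch of one way to set it from Python, assuming the goal is to keep requests to the local Tika server (127.0.0.1:9998 in the netstat output earlier) from going through the proxy. bypass_proxy_for is a hypothetical helper, and the host list is an assumption:

```python
import os


def bypass_proxy_for(hosts):
    """Append hosts to no_proxy/NO_PROXY so requests to them skip the proxy."""
    current = [h.strip() for h in os.environ.get("no_proxy", "").split(",") if h.strip()]
    merged = current + [h for h in hosts if h not in current]
    value = ",".join(merged)
    # Set both spellings, since different tools read different cases.
    os.environ["no_proxy"] = value
    os.environ["NO_PROXY"] = value
    return value


# Assumption: the Tika server is the local one listening on 127.0.0.1:9998.
bypass_proxy_for(["localhost", "127.0.0.1"])
```

This would need to run before the first tika call in the same process, so that the HTTP client sees the variable when it decides whether to use the proxy.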
Hi @Balachandar-R ,
I have the same issue when I try to use tika-python through a proxy for a file. I read your explanation, but could you give me more detail about the solution you applied?
Thanks, KramFox
I found the tika package on my local computer after installing via pip and manually edited the following lines (all 4 of them) in tika.py:
urlretrieve(urlOrPath, destPath)
to:
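import urllib.request
# Route downloads through the proxy (fill in your own proxy URLs).
proxy = urllib.request.ProxyHandler({'http': '...', 'https': '...'})
opener = urllib.request.build_opener(proxy)
# Make this opener the default for later urllib.request calls.
urllib.request.install_opener(opener)
urllib.request.urlretrieve(urlOrPath, destPath)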
and it worked !!
hi @omidbadr if you get a chance, consider sending me a PR by making the above an optional configuration?
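One possible shape for such an optional configuration, as a sketch only. The environment variable names TIKA_HTTP_PROXY and TIKA_HTTPS_PROXY and the function install_proxy_from_env are hypothetical and do not exist in tika-python; only the urllib.request calls come from the edit described above:

```python
import os
import urllib.request


def install_proxy_from_env(environ=None):
    """Install a global urllib opener if proxy settings are configured.

    TIKA_HTTP_PROXY and TIKA_HTTPS_PROXY are hypothetical variable names
    used for illustration; tika-python does not define them.
    """
    environ = os.environ if environ is None else environ
    proxies = {}
    if environ.get("TIKA_HTTP_PROXY"):
        proxies["http"] = environ["TIKA_HTTP_PROXY"]
    if environ.get("TIKA_HTTPS_PROXY"):
        proxies["https"] = environ["TIKA_HTTPS_PROXY"]
    if not proxies:
        return None  # nothing configured: leave urllib's defaults alone
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    urllib.request.install_opener(opener)  # used by later urlretrieve calls
    return opener
```

Calling this once before the urlretrieve calls would avoid editing each call site, and users without a proxy would see no change in behaviour.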
@omidbadr Thank you mate, your trick worked like a charm.
@chrismattmann - I was getting "HTTPError: HTTP Error 407: Proxy Authentication Required" error but @omidbadr 's solution came to rescue. But still trying to understand the root cause, can you help?
not sure about the root cause, likely buried in the requests lib
Also can someone send me a PR @hubgitadi @omidbadr so that we can make the above an optional config with docs?
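On the 407 mentioned above: that status means the proxy itself is asking for credentials. One plausible reason the ProxyHandler edit helps, offered as a guess rather than a confirmed root cause, is that the proxy URLs given to it can carry a username and password, which urllib then sends as basic proxy authentication. A small sketch for building such a URL; proxy_url_with_credentials is a hypothetical helper and the credentials in the usage note are placeholders:

```python
from urllib.parse import quote


def proxy_url_with_credentials(host, port, user, password, scheme="http"):
    """Build a proxy URL of the form scheme://user:password@host:port."""
    # Percent-encode the credentials so characters such as '@' or ':'
    # cannot be mistaken for URL delimiters.
    return "%s://%s:%s@%s:%d" % (
        scheme,
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
    )
```

The result, for example proxy_url_with_credentials("172.27.66.50", 9400, "alice", "s3cret"), would go in place of the '...' values in the ProxyHandler dictionary.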
hey @chrismattmann
I'm getting the warning "Tika server returned status: 403" and a JSONDecodeError while using it in the Unix terminal. How can I solve this issue?
make sure that your tika server started, and that you have Java installed.
I was able to resolve the 403 error by configuring the User-Agent in the Python requests module. Can you tell me how I can pass a user agent in tika?
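A possible answer, as a sketch under assumptions: recent tika-python releases accept a headers argument on parser.from_file, which is forwarded with the request to the Tika server. Whether the version discussed in this thread supports it has not been checked here. parse_with_user_agent is a hypothetical wrapper, and the User-Agent string in the usage note is a placeholder:

```python
def parse_with_user_agent(path, user_agent, parse=None):
    """Parse a file while sending a custom User-Agent header.

    Assumes parser.from_file accepts a `headers` keyword, which may not
    hold for older tika-python releases.
    """
    if parse is None:
        # Imported lazily so this sketch loads even without tika installed.
        from tika import parser
        parse = parser.from_file
    return parse(path, headers={"User-Agent": user_agent})
```

Usage would look like parse_with_user_agent('/path/to/file', 'my-client/1.0'). The parse argument exists only so the wrapper can be exercised without a running Tika server.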
@chrismattmann shall I make the PR for this, that is, to make Tika work properly when accessed via a proxy?
sure I'll take a look @ashish735
Hi Team,
I have installed python-tika-1.14 on a Linux (Ubuntu 16.04) box running in the cloud. While executing this code
I didn't get any error when I had open access to the Linux instance.
When I tried the same code via a proxy, it caused an issue as follows.
Please help me with how to set the proxy in python-tika and where to configure it.
Thanks Balachandar