Closed amensiko closed 1 year ago
Hi @amensiko, can you take a look? I am going through and closing some old PRs, and I think I created a conflict. I will take a look myself, but if you have time, please update :)
I reviewed this PR. It has a host of hard-coded things (like the download URL) that break backward compatibility, and the rest is mostly formatting and ancillary changes that I wouldn't commit. There are a few changes I find actually useful, mainly:
I think that's it. The PR shouldn't change anything in tika.py except those things. I'll try to get to this today. @amensiko @tballison
OK I have a much simpler patch, here:
mattmann@lasagna:~/git/tika-python$ git diff
diff --git a/tika/tika.py b/tika/tika.py
index 04f3202..4f91111 100755
--- a/tika/tika.py
+++ b/tika/tika.py
@@ -172,7 +172,7 @@ TikaFilesPath = tempfile.gettempdir()
TikaServerLogFilePath = log_path
TikaServerJar = os.getenv(
'TIKA_SERVER_JAR',
- "http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/"+TikaVersion+"/tika-server-"+TikaVersion+".jar")
+ "http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server-standard/"+TikaVersion+"/tika-server-standard-"+TikaVersion+".jar")
ServerHost = "localhost"
Port = "9998"
ServerEndpoint = os.getenv(
@@ -648,10 +648,10 @@ def startServer(tikaServerJar, java_path = TikaJava, java_args = TikaJavaArgs, s
# setup command string
cmd_string = ""
if not config_path:
- cmd_string = '%s %s -cp "%s" org.apache.tika.server.TikaServerCli --port %s --host %s &' \
+ cmd_string = '%s %s -cp "%s" org.apache.tika.server.core.TikaServerCli --port %s --host %s &' \
% (java_path, java_args, classpath, port, host)
else:
- cmd_string = '%s %s -cp "%s" org.apache.tika.server.TikaServerCli --port %s --host %s --config %s &' \
+ cmd_string = '%s %s -cp "%s" org.apache.tika.server.core.TikaServerCli --port %s --host %s --config %s &' \
% (java_path, java_args, classpath, port, host, config_path)
# Check that we can write to log path
@@ -688,7 +688,7 @@ def startServer(tikaServerJar, java_path = TikaJava, java_args = TikaJavaArgs, s
while try_count < TikaStartupMaxRetry:
with open(tika_log_file_path, "r") as tika_log_file_tmp:
# check for INFO string to confirm listening endpoint
- if "Started Apache Tika server at" in tika_log_file_tmp.read():
+ if "Started Apache Tika server" in tika_log_file_tmp.read():
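Since the concern above is that a hard-coded download URL breaks backward compatibility, here is a minimal sketch of how the jar URL and main class could be chosen from the version string instead. The helper name `server_coordinates` is hypothetical and is not part of tika.py, and the sketch assumes the switch to `tika-server-standard` and `org.apache.tika.server.core.TikaServerCli` happens at major version 2; the artifact and class names themselves are taken from the diff.

```python
def server_coordinates(tika_version):
    """Return (jar_url, main_class) for a Tika version string such as "2.6.0".

    Hypothetical helper, not part of tika.py. Assumes major version 2 is
    where the artifact and main class were renamed.
    """
    major = int(tika_version.split(".")[0])
    if major >= 2:
        # Names from the "+" lines of the diff.
        artifact = "tika-server-standard"
        main_class = "org.apache.tika.server.core.TikaServerCli"
    else:
        # Names from the "-" lines of the diff.
        artifact = "tika-server"
        main_class = "org.apache.tika.server.TikaServerCli"
    base = "http://search.maven.org/remotecontent?filepath=org/apache/tika/"
    jar_url = "%s%s/%s/%s-%s.jar" % (
        base, artifact, tika_version, artifact, tika_version)
    return jar_url, main_class


if __name__ == "__main__":
    url, cls = server_coordinates("2.6.0")
    print(url)
    print(cls)
```

The shortened log check in the last hunk is compatible in the same way: "Started Apache Tika server" is a prefix of the old "Started Apache Tika server at", so it matches either wording.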
That said, two tests are failing (the test_unpack tests). See below:
======================================================================
ERROR: test_ascii (tika.tests.tests_unpack.CreateTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/mattmann/git/tika-python/tika/tests/tests_unpack.py", line 26, in test_ascii
parsed = unpack.from_file(f.name)
File "/home/mattmann/git/tika-python/tika/unpack.py", line 44, in from_file
return _parse(tarOutput)
File "/home/mattmann/git/tika-python/tika/unpack.py", line 79, in _parse
with _text_wrapper(tarFile.extractfile(metadataMember)) as metadataFile:
AttributeError: __exit__
======================================================================
ERROR: test_utf8 (tika.tests.tests_unpack.CreateTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/mattmann/git/tika-python/tika/tests/tests_unpack.py", line 18, in test_utf8
parsed = unpack.from_file(f.name)
File "/home/mattmann/git/tika-python/tika/unpack.py", line 44, in from_file
return _parse(tarOutput)
File "/home/mattmann/git/tika-python/tika/unpack.py", line 79, in _parse
with _text_wrapper(tarFile.extractfile(metadataMember)) as metadataFile:
AttributeError: __exit__
----------------------------------------------------------------------
Ran 18 tests in 71.383s
FAILED (errors=2)
Test failed: <unittest.runner.TextTestResult run=18 errors=2 failures=0>
error: Test failed: <unittest.runner.TextTestResult run=18 errors=2 failures=0>
I'll debug these now.
So the unpack errors had nothing to do with this patch; they were caused by the older version of Python I was testing on (2.7). I have a fix that works on both Python 2.7 and 3.7, which I will commit separately. All tests pass now.
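For context, `AttributeError: __exit__` is what these Python versions raise when a `with` statement is handed an object that has no `__exit__` method, which is likely what happened to the object wrapped around `tarFile.extractfile(...)` on 2.7. The following is a sketch of one way to write such code so it runs on both versions, not the fix that was committed: `contextlib.closing` only requires a `close()` method. The function names `read_member_text` and `build_tar` are made up for this example.

```python
import contextlib
import io
import tarfile


def read_member_text(tar_bytes, member_name, encoding="utf-8"):
    """Return the decoded text of one regular-file member of an in-memory tar."""
    with contextlib.closing(tarfile.open(fileobj=io.BytesIO(tar_bytes))) as tar:
        member_file = tar.extractfile(member_name)
        # closing() calls member_file.close() on exit, so it works even when
        # the object does not implement __enter__/__exit__ itself.
        with contextlib.closing(member_file) as raw:
            return raw.read().decode(encoding)


def build_tar(name, payload):
    """Build a one-member tar archive in memory (helper for the example)."""
    buf = io.BytesIO()
    with contextlib.closing(tarfile.open(fileobj=buf, mode="w")) as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    archive = build_tar("meta.txt", u"caf\u00e9".encode("utf-8"))
    print(read_member_text(archive, "meta.txt"))
```

Decoding the bytes explicitly also avoids depending on a text wrapper that behaves differently between the two Python versions.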
This upgrades tika-python to Tika 2.6.0, as per issue #377