Open moissinac opened 8 years ago
Thanks @moissinac for the report. We will investigate the issue.
@moissinac I got this step working on Windows 7, Python 2.7. I got similar errors with old numpy/scipy/scikit-learn versions, but they went away after uninstalling and re-installing with versions from http://www.lfd.uci.edu/~gohlke/pythonlibs/
Using TreeTagger gives me an error on Windows:
python -m strephit commons pos_tag samples/corpus.jsonlines bio en
fails with the error
TypeError: can't pickle thread.lock objects
It seems to be related to how multiprocessing starts worker processes on Windows (multiprocessing\forking.py pickles the process object):
File "strephit\commons\pos_tag.py", line 189, in main
for i, tagged_document in enumerate(pos_tagger.tag_many(corpus, document_key, pos_tag_key, batch_size)):
File "strephit\commons\pos_tag.py", line 135, in tag_many
CHUNKERPROC=self._tokenizer_wrapper
File "C:\Run\Python27\lib\site-packages\treetaggerpoll.py", line 207, in __init__
self._build_workers(workerscount, kwargs)
File "C:\Run\Python27\lib\site-packages\treetaggerpoll.py", line 220, in _build_workers
p.start()
File "C:\Run\Python27\lib\multiprocessing\process.py", line 130, in start
self._popen = Popen(self)
File "C:\Run\Python27\lib\multiprocessing\forking.py", line 277, in __init__
dump(process_obj, to_child, HIGHEST_PROTOCOL)
File "C:\Run\Python27\lib\multiprocessing\forking.py", line 199, in dump
ForkingPickler(file, protocol).dump(obj)
File "C:\Run\Python27\lib\pickle.py", line 224, in dump
self.save(obj)
File "C:\Run\Python27\lib\pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "C:\Run\Python27\lib\pickle.py", line 425, in save_reduce
save(state)
File "C:\Run\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Run\Python27\lib\pickle.py", line 655, in save_dict
self._batch_setitems(obj.iteritems())
File "C:\Run\Python27\lib\pickle.py", line 687, in _batch_setitems
save(v)
File "C:\Run\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Run\Python27\lib\pickle.py", line 568, in save_tuple
save(element)
File "C:\Run\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Run\Python27\lib\pickle.py", line 655, in save_dict
self._batch_setitems(obj.iteritems())
File "C:\Run\Python27\lib\pickle.py", line 687, in _batch_setitems
save(v)
File "C:\Run\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Run\Python27\lib\multiprocessing\forking.py", line 67, in dispatcher
self.save_reduce(obj=obj, *rv)
File "C:\Run\Python27\lib\pickle.py", line 401, in save_reduce
save(args)
File "C:\Run\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Run\Python27\lib\pickle.py", line 554, in save_tuple
save(element)
File "C:\Run\Python27\lib\pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "C:\Run\Python27\lib\pickle.py", line 425, in save_reduce
save(state)
File "C:\Run\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Run\Python27\lib\pickle.py", line 655, in save_dict
self._batch_setitems(obj.iteritems())
File "C:\Run\Python27\lib\pickle.py", line 687, in _batch_setitems
save(v)
File "C:\Run\Python27\lib\pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "C:\Run\Python27\lib\pickle.py", line 425, in save_reduce
save(state)
File "C:\Run\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Run\Python27\lib\pickle.py", line 655, in save_dict
self._batch_setitems(obj.iteritems())
File "C:\Run\Python27\lib\pickle.py", line 687, in _batch_setitems
save(v)
File "C:\Run\Python27\lib\pickle.py", line 306, in save
rv = reduce(self.proto)
TypeError: can't pickle thread.lock objects
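For readers hitting the same traceback: on Windows, multiprocessing cannot fork, so it pickles the process object (and everything reachable from it) to hand it to the child process. If a lock is reachable from that object, pickling fails. Below is a minimal sketch of that failure mode. `Wrapper` and `can_pickle` are hypothetical names invented for this illustration; they are not part of treetaggerwrapper or StrepHit. The exact error message differs between Python versions, so the sketch only checks the exception type.

```python
import pickle
import threading


class Wrapper(object):
    """Hypothetical stand-in for an object that keeps a lock as an attribute."""

    def __init__(self):
        self.lock = threading.Lock()


def can_pickle(obj):
    """Return True if obj can be serialized with pickle, False otherwise."""
    try:
        pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
        return True
    except (TypeError, pickle.PicklingError):
        # Locks have no pickle support, so any object holding one fails here.
        return False


if __name__ == "__main__":
    print(can_pickle({"text": "plain data"}))
    print(can_pickle(Wrapper()))
```

Plain data pickles fine; the object holding a lock does not, which is consistent with the last frames of the traceback above.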
Hello @burki, thank you for the report. The treetaggerwrapper code is not under our control, so I could not do much more than catch the exception and fall back to a plain for loop. I would have used our own parallel module, but it is not tested under Windows, so I refrained. Tagging may therefore be much slower, sorry!
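The fallback described above can be sketched roughly as follows. This is not the actual StrepHit code: `tag_one`, `tag_many` and `pool_factory` are hypothetical names, the tagger is a dummy, and the sketch assumes the pool object exposes `map`, `close` and `join` like `multiprocessing.Pool` does.

```python
import logging

logger = logging.getLogger(__name__)


def tag_one(text):
    """Dummy tagger (assumption): label every whitespace token as 'UNK'."""
    return [(token, "UNK") for token in text.split()]


def tag_many(texts, pool_factory):
    """Tag texts with a process pool, or one by one if the pool cannot start.

    pool_factory is a hypothetical zero-argument callable returning an object
    with map/close/join, e.g. lambda: multiprocessing.Pool(4).
    """
    try:
        pool = pool_factory()
    except Exception as exc:  # broad on purpose: any startup failure triggers the fallback
        logger.warning("failed to initialize process pool (%s), "
                       "falling back to single-process tagging", exc)
        pool = None

    if pool is None:
        # Single-process fallback: a plain loop over the documents.
        return [tag_one(text) for text in texts]

    try:
        return pool.map(tag_one, texts)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    def broken_factory():
        # Simulates the pool failing to start, as in the report above.
        raise TypeError("can't pickle thread.lock objects")

    print(tag_many(["a small corpus"], broken_factory))
```

The trade-off is the one mentioned above: the fallback always works, but it tags documents sequentially and may be considerably slower on large corpora.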
@e-dorigatti Thanks very much, this now works on my Windows 7 machine:
[WARNING] pos_tag.tag_many #139 - failed to initialize tree tragger process pool, fallback to single-process tagging
[INFO] io.process_stream #38 - Loaded input file 'E:\Playground\StrepHit\samples\corpus.jsonlines'
[INFO] pos_tag.main #206 - Done, total tagged items: 19
Hello,
StrepHit seems very interesting. I've installed it on Windows; perl and TreeTagger are working and in the PATH.
When I execute the following command line
python -m strephit extraction process_semistructured -p 1 samples/corpus.jsonlines
I get
c:\python.exe: Error while finding spec for 'strephit.main' (<class 'ImportError'>: No module named 'annotation'); 'strephit' is a package and cannot be directly executed
Any idea?