Wikidata / StrepHit

An intelligent reading agent that understands text and translates it into Wikidata statements.
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
GNU General Public License v3.0

Problem on windows #47

Open moissinac opened 8 years ago

moissinac commented 8 years ago

Hello, StrepHit seems very interesting. I've installed it on Windows; Perl and TreeTagger are working and are in the PATH. When I execute the following command line:

    python -m strephit extraction process_semistructured -p 1 samples/corpus.jsonlines

I get:

    c:\python.exe: Error while finding spec for 'strephit.__main__' (<class 'ImportError'>: No module named 'annotation'); 'strephit' is a package and cannot be directly executed

Any idea?
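A hedged reading of this error, which nobody in the thread confirms: the `Error while finding spec for ...` wording and the quoted `'annotation'` in the message match Python 3's `runpy`, while burki later runs StrepHit successfully under Python 2.7. Python 3 removed implicit relative imports (PEP 328), so a bare `import annotation` inside a package no longer finds a sibling module. The sketch below reproduces that behaviour with a throwaway package; `demo_pkg`, its files and `run_package` are hypothetical, we do not know which StrepHit module contains the bare import, and the code itself needs Python 3.7+.

```python
import os
import subprocess
import sys
import tempfile


def run_package(import_line):
    """Create a hypothetical package 'demo_pkg' whose __init__.py runs
    `import_line`, execute it with `python -m demo_pkg` under the current
    interpreter, and return (exit code, stderr)."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, "demo_pkg")
        os.mkdir(pkg_dir)
        files = {
            "__init__.py": import_line + "\n",  # the import under test
            "annotation.py": "VALUE = 1\n",     # sibling module to import
            "__main__.py": "print('ok')\n",     # makes `-m demo_pkg` runnable
        }
        for name, body in files.items():
            with open(os.path.join(pkg_dir, name), "w") as fh:
                fh.write(body)
        # Put the temporary root on sys.path so `demo_pkg` itself is found.
        env = dict(os.environ, PYTHONPATH=root)
        proc = subprocess.run(
            [sys.executable, "-m", "demo_pkg"],
            cwd=root, env=env, capture_output=True, text=True,
        )
        return proc.returncode, proc.stderr


if __name__ == "__main__":
    for line in ("import annotation", "from . import annotation"):
        code, err = run_package(line)
        last = err.strip().splitlines()[-1] if err.strip() else "(no error)"
        print("%-26s exit=%d  %s" % (line, code, last))
```

Under Python 3 the bare import fails with `No module named 'annotation'`, while the explicit `from . import annotation` form works. If this reading is right, running the command with a Python 2.7 interpreter, as burki did, would be the simplest workaround.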

marfox commented 8 years ago

Thanks @moissinac for the report. We will investigate the issue.

burki commented 8 years ago

@moissinac I got this step working on Windows 7 with Python 2.7. I got similar errors with old numpy/scipy/scikit-learn versions, but they went away after I uninstalled those packages and reinstalled them with the builds from http://www.lfd.uci.edu/~gohlke/pythonlibs/
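Before reinstalling, it can help to record which versions are currently installed. A minimal sketch, assuming Python 3.8+ for `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the distribution names are the PyPI names of the packages mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(dists):
    """Map each distribution name to its installed version, or None."""
    found = {}
    for name in dists:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(["numpy", "scipy", "scikit-learn"]).items():
        print(name, ver or "not installed")
```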

burki commented 8 years ago

Using TreeTagger gives me an error on Windows:

 python -m strephit commons pos_tag samples/corpus.jsonlines bio en

fails with the error

TypeError: can't pickle thread.lock objects

It seems to be related to the forking done by multiprocessing:

  File "strephit\commons\pos_tag.py", line 189, in main
    for i, tagged_document in enumerate(pos_tagger.tag_many(corpus, document_key, pos_tag_key, batch_size)):
  File "strephit\commons\pos_tag.py", line 135, in tag_many
    CHUNKERPROC=self._tokenizer_wrapper
  File "C:\Run\Python27\lib\site-packages\treetaggerpoll.py", line 207, in __init__
    self._build_workers(workerscount, kwargs)
  File "C:\Run\Python27\lib\site-packages\treetaggerpoll.py", line 220, in _build_workers
    p.start()
  File "C:\Run\Python27\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Run\Python27\lib\multiprocessing\forking.py", line 277, in __init__
    dump(process_obj, to_child, HIGHEST_PROTOCOL)
  File "C:\Run\Python27\lib\multiprocessing\forking.py", line 199, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Run\Python27\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Run\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Run\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Run\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Run\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Run\Python27\lib\pickle.py", line 687, in _batch_setitems
    save(v)
  File "C:\Run\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Run\Python27\lib\pickle.py", line 568, in save_tuple
    save(element)
  File "C:\Run\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Run\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Run\Python27\lib\pickle.py", line 687, in _batch_setitems
    save(v)
  File "C:\Run\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Run\Python27\lib\multiprocessing\forking.py", line 67, in dispatcher
    self.save_reduce(obj=obj, *rv)
  File "C:\Run\Python27\lib\pickle.py", line 401, in save_reduce
    save(args)
  File "C:\Run\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Run\Python27\lib\pickle.py", line 554, in save_tuple
    save(element)
  File "C:\Run\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Run\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Run\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Run\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Run\Python27\lib\pickle.py", line 687, in _batch_setitems
    save(v)
  File "C:\Run\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Run\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Run\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Run\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Run\Python27\lib\pickle.py", line 687, in _batch_setitems
    save(v)
  File "C:\Run\Python27\lib\pickle.py", line 306, in save
    rv = reduce(self.proto)
TypeError: can't pickle thread.lock objects

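The traceback suggests why this shows up on Windows: there `multiprocessing` cannot fork, so `forking.py` pickles the whole process object (including the state handed to each worker) and sends it to the child, and `threading.Lock` objects cannot be pickled. A minimal sketch of the failure and of one common workaround, dropping the lock in `__getstate__` and recreating it in `__setstate__`. The classes are hypothetical stand-ins, not treetaggerwrapper's actual objects, and the code assumes Python 3.

```python
import pickle
import threading


class LockHolder:
    """Hypothetical worker state that carries a lock."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()


class PicklableLockHolder(LockHolder):
    """Same state, but the lock is left out of the pickle and rebuilt."""

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["lock"]  # locks cannot cross a process boundary
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.lock = threading.Lock()  # fresh lock in the receiving process


def try_pickle(obj):
    """Return the pickled bytes, or the TypeError raised while pickling."""
    try:
        return pickle.dumps(obj)
    except TypeError as exc:
        return exc


if __name__ == "__main__":
    err = try_pickle(LockHolder("tagger"))
    print(type(err).__name__, err)  # TypeError plus the lock message
    restored = pickle.loads(try_pickle(PicklableLockHolder("tagger")))
    print(restored.name, restored.lock.acquire(blocking=False))  # tagger True
```

This only helps when you control the class being pickled; the lock here lives inside treetaggerwrapper, which is why the fix below works around the pool instead.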
e-dorigatti commented 8 years ago

Hello @burki, thank you for the report. The treetaggerwrapper code is not under our control, so I could not do much more than catch the exception and fall back to a plain for loop. I would have used our parallel module, but it is not tested under Windows, so I refrained. Tagging may be much slower as a result, sorry!
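The fallback described in this comment (try the process pool, and if it cannot be built, tag in a plain loop) follows a common pattern. A minimal sketch under stated assumptions: `tag_with_fallback`, `tag_one` and `make_pool` are hypothetical names, not StrepHit's or treetaggerwrapper's API; `make_pool` is any callable returning an object with `imap` that works as a context manager, such as Python 3's `multiprocessing.Pool`; and the caught exception types are a guess based on the traceback in this issue.

```python
import logging
import pickle

logger = logging.getLogger(__name__)


def tag_with_fallback(documents, tag_one, make_pool):
    """Yield tag_one(doc) for every document, in parallel when possible."""
    try:
        pool = make_pool()
    except (TypeError, pickle.PicklingError, OSError) as exc:
        # e.g. an unpicklable lock when workers are started on Windows
        logger.warning("process pool unavailable (%s); tagging serially", exc)
        for doc in documents:
            yield tag_one(doc)
        return
    with pool:
        # imap keeps input order, like the serial branch
        for result in pool.imap(tag_one, documents):
            yield result


def broken_pool():
    """Hypothetical pool factory that fails the way the traceback does."""
    raise TypeError("can't pickle thread.lock objects")


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(list(tag_with_fallback(["a", "b"], str.upper, broken_pool)))  # ['A', 'B']
```

Because the function is a generator, the pool is only created when iteration starts, and callers see the same ordered stream of results whichever branch runs.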

burki commented 8 years ago

@e-dorigatti Thanks very much, this now works on my Windows 7 machine:

    [WARNING] pos_tag.tag_many #139 - failed to initialize tree tragger process pool, fallback to single-process tagging
    [INFO] io.process_stream #38 - Loaded input file 'E:\Playground\StrepHit\samples\corpus.jsonlines'
    [INFO] pos_tag.main #206 - Done, total tagged items: 19