TurkuNLP / Turku-neural-parser-pipeline

A neural parsing pipeline for segmentation, morphological tagging, dependency parsing and lemmatization with pre-trained models for more than 50 languages. Top ranker in the CoNLL-18 Shared Task.
https://turkunlp.github.io/Turku-neural-parser-pipeline/
Apache License 2.0

Usage of built-in name parser causes issues #4

Closed vijoc closed 6 years ago

vijoc commented 6 years ago

I'm doing some research on a volume of Finnish text and was trying to set up this amazing-looking tool. However, I quickly ran into issues because parser is a built-in module in my Python 3.6 installation on Windows 7.

This manifests as not being able to import the modules from the correct parser package provided by this pipeline. I'm currently looking into resolving the issue by the only option I'm aware of: renaming the parser package.

As it's quite possible others may run into the same issue, would this make a welcome PR back to the repository? Would you have a specific substitute name for the package in mind?

fginter commented 6 years ago

Wouldn't this https://github.com/TurkuNLP/Turku-neural-parser-pipeline/blob/master/parser_mod.py#L8 take care of the problem?

vijoc commented 6 years ago

It does not appear to help with modules that are compiled directly into the interpreter; at least that is what https://github.com/robotframework/robotframework/issues/2541 appears to conclude.

I was trying to run the pipeline but could not get it working as the built-in module was always imported rather than the one provided here. After renaming the package (directory) I could get past that issue, but I did not yet have time to finish the change.

fginter commented 6 years ago

...that's really weird, we are running this code on Python 3.5 just fine and I am not aware of any major change between 3.6 and 3.5 in the way modules are imported. Do you have a little snippet to share, which shows the failure? I need to understand how you are running the parser.

fginter commented 6 years ago

The change would need to happen here https://github.com/fginter/Parser-v2/tree/c61b154aeec22a3f7e27295f95ee3a73f0386676 right?

vijoc commented 6 years ago

I'm not super familiar with how the Python interpreters are compiled, but I imagine there may be differences between versions in what modules are built-in. As noted in the referenced thread, the names are programmatically available, though.

I'm not at my computer right now, but I'll gather some more information tomorrow.

vijoc commented 6 years ago

Yes, renaming the directory from parser gets past the initial issue, but then there is the remaining issue of updating all the places where the module name is referenced. Largely a search-and-replace job, I'd imagine.
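As a rough illustration of that search-and-replace step, here is a minimal sketch that rewrites import statements in a source string. The function name rewrite_imports and the default target name nparser are assumptions made for this example, not code from the repository, and the pattern only covers statements that start with `from parser` or `import parser`.

```python
import re

# Matches "from parser" or "import parser" at the start of a line, but only
# when "parser" is the whole top-level name (followed by ".", whitespace, or
# end of line), so names such as parser_lib are left alone.
_IMPORT_RE = re.compile(
    r"^([ \t]*(?:from|import)[ \t]+)parser(?=[.\s]|$)",
    re.MULTILINE,
)


def rewrite_imports(source: str, new_name: str = "nparser") -> str:
    """Return source with top-level 'parser' imports pointed at new_name.

    Hypothetical helper: it does not handle 'import os, parser' or later
    attribute access such as 'parser.something' in the module body.
    """
    return _IMPORT_RE.sub(lambda m: m.group(1) + new_name, source)


if __name__ == "__main__":
    sample = "import parser_lib\nfrom parser.configurable import Configurable\n"
    print(rewrite_imports(sample))
```

Anything the pattern misses would still need a manual pass, so the result is worth checking by actually running the pipeline afterwards.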

fginter commented 6 years ago

OK. Let's look into this. I don't have a problem renaming parser but must make sure nothing breaks at any point. We could also consider whether __init__.py could define another name: the Parser-v2 repo is a fork, so a massive change like renaming the parent directory of the module could cause downstream problems.

vijoc commented 6 years ago

I'm not sure changes in the __init__.py file would help, as I imagine that with built-in modules the import statement skips the regular module search, i.e. the file is never even looked at.
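That suspicion can be checked from Python itself. The sketch below assumes a standard CPython, where sys.meta_path contains the BuiltinImporter and PathFinder classes; the two function names are made up for this example.

```python
import sys
from importlib.machinery import BuiltinImporter, PathFinder


def builtin_checked_first() -> bool:
    """Return True if the built-in importer is consulted before the
    sys.path-based finder when a top-level module is imported."""
    finders = list(sys.meta_path)
    return finders.index(BuiltinImporter) < finders.index(PathFinder)


def resolves_to_builtin(name: str) -> bool:
    """Return True if the built-in importer claims this module name."""
    return BuiltinImporter.find_spec(name) is not None


if __name__ == "__main__":
    print("built-in importer first:", builtin_checked_first())
    # The answer for "parser" depends on how the interpreter was built.
    print("parser is built-in:", resolves_to_builtin("parser"))
```

If both calls report True, a fresh top-level `import parser` is satisfied by the built-in module before any directory on sys.path is examined, which would also explain why adjusting sys.path does not help.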

fginter commented 6 years ago

probably right - if only I could wrap my mind around why it works fine for me... what's the OS and python interpreter you use?

vijoc commented 6 years ago

Windows 7 and Python 3.6.6. I imagine there may be differences in what is built into the Windows interpreter as opposed to those for other OSes.

vijoc commented 6 years ago

OK @fginter, I gathered an example of how the import fails when running (really anything) with the pipeline. The system does not even reach the point of evaluating any input, so no input is given here.

(venv-workspace) C:\Users\VKu\Source\Turku-neural-parser-pipeline>python --version
Python 3.6.6

(venv-workspace) C:\Users\VKu\Source\Turku-neural-parser-pipeline>python full_pipeline_stream.py --conf models_fi_tdt/pipelines.yaml list
['parse_conllu', 'parse_plaintext', 'parse_sentlines', 'parse_wslines', 'wipeparse_conllu']

(venv-workspace) C:\Users\VKu\Source\Turku-neural-parser-pipeline>python full_pipeline_stream.py --conf models_fi_tdt/pipelines.yaml --pipeline parse_plaintext
Traceback (most recent call last):
  File "full_pipeline_stream.py", line 80, in <module>
    p=Pipeline(steps=pipeline)
  File "C:\Users\VKu\Source\Turku-neural-parser-pipeline\pipeline.py", line 21, in __init__
    self.add_step(mod_name_and_params)
  File "C:\Users\VKu\Source\Turku-neural-parser-pipeline\pipeline.py", line 37, in add_step
    mod=importlib.import_module(module_name)
  File "C:\Users\VKu\AppData\Local\Programs\Python\Python36\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "C:\Users\VKu\Source\Turku-neural-parser-pipeline\parser_mod.py", line 4, in <module>
    import parser_lib
  File "C:\Users\VKu\Source\Turku-neural-parser-pipeline\parser_lib.py", line 30, in <module>
    from parser import Configurable
ImportError: cannot import name 'Configurable'

Below is an example from the interactive shell showing that parser is a built-in module.

Python 3.6.6 (v3.6.6:4cf1f54eb7, Jun 27 2018, 03:37:03) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.builtin_module_names
('_ast', '_bisect', '_blake2', '_codecs', '_codecs_cn', '_codecs_hk', '_codecs_iso2022', '_codecs_jp', '_codecs_kr', '_codecs_tw', '_collections', '_csv', '_datetime', '_functools', '_heapq', '_imp', '_io', '_json', '_locale', '_lsprof', '_md5', '_multibytecodec', '_opcode', '_operator', '_pickle', '_random', '_sha1', '_sha256', '_sha3', '_sha512', '_signal', '_sre', '_stat', '_string', '_struct', '_symtable', '_thread', '_tracemalloc', '_warnings', '_weakref', '_winapi', 'array', 'atexit', 'audioop', 'binascii', 'builtins', 'cmath', 'errno', 'faulthandler', 'gc', 'itertools', 'marshal', 'math', 'mmap', 'msvcrt', 'nt', 'parser', 'sys', 'time', 'winreg', 'xxsubtype', 'zipimport', 'zlib')

After changing the directory name of Parser-v2/parser to Parser-v2/nparser and updating the two import statements in parser_lib.py accordingly, the module is found and the error is kicked down the road to the absolute import paths inside the parser module.

(venv-workspace) C:\Users\VKu\Source\Turku-neural-parser-pipeline>python full_pipeline_stream.py --conf models_fi_tdt/pipelines.yaml --pipeline parse_plaintext
Traceback (most recent call last):
  File "full_pipeline_stream.py", line 80, in <module>
    p=Pipeline(steps=pipeline)
  File "C:\Users\VKu\Source\Turku-neural-parser-pipeline\pipeline.py", line 21, in __init__
    self.add_step(mod_name_and_params)
  File "C:\Users\VKu\Source\Turku-neural-parser-pipeline\pipeline.py", line 37, in add_step
    mod=importlib.import_module(module_name)
  File "C:\Users\VKu\AppData\Local\Programs\Python\Python36\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "C:\Users\VKu\Source\Turku-neural-parser-pipeline\parser_mod.py", line 4, in <module>
    import parser_lib
  File "C:\Users\VKu\Source\Turku-neural-parser-pipeline\parser_lib.py", line 30, in <module>
    from nparser import Configurable
  File "C:\Users\VKu\Source\Turku-neural-parser-pipeline\Parser-v2\nparser\__init__.py", line 5, in <module>
    from .bucket import Bucket
  File "C:\Users\VKu\Source\Turku-neural-parser-pipeline\Parser-v2\nparser\bucket.py", line 25, in <module>
    from parser.configurable import Configurable
ModuleNotFoundError: No module named 'parser.configurable'; 'parser' is not a package

In summary, I do think that renaming the package is the only solution for this specific problem. If you could check whether in your environment the result of calling sys.builtin_module_names includes parser, that could prove the difference between our interpreters.
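For comparing interpreters, a small check along these lines may be easier to share than the full tuple. It is only a sketch: builtin_collisions is a hypothetical helper name, and the list passed to it is an example input.

```python
import sys
from typing import Iterable, List


def builtin_collisions(package_names: Iterable[str]) -> List[str]:
    """Return, sorted, the given names that this interpreter has compiled
    in as built-in modules and that a local package would therefore
    fail to override."""
    builtin = set(sys.builtin_module_names)
    return sorted(name for name in set(package_names) if name in builtin)


if __name__ == "__main__":
    # The result depends on the interpreter build, so none is shown here.
    print(sys.platform, sys.version_info[:3])
    print(builtin_collisions(["parser", "nparser"]))
```

Running it on both machines should show directly whether parser collides on one interpreter and not on the other.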

fginter commented 6 years ago

Yes, I can confirm parser is not among builtin modules on linux. So I guess parser->nparser it then is. :)

vijoc commented 6 years ago

Resolved by https://github.com/TurkuNLP/Turku-neural-parser-pipeline/pull/7