amir-zeldes / HebPipe

An NLP pipeline for Hebrew

IndexError: list index out of range #12

Closed callzhang closed 3 years ago

callzhang commented 3 years ago

When I run the command python3 -m hebpipe xxx.mp3.txt, I get an IndexError. Can you please help?

! You selected no processing options
! Assuming you want all processing steps

Running tasks:
====================
o Automatic sentence splitting
o Whitespace tokenization
o Morphological segmentation
o POS tagging
o Lemmatization
o Morphological analysis
o Dependency parsing
o Entity recognition
o Coreference resolution

Processing יאיר לפיד שובר שתיקה באולפן(1).mp3.txt
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/opt/homebrew/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 147, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/opt/homebrew/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/opt/homebrew/lib/python3.9/site-packages/hebpipe/__init__.py", line 2, in <module>
    run_hebpipe()
  File "/opt/homebrew/lib/python3.9/site-packages/hebpipe/heb_pipe.py", line 681, in run_hebpipe
    processed = nlp(input_text, do_whitespace=opts.whitespace, do_tok=dotok, do_tag=opts.pos, do_lemma=opts.lemma,
  File "/opt/homebrew/lib/python3.9/site-packages/hebpipe/heb_pipe.py", line 516, in nlp
    tagged = inject_col(morphed,tokenized,5)
  File "/opt/homebrew/lib/python3.9/site-packages/hebpipe/lib/append_column.py", line 65, in inject_col
    to_inject = source_cols[col]
IndexError: list index out of range
Elapsed time: 0:00:03.106

callzhang commented 3 years ago

After debugging, it turned out that I didn't have Java installed. Java is rarely used for machine learning tasks, so I suggest adding a check for Java. Otherwise, the exec_via_temp function silently returns an empty string.
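A check of the kind suggested here could look roughly like the sketch below. It is not HebPipe code: the function name require_java and the error message are assumptions made for illustration, and only shutil.which from the standard library is used. Note that a PATH lookup only shows that an executable named java exists, not that it works, so running java -version would be a stronger test.

```python
import shutil


def require_java(which=shutil.which):
    """Return the path of the java executable, or raise if none is on PATH.

    Hypothetical helper, not part of HebPipe. `which` is injectable so the
    check can be exercised on machines with or without Java.
    """
    java_path = which("java")
    if java_path is None:
        raise RuntimeError(
            "Java was not found on PATH; install a 64-bit Java runtime "
            "before running steps that depend on it."
        )
    return java_path
```

Calling such a check once before the Java-dependent steps would turn the confusing IndexError into a message that names the missing dependency.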

amir-zeldes commented 3 years ago

Thanks for pointing this out - in fairness, the README does say:

you will need 64 bit Java installed and available on your path (see details below)

And this is noted in the Requirements section as well. I'd put a warning in the code, but a better solution is probably to switch over to a native Python-based parser. I'll open an issue for that.

callzhang commented 3 years ago

Thanks, Amir. You are right. I knew that it requires Java, but I didn't know whether Java was installed on my machine. Normally people expect an exception to be thrown when something is missing, so I did a test run, and since it didn't raise any flags I assumed Java was installed. Anyway, I used the nlp function in the code to integrate the process with the Google ASR service. To do that, I commented out the code in the __init__.py file so that I could import the .py file.
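The expectation described above, that a missing tool should raise rather than pass silently, can be sketched as a small wrapper around an external command. This is an illustration only and does not reproduce HebPipe's exec_via_temp: the name run_or_raise and its error messages are assumptions, and it relies only on subprocess.run from the standard library.

```python
import subprocess
import sys


def run_or_raise(cmd):
    """Run cmd and return its stdout, raising instead of returning "".

    Hypothetical wrapper, not HebPipe's exec_via_temp.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError as exc:
        # The executable itself is missing (for example, no java on PATH).
        raise RuntimeError(f"Executable not found: {cmd[0]}") from exc
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    if not result.stdout.strip():
        # Empty output would otherwise surface later as an unrelated error.
        raise RuntimeError(f"{cmd[0]} produced no output")
    return result.stdout


# Usage: a command that succeeds and prints something returns its output.
output = run_or_raise([sys.executable, "-c", "print('ok')"])
```

With a wrapper like this, a missing or failing external tool stops the run at the point of failure instead of handing an empty string to later steps such as column injection.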