borgr / USim

monolingual sentence similarity measure

Default example does not run successfully #1

Closed nmatthews-asapp closed 6 years ago

nmatthews-asapp commented 6 years ago

Below is my output from running the given example:

>>> python USim.py parse out.out -ss "I love rusty spoons", "nothing matters" -rs "he shares pretty cars", "nothing indeed"

/usr/local/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
Output folder: /.../USim/parse/parse_batch1
creating a new id list for file /.../USim/parse/parse_batch1/sentenceIds.pkl
['i love rusty spoons', 'nothing indeed', 'nothing matters', 'he shares pretty cars']
Parsing 4 sentences. 0 sentences already parsed.
/usr/local/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/usr/local/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192, got 176
  return f(*args, **kwds)
/usr/local/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/usr/local/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192, got 176
  return f(*args, **kwds)
Loading spaCy model 'en_core_web_md'... Done (12.958s).

Traceback (most recent call last):
  File "USim.py", line 338, in <module>
    main(args)
  File "USim.py", line 43, in main
    args.source_sentences + args.reference_sentences, args.parse_dir, args.parser_path)
  File "USim.py", line 179, in ucca_parse_sentences
    normalize_sentence, model_path)
  File "USim.py", line 203, in _ucca_parse_text
    parser = get_parser(model_path)
  File "USim.py", line 151, in get_parser
    PARSER = Parser(model_path)
  File "/.../lib/python3.6/site-packages/tupa/parse.py", line 401, in __init__
    models=list(map(Model, (model_files,) if isinstance(model_files, str) else model_files)))
TypeError: 'NoneType' object is not iterable

borgr commented 6 years ago

You are missing either the -p flag or a hard-coded parameter (handy, since you probably always use the same model) specifying where your TUPA model is saved. (Note that it is easy to replace get_parser with something other than TUPA, but since TUPA is currently the best parser in use for UCCA there is no need for that; this may change after the SemEval 2019 UCCA parsing shared task.)

If you don't have such a model, and have neither the data nor the need to retrain, you can download a trained model from here. At the time of this answer, the direct download commands are:

curl -LO https://github.com/huji-nlp/tupa/releases/download/v1.3.3/ucca-bilstm-1.3.3.tar.gz
tar xvzf ucca-bilstm-1.3.3.tar.gz

I updated the documentation a bit to emphasize that.
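
For readers hitting the same TypeError, here is a minimal sketch of why a missing model path fails that way and how a script could fail earlier with a clearer message. It is not the code in USim.py: `resolve_model_path` and `reproduce_failure` are hypothetical helpers, and `str` stands in for TUPA's `Model` class so the snippet runs without TUPA installed.

```python
import os
from typing import Optional


def resolve_model_path(cli_value: Optional[str], default: Optional[str] = None) -> str:
    """Return a usable model path, or fail with a readable message.

    Hypothetical helper: cli_value would come from the -p flag,
    default from a hard-coded setting.
    """
    path = cli_value or default
    if path is None:
        raise ValueError("No parser model given: pass -p or set a default model path.")
    return os.path.expanduser(path)


def reproduce_failure(model_files):
    """Mirror the expression shape from the traceback (str replaces tupa's Model)."""
    # A single string is wrapped in a tuple; anything else is iterated directly,
    # so None reaches map() and raises "'NoneType' object is not iterable".
    return list(map(str, (model_files,) if isinstance(model_files, str) else model_files))
```

Calling `reproduce_failure(None)` raises the same kind of TypeError as in the report, while `resolve_model_path(None)` raises a ValueError that names the missing flag.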

nmatthews-asapp commented 6 years ago

Thanks for the help! Unfortunately I'm still running into problems after setting the model; the align module doesn't have the functions that are called, according to my error:

Loading from 'models/ucca-bilstm.enum'... Done (0.099s).
Loading model from 'models/ucca-bilstm': 21param [00:53,  2.56s/param]
Loading model from 'models/ucca-bilstm': 100%|██████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:54<00:00,  2.61s/param]
Loading from 'models/ucca-bilstm.nlp.json'.
tupa  --hyperparams "shared --lstm-layers 2" "amr --max-edge-labels 110 --node-label-dim 20 --max-node-labels 1000 --node-category-dim 5 --max-node-categories 25" "sdp --max-edge-labels 70" "conllu --max-edge-labels 60" --log parse.log --max-words 0 --max-words-external 249861 --vocab vocab/en_core_web_lg.csv --word-vectors ../word_vectors/wiki.en.vec
Loading 'vocab/en_core_web_lg.csv': 1344215 rows [00:04, 274232.78 rows/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 12.76 passages/s, en ucca=1_3, |t/s|=1.56]
Parsed 4 passages
Total time: 5.217s (average time/passage: 1.304s, average tokens/s: 2)
Traceback (most recent call last):
  File "USim.py", line 338, in <module>
    main(args)
  File "USim.py", line 46, in main
    r in zip(source_sentences, reference_sentences)]
  File "USim.py", line 46, in <listcomp>
    r in zip(source_sentences, reference_sentences)]
  File "USim.py", line 293, in USim
    if align.regularize_word(source) == "":
AttributeError: module 'align' has no attribute 'regularize_word'

borgr commented 6 years ago

Thank you for being our first beta tester from outside the group (sorry...). I have changed the imports to be more convenient, so they no longer need environment variables set inside or outside the code.

Note: I tried to do everything from scratch, but since TUPA's current release has a bug, I only got up to a certain point with this test. I hope you have a working older release, but the fix should be out sometime this week anyway.
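
An AttributeError like the one above can occur when Python imports a different module with the same name than the one the script expects. Whether that was the cause here is an assumption; the thread only says the imports were changed. A small diagnostic sketch, with `inspect_module` as a hypothetical helper name, shows which file a module name resolves to and whether it exposes a given attribute:

```python
import importlib


def inspect_module(name: str, attribute: str):
    """Import a module by name; report its file location and whether it has the attribute."""
    module = importlib.import_module(name)
    # __file__ is absent for built-in modules, hence the getattr default.
    location = getattr(module, "__file__", None)
    return location, hasattr(module, attribute)


# Example call for this issue (run from the USim directory):
#   inspect_module("align", "regularize_word")
```

If the returned location is not the align.py shipped with USim, the import is resolving to another module on the search path.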

borgr commented 6 years ago

A new TUPA release is out; let me know if anything else comes up.