attardi / deepnl

Deep Learning for Natural Language Processing
GNU General Public License v3.0

IndexError while running dl-sentiwords.py #38

Closed: dmhyun closed this issue 7 years ago

dmhyun commented 8 years ago

Hi, I ran into a runtime error while running the sentiment-specific word embedding code (dl-sentiwords.py).

I ran dl-sentiwords.py with the following arguments (the vocabulary and vector files are empty):

    ./dl-sentiwords.py --vocab VOCAB.txt --vectors Vector.txt data/train.tsv

The error appears as follows:

    Saving vocabulary in VOCAB.txt
    Creating new network...
    ... with the following parameters:

            Input layer size: 350
            Hidden layer size: 20
            Output size: 2

    Starting training
    Traceback (most recent call last):
      File "./dl-sentiwords.py", line 218, in <module>
        args.iterations, report_intervals)
      File "deepnl/sentiwords.pyx", line 301, in deepnl.sentiwords.SentimentTrainer.train (deepnl/sentiwords.cpp:6471)
      File "deepnl/sentiwords.pyx", line 126, in deepnl.sentiwords.SentimentTrainer._train_pair_s (deepnl/sentiwords.cpp:4235)
      File "deepnl/extractors.pyx", line 153, in deepnl.extractors.Converter.lookup (deepnl/extractors.cpp:4809)
      File "deepnl/extractors.pyx", line 236, in deepnl.extractors.Extractor.__getitem__ (deepnl/extractors.cpp:6880)
    IndexError: index 1209 is out of bounds for axis 0 with size 1209

If I change the number of rows in the data file (data/train.tsv), I get the same error, except that the last line changes:

    IndexError: index 1210 is out of bounds for axis 0 with size 1210

I think the problem is that some code in the training part accesses an array one element past its end: index N is out of range for an array of size N, whose valid indices are 0 to N-1.
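For reference, this is exactly the message NumPy raises when an array with N rows is indexed with N. A minimal sketch that reproduces it, assuming the word embeddings are stored as a NumPy array with one row per vocabulary entry; `vocab`, `embeddings`, `lookup` and `try_lookup` are hypothetical names for illustration, not deepnl's API:

```python
import numpy as np

# Hypothetical sketch, NOT deepnl's actual code: an embedding table with one
# row per vocabulary entry, so the valid row indices are 0 .. len(vocab) - 1.
vocab = {"good": 0, "bad": 1, "movie": 2}
embeddings = np.zeros((len(vocab), 50))

def lookup(word_id):
    """Return the embedding row for word_id (NumPy raises IndexError if out of range)."""
    return embeddings[word_id]

def try_lookup(word_id):
    """Return 'ok' for a valid ID, otherwise the IndexError message."""
    try:
        lookup(word_id)
        return "ok"
    except IndexError as err:
        return str(err)

print(try_lookup(len(vocab) - 1))  # ok (last valid row)
print(try_lookup(len(vocab)))      # index 3 is out of bounds for axis 0 with size 3
```

Since the index and the size in the traceback change together when the number of rows in data/train.tsv changes, one plausible (unconfirmed) explanation is that some token receives an ID equal to the current table size instead of mapping to an existing row.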

Could you please explain this problem?

Thanks very much.

attardi commented 7 years ago

Normally this is due to errors in the input data. Check the format of the tsv file.
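A minimal sketch of a line-by-line checker for data/train.tsv. It assumes a SemEval-style layout of four tab-separated fields (id, id, polarity label, text), whose fields resemble the sample line posted later in this thread; that layout, the label set and the name `check_tsv` are assumptions, not taken from deepnl's documentation:

```python
import io

# Assumed layout (not confirmed by deepnl's docs): four tab-separated
# fields per line: id, id, polarity label, tweet text.
EXPECTED_FIELDS = 4
LABELS = {"positive", "negative", "neutral"}  # assumed label set

def check_tsv(stream):
    """Return (line_number, problem) pairs for lines that break the assumed layout."""
    problems = []
    for lineno, raw in enumerate(stream, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            problems.append((lineno, "empty line"))
            continue
        fields = line.split("\t")
        # Fewer fields usually means spaces were used instead of tabs;
        # more fields means a stray tab inside the tweet text.
        if len(fields) != EXPECTED_FIELDS:
            problems.append(
                (lineno, f"expected {EXPECTED_FIELDS} tab-separated fields, got {len(fields)}")
            )
        elif fields[2] not in LABELS:
            problems.append((lineno, f"unknown label {fields[2]!r}"))
        elif not fields[3].strip():
            problems.append((lineno, "empty text field"))
    return problems

# Usage on an in-memory sample; for the real file pass
# open("data/train.tsv", encoding="utf-8") instead of io.StringIO.
sample = io.StringIO("1\t1\tpositive\tgreat movie\n2 2 negative bad movie\n")
for lineno, problem in check_tsv(sample):
    print(lineno, problem)  # 2 expected 4 tab-separated fields, got 1
```

An empty result only means the file matches this assumed layout; it does not prove the file matches what dl-sentiwords.py actually expects.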

waniss commented 6 years ago

Hi, I got the same issue. I checked the format of the tsv file and it looks correct:

    1 1 positive j aime les et bien d autres encore yesvegan végétalien URL
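One caveat: a line pasted into a GitHub comment does not show whether its fields are separated by tabs or by spaces. A small sketch for checking the raw line, assuming the reader expects four tab-separated fields; the helper name `tab_fields` is hypothetical:

```python
def tab_fields(line):
    """Split one raw line on tabs, the way a TSV reader would."""
    return line.rstrip("\r\n").split("\t")

# The line exactly as it appears in the comment (spaces between fields).
as_pasted = "1 1 positive j aime les et bien d autres encore yesvegan végétalien URL"
# The same content if the separators were real tabs.
with_tabs = "1\t1\tpositive\tj aime les et bien d autres encore yesvegan végétalien URL"

print(len(tab_fields(as_pasted)))  # 1: no tabs, so the whole line is one field
print(len(tab_fields(with_tabs)))  # 4
print(repr(with_tabs[:12]))        # '1\t1\tpositive' (repr makes tabs visible)
```

To inspect the actual file, printing `repr(line)` for the first few lines read from data/train.tsv shows the separators unambiguously.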