dbl001 opened this issue 7 years ago
IndexError Traceback (most recent call last)

Seems to work with merge=False:

```python
tokens, vocab = preprocess.tokenize(texts, max_length, n_threads=4, merge=False)
```

(preprocess.py: line 46)
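For reference, a minimal sketch of the workaround in context (the variable names `texts` and `max_length` below are placeholders in the style of the lda2vec example scripts, not the exact script from this report):

```python
from lda2vec import preprocess

# Hypothetical stand-ins for the real corpus and settings
texts = [u"First example document.", u"Another short document."]
max_length = 250   # maximum number of tokens kept per document

# merge=True triggers the IndexError for some inputs; merge=False avoids it
tokens, vocab = preprocess.tokenize(texts, max_length, n_threads=4,
                                    merge=False)
```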
I've run into similar issues (or the same issue) where merge=False resolves things, but what impact does that have on the results besides squashing the error?
The merge option seems to merge nouns with other words into single tokens. I don't think it affects the shape of the topics very much, since LDA should be able to handle the words individually anyway.
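To illustrate what that merging means, here is a rough sketch of noun-chunk merging using a current spaCy API (lda2vec itself was written against a much older spaCy release, so the calls it actually makes differ):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(u"The merge option collapses noun phrases like New York City into one token.")

# Collapse each noun chunk into a single token, roughly what merge=True does
with doc.retokenize() as retokenizer:
    for chunk in list(doc.noun_chunks):
        retokenizer.merge(chunk)

print([t.text for t in doc])
# e.g. ['The merge option', 'collapses', 'noun phrases', 'like', 'New York City', 'into', 'one token', '.']
```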
I got the same issue. It can be solved by setting the merge option to False:

```python
tokens, vocab = preprocess.tokenize(texts, max_length, n_threads=4,
                                    merge=False)  # change merge to False here
```
Hi, I am just trying it with merge=False. May I know how long the tokenize function will take to run?

Cheers,
Arav
Hi all,

After I changed to merge=False, it gives me the following error:

OverflowError Traceback (most recent call last)
You need to run a 64-bit build of Python, and the libraries also need to be 64-bit builds.
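To check whether the interpreter is actually a 64-bit build, something like this works (plain standard library plus NumPy, nothing lda2vec-specific); the failing lookup above expects an unsigned 64-bit value, so 32-bit builds are the usual suspect:

```python
import platform
import struct

import numpy as np

print(struct.calcsize("P") * 8)       # pointer width in bits: 32 or 64
print(platform.architecture()[0])     # e.g. '64bit'
print(np.dtype(np.intp))              # NumPy's index type; int64 on 64-bit builds
```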
Hi all,

After I changed to merge=False, it is giving me the following error:
```
OverflowError                             Traceback (most recent call last)
<ipython-input> in <module>()
     45 texts = features.pop('comment_text').values
     46 tokens, vocab = preprocess.tokenize(texts, max_length, n_threads=4,
---> 47                                     merge=False)
     48 del texts
     49

/usr/local/lib/python2.7/dist-packages/lda2vec-0.1-py2.7.egg/lda2vec/preprocess.pyc in tokenize(texts, max_length, skip, attr, merge, nlp, **kwargs)
    104             data[row, :length] = dat[:length, 0].ravel()
    105     uniques = np.unique(data)
--> 106     vocab = {v: nlp.vocab[v].lower_ for v in uniques if v != skip}
    107     vocab[skip] = '<SKIP>'
    108     return data, vocab

/usr/local/lib/python2.7/dist-packages/lda2vec-0.1-py2.7.egg/lda2vec/preprocess.pyc in <dictcomp>((v,))
    104             data[row, :length] = dat[:length, 0].ravel()
    105     uniques = np.unique(data)
--> 106     vocab = {v: nlp.vocab[v].lower_ for v in uniques if v != skip}
    107     vocab[skip] = '<SKIP>'
    108     return data, vocab

vocab.pyx in spacy.vocab.Vocab.__getitem__()

OverflowError: can't convert negative value to uint64_t
```
Any heads-up on this? Kindly help me out with this.

Cheers,
Arav
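A rough illustration (not lda2vec's actual code) of where a negative value can come from: an ID that only fits in an unsigned or wider integer type wraps around to a negative number when stored in a narrower signed array, and converting that negative number back to uint64 then fails with exactly this message.

```python
import numpy as np

# Hypothetical ID that fits in uint64 but not in int32
big_id = np.uint64(3000000000)
wrapped = int(np.array([big_id]).astype(np.int32)[0])

print(wrapped)       # a negative number, e.g. -1294967296
print(wrapped < 0)   # True -> looking this up as an unsigned 64-bit key fails
```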
I'm getting this error too when I try to run preprocess.py. How do I fix it?
Running on OS X 10.11.6:

```
$ python --version
Python 2.7.11 :: Anaconda custom (x86_64)

$ python preprocess.py
Traceback (most recent call last):
  File "preprocess.py", line 47, in <module>
    merge=True)
  File "build/bdist.macosx-10.5-x86_64/egg/lda2vec/preprocess.py", line 78, in tokenize
    Chop timestamps into days
  File "spacy/tokens/span.pyx", line 65, in spacy.tokens.span.Span.__len__ (spacy/tokens/span.cpp:3955)
  File "spacy/tokens/span.pyx", line 130, in spacy.tokens.span.Span._recalculate_indices (spacy/tokens/span.cpp:5105)
IndexError: Error calculating span: Can't find end
```
Related to: https://github.com/cemoody/lda2vec/issues/38