cemoody / lda2vec


tokenize error #8

Closed · longma307 closed this issue 8 years ago

longma307 commented 8 years ago

I have been following your instructions to test lda2vec, but I got an error when I ran this line: tokens, vocab = preprocess.tokenize(texts, max_length, tag=False, parse=False, entity=False)

runfile('/Users/lm/Dropbox/Athena/Feature_Reduction/WordVectors/lda2vec_test.py', wdir='/Users/lm/Dropbox/Athena/Feature_Reduction/WordVectors')

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
        runfile('/Users/lm/Dropbox/Athena/Feature_Reduction/WordVectors/lda2vec_test.py', wdir='/Users/lm/Dropbox/Athena/Feature_Reduction/WordVectors')
      File "/Users/lm/Documents/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 699, in runfile
        execfile(filename, namespace)
      File "/Users/lm/Documents/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 81, in execfile
        builtins.execfile(filename, *where)
      File "/Users/lm/Dropbox/Athena/Feature_Reduction/WordVectors/lda2vec_test.py", line 29, in <module>
        tokens, vocab = preprocess.tokenize(texts, max_length, tag=False, parse=False, entity=False)
      File "build/bdist.macosx-10.5-x86_64/egg/lda2vec/preprocess.py", line 65, in tokenize
        nlp = English(data_dir=data_dir)
      File "/Users/lm/Documents/anaconda/lib/python2.7/site-packages/spacy/language.py", line 210, in __init__
        vocab = self.default_vocab(package)
      File "/Users/lm/Documents/anaconda/lib/python2.7/site-packages/spacy/language.py", line 144, in default_vocab
        return Vocab.from_package(package, get_lex_attr=get_lex_attr)
      File "spacy/vocab.pyx", line 65, in spacy.vocab.Vocab.from_package (spacy/vocab.cpp:3592)
        with package.open(('vocab', 'strings.json')) as file_:
      File "/Users/lm/Documents/anaconda/lib/python2.7/contextlib.py", line 17, in __enter__
        return self.gen.next()
      File "/Users/lm/Documents/anaconda/lib/python2.7/site-packages/sputnik/package_stub.py", line 68, in open
        raise default(self.file_path(*path_parts))
    IOError: /Users/lm/Documents/anaconda/lib/python2.7/site-packages/spacy/en/data/vocab/strings.json

I have updated the related modules (numpy, spacy, ...) to the newest versions, but I still get this error.
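For reference, if I read the traceback right, the failure happens inside spaCy's vocab loading before any lda2vec code runs, so a sketch like this should reproduce the same IOError without lda2vec at all:

    # Minimal reproduction without lda2vec: the traceback above dies inside
    # spacy/language.py while loading the English vocab, so this alone should
    # raise the same IOError if the data under spacy/en/data is missing.
    from spacy.en import English

    nlp = English()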

jcquarto commented 8 years ago

I am experiencing exactly the same problem.

cemoody commented 8 years ago

This seems like a spaCy error -- have y'all tried downloading the vocab files that accompany spaCy?

python -m spacy.en.download ?
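If you want to check whether the download actually landed, something like this sketch should tell you (it just tests for the strings.json file named in your IOError; I'm assuming the data lives under the installed spacy.en package, as the error path suggests):

    import os
    import spacy.en

    # The IOError names .../spacy/en/data/vocab/strings.json, so check
    # whether that file actually exists under the spacy.en package.
    data_dir = os.path.join(os.path.dirname(spacy.en.__file__), 'data')
    strings_json = os.path.join(data_dir, 'vocab', 'strings.json')
    print(os.path.exists(strings_json))  # False means the vocab data never arrived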

longma307 commented 8 years ago

@cemoody Yes, I did, and I also upgraded the required modules (numpy, spacy, ...) to the newest versions, but the error still exists.

cemoody commented 8 years ago

So it looks like it's an issue with spaCy:

https://github.com/spacy-io/spaCy/issues/183

https://github.com/spacy-io/spaCy/issues/155

...I can't reproduce this, so it's tough for me to debug. All I can really do is echo what Honnibal suggests there -- try adding the --force flag by doing python -m spacy.en.download --force all?
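If the forced re-download works, a quick sanity check before going back to lda2vec would be something like:

    # Sanity check after the forced re-download: if this loads and prints
    # tokens, the vocab files are in place and lda2vec's preprocess.tokenize
    # should get past the English() call.
    from spacy.en import English

    nlp = English()
    doc = nlp(u'Hello, spaCy world.')
    print([t.orth_ for t in doc])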

longma307 commented 8 years ago

@cemoody That finally worked for me, thanks for the help.
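For anyone who hits this later, here is roughly the setup that now runs for me; the corpus and max_length are just placeholder values, and the comments on the return values are my reading of the call, not official docs:

    # -*- coding: utf-8 -*-
    from lda2vec import preprocess

    # Placeholder corpus; spaCy expects unicode strings.
    texts = [u'The quick brown fox jumps over the lazy dog.',
             u'Topic models and word vectors, together at last.']
    max_length = 50  # tokens kept per document

    tokens, vocab = preprocess.tokenize(texts, max_length, tag=False,
                                        parse=False, entity=False)
    print(tokens.shape)  # one row of token ids per document
    print(len(vocab))    # mapping from token ids back to strings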

cemoody commented 8 years ago

@longma307 Glad it helped! :)