jakevossen5 closed this issue 6 years ago
Cool!
What have you tried so far, and what error(s) do you get?
Well, I don't have much yet. I am just trying to recreate your results, and I am getting this in the output:
Warning: no model found for 'en'
Only loading the 'en' tokenizer.
Traceback (most recent call last):
  File "process.py", line 48, in <module>
    corpus = open("corpus.txt", "r").read()
IOError: [Errno 2] No such file or directory: 'corpus.txt'
What was your corpus.txt?
Ah, you need to run mkcorpus.sh first to generate this file.
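If it helps, here is a minimal sketch of a check that fails early with a clearer message when an input file has not been generated yet. The helper name `missing_files` is hypothetical and not part of the project; the only file name used is `corpus.txt`, taken from the traceback above.

```python
import os


def missing_files(paths):
    """Return the entries of `paths` that do not exist as regular files."""
    return [path for path in paths if not os.path.isfile(path)]


def require_files(paths):
    """Raise an error naming every missing input file (hypothetical helper)."""
    missing = missing_files(paths)
    if missing:
        raise RuntimeError(
            "Missing input file(s): %s. Run mkcorpus.sh first." % ", ".join(missing)
        )
```

Calling `require_files(["corpus.txt"])` near the top of process.py would replace the bare IOError with a hint about what to run.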
Thanks so much! Sadly, I am getting yet another error (probably due to my own stupidity). This is what I get:
Traceback (most recent call last):
  File "process.py", line 50, in <module>
    doc = nlp(corpus)
  File "/home/jake/.local/lib/python2.7/site-packages/spacy/language.py", line 320, in __call__
    doc = self.make_doc(text)
  File "/home/jake/.local/lib/python2.7/site-packages/spacy/language.py", line 293, in <lambda>
    self.make_doc = lambda text: self.tokenizer(text)
TypeError: Argument 'string' has incorrect type (expected unicode, got str)
Googling it turned up this issue: https://github.com/explosion/spaCy/issues/212
So I added the line from __future__ import unicode_literals.
Now I get this error:
Traceback (most recent call last):
  File "process.py", line 6, in <module>
    countries = open("countries.txt", "r").read().split("\n")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1507: ordinal not in range(128)
What version of Python did you use? Also, because I only have a Windows machine, I ran this in an Xubuntu VM, but I don't think that should matter too much. Any ideas?
I used Python 3.
Other than that, it should run fine on your VM.
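For context, the tracebacks above come from Python 2.7, where `open(...).read()` returns a byte string. With `unicode_literals` enabled, splitting that byte string on a Unicode `"\n"` forces an implicit ASCII decode, which fails on the first non-ASCII byte. If switching to Python 3 is not an option, one possible workaround is to read the files as text with an explicit encoding. The sketch below assumes the files are UTF-8 encoded (the byte 0xc3 is consistent with that, but it is an assumption), and `read_text` is a hypothetical helper, not something from the project. `io.open` behaves the same way on Python 2.7 and Python 3.

```python
import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a whole file and return it as Unicode text (assumes UTF-8 by default)."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


# Demonstration with a throwaway file that contains a non-ASCII country name.
demo_path = os.path.join(tempfile.mkdtemp(), "countries.txt")
with io.open(demo_path, "w", encoding="utf-8") as handle:
    handle.write(u"Germany\nC\u00f4te d'Ivoire\n")

# The same pattern as line 6 of process.py, but decoding explicitly.
countries = read_text(demo_path).split(u"\n")
```

The same helper could be used for corpus.txt, so that the text passed to `nlp(...)` is already Unicode.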
I am really interested in modifying your project. I want to do something similar, except make it a 3D model for 3D printing, but I can't seem to figure out the corpus. Can you help at all?