DavidBuchanan314 / trumpogram

The World, according to Donald Trump
MIT License

How did you set up corpus? #3

Closed: jakevossen5 closed this issue 6 years ago

jakevossen5 commented 7 years ago

I am really interested in modifying your project. I want to do something similar, except make it a 3D model for 3D printing, but I can't seem to figure out the corpus. Can you help at all?

DavidBuchanan314 commented 7 years ago

Cool!

What have you tried so far, and what error(s) do you get?

jakevossen5 commented 7 years ago

Well, I don't have much yet. I am just trying to recreate your results, and I am getting this output:

    Warning: no model found for 'en'

    Only loading the 'en' tokenizer.

    Traceback (most recent call last):
      File "process.py", line 48, in <module>
        corpus = open("corpus.txt", "r").read()
    IOError: [Errno 2] No such file or directory: 'corpus.txt'

What was your corpus.txt?

DavidBuchanan314 commented 7 years ago

Ah, you need to run mkcorpus.sh first, to generate this file.

jakevossen5 commented 7 years ago

Thanks so much! Sadly, I am getting yet another error (probably due to my own stupidity).

This is what I get:

    Traceback (most recent call last):
      File "process.py", line 50, in <module>
        doc = nlp(corpus)
      File "/home/jake/.local/lib/python2.7/site-packages/spacy/language.py", line 320, in __call__
        doc = self.make_doc(text)
      File "/home/jake/.local/lib/python2.7/site-packages/spacy/language.py", line 293, in <lambda>
        self.make_doc = lambda text: self.tokenizer(text)
    TypeError: Argument 'string' has incorrect type (expected unicode, got str)

Googling it revealed this: https://github.com/explosion/spaCy/issues/212

So I added from __future__ import unicode_literals at the top of the script.

Now I get this error:

    Traceback (most recent call last):
      File "process.py", line 6, in <module>
        countries = open("countries.txt", "r").read().split("\n")
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1507: ordinal not in range(128)

Which version of Python did you use? Also, because I only have a Windows machine, I did this in a Xubuntu VM, but I don't think that should matter too much. Any ideas?

DavidBuchanan314 commented 7 years ago

I used Python 3.

Other than that, it should run fine on your VM.
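
For anyone who has to stay on Python 2.7, both Unicode errors above come from the built-in open() returning byte strings there. A minimal sketch of one possible workaround, not taken from the repository: read the files through io.open with an explicit encoding, which returns decoded text on both Python 2 and Python 3. The helper names read_text and read_lines are hypothetical, and UTF-8 is an assumption (the 0xc3 byte in the traceback is consistent with it, but the thread does not confirm the encoding).

```python
import io


def read_text(path, encoding="utf-8"):
    """Read a whole file and decode it to text.

    io.open behaves the same on Python 2 and Python 3: given an explicit
    encoding it returns unicode text rather than raw bytes.
    The UTF-8 default is an assumption about the input files.
    """
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


def read_lines(path, encoding="utf-8"):
    """Read a file and return its lines without trailing newlines."""
    return read_text(path, encoding).splitlines()
```

With these helpers, the two failing lines could be written as countries = read_lines("countries.txt") and corpus = read_text("corpus.txt"). Note that splitlines() does not produce the trailing empty string that split("\n") leaves when the file ends with a newline, so the resulting list may be one element shorter than in the original script.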