ahoho / SentiVAE

MIT License

Data ingest issue #1

Closed by stevenbedrick 3 years ago

stevenbedrick commented 3 years ago

Hello! I am attempting to run vae.py and have been downloading all of the requisite datasets. I am stuck on the Hu & Liu 2004 dataset. From the linked website, I obtained what I believe to be the right dataset (from their 2004 SIGKDD paper, right?), but the resulting files do not match what sentiments.py is looking for. Rather than two text files named positive-words.txt and negative-words.txt, the data appear to consist of a separate file for each of the five products for which they annotated reviews. I'm guessing that y'all had some sort of processing script that produced a word list for use by sentiments.py, but I am not finding such a script in the SentiVAE repo. Did I download the wrong dataset, perhaps? Any pointers would be most appreciated.

stevenbedrick commented 3 years ago

Aha! I figured it out: I was looking in the "datasets" portion of Liu's website, but I should have been looking in the "Lexicon" section.
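
For anyone who hits the same problem, here is a minimal sketch of a sanity check that the two lexicon files are in place before running vae.py. It is not taken from sentiments.py. The function names (load_word_list, load_lexicon) and the directory argument are hypothetical, and the assumptions that header lines start with ";" and that the files are read as latin-1 come from my reading of the downloaded lexicon files, so adjust them if your copies differ.

```python
from pathlib import Path


def load_word_list(path, encoding="latin-1"):
    """Read one lexicon file, one word per line.

    Assumption: blank lines and lines starting with ';' are header
    comments and should be skipped.
    """
    words = []
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(";"):
                continue
            words.append(line)
    return words


def load_lexicon(directory):
    """Load both word lists from `directory` (a hypothetical location).

    Raises FileNotFoundError with a hint if either file is missing,
    which is the symptom of having downloaded the review datasets
    instead of the lexicon.
    """
    directory = Path(directory)
    lexicon = {}
    for filename, label in (
        ("positive-words.txt", "positive"),
        ("negative-words.txt", "negative"),
    ):
        path = directory / filename
        if not path.exists():
            raise FileNotFoundError(
                f"{path} not found; check the 'Lexicon' section of Liu's "
                "website rather than the 'datasets' section"
            )
        lexicon[label] = load_word_list(path)
    return lexicon
```

Calling load_lexicon on the directory that holds the two files returns a dict with "positive" and "negative" word lists, and fails early with a pointer to the right download if the files are absent.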