Hello! I am attempting to run `vae.py` and have been working on downloading all of the requisite datasets. I am stuck on the Hu & Liu 2004 dataset: from the linked website, I have obtained what I believe to be the right dataset (from their 2004 SIGKDD paper, right?), but the resulting files do not match what `sentiments.py` is looking for. Rather than two text files named `positive-words.txt` and `negative-words.txt`, the data consist of a separate file for each of the five products for which they annotated reviews. I'm guessing that y'all had some sort of processing script that produced a word list for use by `sentiments.py`, but I am not finding such a script in the SentiVAE repo. Did I download the wrong dataset, perhaps? Any pointers would be most appreciated.
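In case it helps illustrate what I was expecting to do, here is a minimal sketch of the kind of processing script I had in mind, assuming the per-product review files use the annotation format described in the 2004 paper (lines like `camera[+2], battery[-1]##sentence text`, with `[t]` marking review titles). The function name and the exact handling of scores are my own guesses, not anything from the SentiVAE repo:

```python
import re
from pathlib import Path

# Matches annotations like "camera[+2]" or "battery[-1]" in the header of a line.
# NOTE: this is an assumption about the format; the real files also carry extra
# tags such as [u], [p], [cc] that a complete script would need to handle.
ANNOT = re.compile(r"([\w\s]+?)\[([+-]\d)\]")

def extract_word_lists(review_files):
    """Collect positively and negatively scored feature words from the
    per-product annotated review files (hypothetical helper)."""
    positive, negative = set(), set()
    for path in review_files:
        for line in Path(path).read_text(errors="ignore").splitlines():
            head, sep, _ = line.partition("##")
            if not sep:
                continue  # skip review titles ("[t] ...") and unannotated lines
            for word, score in ANNOT.findall(head):
                target = positive if int(score) > 0 else negative
                target.add(word.strip().lower())
    return sorted(positive), sorted(negative)
```

The two returned lists could then be written out as `positive-words.txt` and `negative-words.txt`, one word per line, in whatever shape `sentiments.py` expects.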