cfwelch / targeted_sentiment

Code for targeted sentiment publications. It uses a pipeline to first extract class and instructor entities from text and then classify the sentiment expressed toward those entities as positive, negative, or neutral.
MIT License

**with open('EECS_annotated_samples_anonymized') as handle: IOError: [Errno 2] No such file or directory: 'EECS_annotated_samples_anonymized'** #1

monajalal closed this issue 6 years ago

monajalal commented 6 years ago

Hi Charlie,

Thanks a lot for uploading your code to GitHub. I followed all of the commands in the README and got the error below. Additionally, I was not sure which files exactly should be renamed to neg_words and pos_words. Can you please have a look at the directory structure below and let me know if it is correct?

```
[jalal@goku targeted_sentiment]$ python make_crfbioyz.py
Traceback (most recent call last):
  File "make_crfbioyz.py", line 6, in <module>
    with open('EECS_annotated_samples_anonymized') as handle:
IOError: [Errno 2] No such file or directory: 'EECS_annotated_samples_anonymized'
[jalal@goku targeted_sentiment]$ ls
IQR.py       analysis.py              classifiers       dicts.py             liwc_analysis.py       mitchell_crf.py      senti_set.py            splits.py                   unique_entities.py
LICENSE      annotateSentiment.py     crf_example.py    distances.py         liwc_probabilities.py  parseMitchell.py     sentimentAgreement.py   spwrap.py
NLU.py       baseline.py              crfnerbioyz.prop  entity_distance.py   majority_baseline.py   saveUtterancesFA.py  sentiment_class.py      stanford_corenlp_pywrapper
README.md    checkLabels.py           data              entity_extractor.py  make_crfbioyz.py       semeval15.py         sentiment_treemaker.py  string_match_baseline.py
acronyms.py  check_sentence_level.py  dependencies      leastSquares.py      map_names.py           senti_lexis.py       split_analysis.py       taggers
[jalal@goku targeted_sentiment]$ ls data/
bing_liu_lexicon  mpqa  negate  opinion-lexicon-English  opinion-lexicon-English.rar  subjectivity_clues_hltemnlp05  subjectivity_clues_hltemnlp05.zip
[jalal@goku targeted_sentiment]$ tree data
data
├── bing_liu_lexicon
│   ├── neg_words
│   └── pos_words
├── mpqa
│   └── subjclueslen1-HLTEMNLP05.tff
├── negate
├── opinion-lexicon-English
│   ├── negative-words.txt
│   └── positive-words.txt
├── opinion-lexicon-English.rar
├── subjectivity_clues_hltemnlp05
│   ├── __MACOSX
│   │   └── subjectivity_clues_hltemnlp05
│   └── subjectivity_clues_hltemnlp05
│       ├── subjclueslen1-HLTEMNLP05.README
│       └── subjclueslen1-HLTEMNLP05.tff
└── subjectivity_clues_hltemnlp05.zip

7 directories, 10 files
[jalal@goku targeted_sentiment]$ tree dependencies/
dependencies/
├── README.md
├── build.sh
├── examples
│   ├── csamp.txt
│   └── lee_example.txt
├── proc_text_files.py
├── proc_text_files_to_stdout.py
├── run_examples.sh
├── sample.ini
├── setup.py
├── stanford-corenlp-full-2015-04-20
│   ├── CoreNLP-to-HTML.xsl
│   ├── LIBRARY-LICENSES
│   ├── LICENSE.txt
│   ├── Makefile
│   ├── README.txt
│   ├── SemgrexDemo.java
│   ├── ShiftReduceDemo.java
│   ├── StanfordCoreNlpDemo.java
│   ├── StanfordDependenciesManual.pdf
│   ├── build.xml
│   ├── corenlp.sh
│   ├── ejml-0.23-src.zip
│   ├── ejml-0.23.jar
│   ├── input.txt
│   ├── input.txt.xml
│   ├── javax.json-api-1.0-sources.jar
│   ├── javax.json.jar
│   ├── joda-time-2.1-sources.jar
│   ├── joda-time.jar
│   ├── jollyday-0.4.7-sources.jar
│   ├── jollyday.jar
│   ├── patterns
│   │   ├── example.properties
│   │   ├── goldnames.txt
│   │   ├── goldplaces.txt
│   │   ├── names.txt
│   │   ├── otherpeople.txt
│   │   ├── places.txt
│   │   ├── presidents.txt
│   │   └── stopwords.txt
│   ├── pom.xml
│   ├── protobuf.jar
│   ├── stanford-corenlp-3.5.2-javadoc.jar
│   ├── stanford-corenlp-3.5.2-models.jar
│   ├── stanford-corenlp-3.5.2-sources.jar
│   ├── stanford-corenlp-3.5.2.jar
│   ├── sutime
│   │   ├── defs.sutime.txt
│   │   ├── english.holidays.sutime.txt
│   │   └── english.sutime.txt
│   ├── tokensregex
│   │   ├── color.input.txt
│   │   ├── color.properties
│   │   ├── color.rules.txt
│   │   └── retokenize.txt
│   ├── xom-1.2.10-src.jar
│   └── xom.jar
├── stanford-corenlp-full-2015-04-20.zip
├── stanford_corenlp_pywrapper
│   ├── __init__.py
│   ├── __init__.pyc
│   ├── javasrc
│   │   ├── corenlp
│   │   │   ├── JsonPipeline.java
│   │   │   ├── PipeRunner.java
│   │   │   └── SocketServer.java
│   │   └── util
│   │       ├── Arr.java
│   │       ├── BasicFileIO.java
│   │       ├── JsonUtil.java
│   │       ├── U.java
│   │       └── misc
│   │           ├── Pair.java
│   │           └── Triple.java
│   ├── lib
│   │   ├── corenlpwrapper.jar
│   │   ├── guava-13.0.1.jar
│   │   └── jackson-all-1.9.11.jar
│   ├── rcorenlp.r
│   ├── sockwrap.py
│   └── sockwrap.pyc
├── targetedSentiment.2017
│   ├── EECS_annotated_samples_anonymized
│   ├── README_v1.0.txt
│   └── tsent_dicts.py
└── targetedSentiment.2017.tar.gz

12 directories, 75 files
[jalal@goku targeted_sentiment]$
```
cfwelch commented 6 years ago

Hello!

The dataset can be downloaded from http://web.eecs.umich.edu/~mihalcea/downloads/targetedSentiment.2017.tar.gz and should be placed in this folder and extracted. This will give you the EECS_annotated_samples_anonymized file.
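For anyone who prefers to script this step, here is a minimal Python 3 sketch. The URL is the one above; the assumption that the archive unpacks into a targetedSentiment.2017 directory comes from the tree output in the question and is not otherwise verified.

```python
import shutil
import tarfile
import urllib.request

URL = ('http://web.eecs.umich.edu/~mihalcea/downloads/'
       'targetedSentiment.2017.tar.gz')

# Download the archive into the repository root and unpack it there.
urllib.request.urlretrieve(URL, 'targetedSentiment.2017.tar.gz')
with tarfile.open('targetedSentiment.2017.tar.gz') as tar:
    tar.extractall('.')

# make_crfbioyz.py opens the file by its bare name, so move it up to
# the repository root (the extracted path matches the tree above).
shutil.move('targetedSentiment.2017/EECS_annotated_samples_anonymized',
            'EECS_annotated_samples_anonymized')
```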

The pos_words and neg_words files come from the Bing Liu lexicon you downloaded. If you look at the files currently in bing_liu_lexicon, they just contain the instructions for downloading the lexicon, so it's the files in opinion-lexicon-English that are supposed to be copied into bing_liu_lexicon and renamed (positive-words.txt to pos_words and negative-words.txt to neg_words).
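Concretely, a minimal sketch of that rename step, with the paths taken from the tree output above:

```python
import shutil

# Copy the Bing Liu opinion lexicon files into bing_liu_lexicon under
# the names the code expects, overwriting the placeholder files that
# only contain download instructions.
shutil.copy('data/opinion-lexicon-English/positive-words.txt',
            'data/bing_liu_lexicon/pos_words')
shutil.copy('data/opinion-lexicon-English/negative-words.txt',
            'data/bing_liu_lexicon/neg_words')
```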