CogComp / cogcomp-nlp

CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.
http://nlp.cogcomp.org/

Replicate the experiments in "Importance of Semantic Representation: Dataless Classification" #622

Open ZeweiChu opened 6 years ago

ZeweiChu commented 6 years ago

I am trying to replicate the experiments in the paper "Importance of Semantic Representation: Dataless Classification" (https://cogcomp.org/page/publication_view/178). However, I cannot find the exact definition of the experiment "binary classification with Yahoo Answers dataset". Could the authors of this GitHub repository help clarify this?

shatu commented 6 years ago

On the 2nd page of the paper, there's a description of the Yahoo Answers Dataset. The corresponding experimental setup is outlined on the 3rd page.

The dataset is available here: https://cogcomp.org/page/resource_view/89

From the paper itself:

"For the Yahoo! Answers dataset, we generated 20 random binary classification problems at the subcategory level. Some of these problems are shown in Table 4."

Table 4 of the paper lists examples of such binary classification problems.

If you can be a bit more specific about your confusion/question, I can try to address it.
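
For illustration only, here is a minimal Python sketch of one possible reading of the quoted setup: drawing 20 random pairs of subcategories, each pair defining one binary problem. The function name `sample_binary_problems`, the fixed seed, the placeholder labels, and the choice to sample distinct unordered pairs uniformly are all assumptions, not details confirmed by the paper or this repository.

```python
import random
from itertools import combinations


def sample_binary_problems(subcategories, n_problems=20, seed=0):
    """Draw n_problems distinct unordered pairs of subcategory labels.

    Assumption: each binary problem is a pair of two different
    subcategories, sampled uniformly without replacement.
    """
    # Enumerate every unordered pair of distinct labels.
    pairs = list(combinations(sorted(set(subcategories)), 2))
    if n_problems > len(pairs):
        raise ValueError("not enough distinct subcategory pairs")
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return rng.sample(pairs, n_problems)


# Hypothetical placeholder labels; substitute the real subcategory
# names from the Yahoo! Answers dataset linked above.
labels = [f"subcategory_{i:02d}" for i in range(10)]
problems = sample_binary_problems(labels)

for first, second in problems:
    print(f"{first} vs. {second}")
```

Each resulting pair would then be treated as a two-class task over the documents belonging to those two subcategories. How the original experiments chose the pairs (for example, whether both subcategories had to share a top-level category) is not stated in the quoted passage.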