Fine-Grained Contextual Classification Evaluation Dataset

This is the evaluation dataset for the following paper:

Yiping Jin, Vishakha Kadam and Dittaya Wanvarie, Bootstrapping Large-Scale Fine-Grained Contextual Advertising Classifier from Wikipedia. (TextGraphs-15 Workshop@NAACL 2021)

The files are organized into folders of the form [coarse category ID]/[fine-grained category name]. The coarse-grained category IDs map to the following category names:

IAB1     arts-entertainment
IAB2     automotive
IAB3     business
IAB4     careers
IAB5     education
IAB6     family-parenting
IAB7     health-fitness
IAB8     food-drink
IAB9     hobbies-interests
IAB10    home-garden
IAB11    law-gov-t-politics
IAB12    news
IAB13    personal-finance
IAB14    society
IAB15    science
IAB16    pets
IAB17    sports
IAB18    style-fashion
IAB19    technology-computing
IAB20    travel
IAB21    real-estate
IAB22    shopping
IAB23    religion-spirituality
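
The folder layout above can be turned into labels with a few lines of Python. The following is a minimal sketch, not part of the original release: it assumes each document is a file sitting directly inside its fine-grained category folder, and the names `COARSE_CATEGORIES`, `parse_label`, and `iter_labelled_files` are hypothetical helpers introduced only for illustration.

```python
from pathlib import Path

# Coarse-grained category IDs, copied from the mapping table above.
COARSE_CATEGORIES = {
    "IAB1": "arts-entertainment", "IAB2": "automotive", "IAB3": "business",
    "IAB4": "careers", "IAB5": "education", "IAB6": "family-parenting",
    "IAB7": "health-fitness", "IAB8": "food-drink", "IAB9": "hobbies-interests",
    "IAB10": "home-garden", "IAB11": "law-gov-t-politics", "IAB12": "news",
    "IAB13": "personal-finance", "IAB14": "society", "IAB15": "science",
    "IAB16": "pets", "IAB17": "sports", "IAB18": "style-fashion",
    "IAB19": "technology-computing", "IAB20": "travel", "IAB21": "real-estate",
    "IAB22": "shopping", "IAB23": "religion-spirituality",
}


def parse_label(path, root):
    """Return (coarse ID, coarse name, fine-grained name) for a file under root.

    Assumes the layout root/[coarse category ID]/[fine-grained category name]/file.
    """
    parts = Path(path).relative_to(root).parts
    if len(parts) < 3:
        raise ValueError(f"expected [coarse ID]/[fine-grained name]/file, got {path}")
    coarse_id, fine_name = parts[0], parts[1]
    return coarse_id, COARSE_CATEGORIES[coarse_id], fine_name


def iter_labelled_files(root):
    """Yield (path, label) for every file two folder levels below root."""
    root = Path(root)
    for path in sorted(root.glob("*/*/*")):
        if path.is_file():
            yield path, parse_label(path, root)
```

For example, with a hypothetical fine-grained folder named `tennis`, `parse_label("data/IAB17/tennis/doc1.txt", "data")` returns `("IAB17", "sports", "tennis")`. An unknown coarse ID raises a `KeyError`, which makes a misplaced folder easy to spot.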

You can download the full training dataset here (2.2GB).

If you make use of this dataset for your research, please cite the following paper:

@inproceedings{jin-2021-bootstrapping,
    title = "Bootstrapping Large-Scale Fine-Grained Contextual Advertising Classifier from Wikipedia",
    author = "Jin, Yiping and Kadam, Vishakha and Wanvarie, Dittaya",
    booktitle = "Proceedings of the Fifteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-15)",
    year = "2021",
    publisher = "Association for Computational Linguistics",
}
