Open FlxVctr opened 4 years ago
- Freely available dataset from Der Standard, including articles, comments, and moderation decisions: https://ofai.github.io/million-post-corpus/
- The Polly Corpus: online political debate in Germany: http://www.organisms.be/downloads/polly.pdf
- http://www.organisms.be/downloads/jaki2018.pdf
- https://docs.google.com/spreadsheets/d/1c5peNMjt24U0FcEMSj8gD_JjzumqXTWbPWa_yb2nNt0/edit#gid=1445690638
- https://drive.google.com/drive/folders/1uhx_NotkG3KTc2yU3-FjnlhBj5e07rcs
- https://pub.uni-bielefeld.de/record/2909336
- German corpus of tweets regarding refugees in Germany, annotated with hate speech ratings: https://github.com/UCSM-DUE/IWG_hatespeech_public
- Data For Everyone by the Figure Eight platform: https://www.figure-eight.com/data-for-everyone/
- Possibly: https://www.researchgate.net/publication/340618080_Classifying_Constructive_Comments

Also, a few links to German dictionaries for hate speech and related categories:

- German-language terms on hatebase.org (only about 100): https://hatebase.org/search_results/language_id%3Ddeu
- German-language datasets on http://hatespeechdata.com:
  - IWG_hatespeech_public: https://github.com/UCSM-DUE/IWG_hatespeech_public
  - GermEval-2018-Data: https://github.com/uds-lsv/GermEval-2018-Data
  - GermEval 2019: https://projects.fzai.h-da.de/iggsa/projekt/
Added to the 'Twitter' milestone because some of the datasets contain tweets.
This issue should later be split into several issues, one per topic.
https://docs.google.com/document/d/1sxj3AWWVbQ4v_LcTHPc2s6k11w38gmrXNqpBvNTglOg/edit?usp=sharing
I have added comments to the doc. Can you please additionally indicate which datasets you have decided to exclude and why? Thanks!