Leibniz-HBI / smo-wiki

Generates a GitHub Page from the Social Media Observatory Wiki with Bash, Python, Regexes and Jekyll.
https://smo-wiki.leibniz-hbi.de

Explore German comment datasets and decide on curation in Wiki #36

Open FlxVctr opened 4 years ago

FlxVctr commented 4 years ago

Freely accessible dataset from Der Standard containing, among other things, articles, comments, and moderation decisions: https://ofai.github.io/million-post-corpus/
The Polly Corpus: online political debate in Germany: http://www.organisms.be/downloads/polly.pdf
http://www.organisms.be/downloads/jaki2018.pdf
https://docs.google.com/spreadsheets/d/1c5peNMjt24U0FcEMSj8gD_JjzumqXTWbPWa_yb2nNt0/edit#gid=1445690638
https://drive.google.com/drive/folders/1uhx_NotkG3KTc2yU3-FjnlhBj5e07rcs
https://pub.uni-bielefeld.de/record/2909336
German corpus of tweets regarding refugees in Germany, annotated with hate speech ratings: https://github.com/UCSM-DUE/IWG_hatespeech_public
Data For Everyone by the Figure Eight platform: https://www.figure-eight.com/data-for-everyone/
Possibly also: https://www.researchgate.net/publication/340618080_Classifying_Constructive_Comments

In addition, a few links to German dictionaries for hate speech and related terms:

German-language terms on hatebase.org (only about 100): https://hatebase.org/search_results/language_id%3Ddeu
German-language datasets on http://hatespeechdata.com:
IWG_hatespeech_public: https://github.com/UCSM-DUE/IWG_hatespeech_public
GermEval-2018-Data: https://github.com/uds-lsv/GermEval-2018-Data
GermEval 2019: https://projects.fzai.h-da.de/iggsa/projekt/

Added to 'Twitter' Milestone, because some datasets contain tweets.

Later, this issue should be split into several issues, one per topic.

Khandoker09 commented 3 years ago

https://docs.google.com/document/d/1sxj3AWWVbQ4v_LcTHPc2s6k11w38gmrXNqpBvNTglOg/edit?usp=sharing

FlxVctr commented 3 years ago

I have added comments to the doc. Can you please additionally indicate which datasets you have decided to exclude and why? Thanks!