mictebbe opened this issue 3 years ago
The paper mentions the problem of conflating hate and offensive speech, whereas the dataset T1 has three labels (hate speech, offensive, and ordinary). I think the rules for distinguishing these classes are subjective and sensitive to many factors that could also change over time. What are the criteria that separate hate speech from offensive speech? Are cross-cultural differences taken into consideration?
The paper also mentions the influence of imbalanced classes on classifier performance. Could expanding the training data (in our case, by adding more hate speech comments) solve this problem in practice?
A very important point that has been raised is the need for more focus on datasets and qualitative analysis rather than on models. When we talk about YouTube, Facebook, or Twitter, how effective is the role of human content moderators from this perspective?
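On the class-imbalance point above: besides collecting more real hate speech comments, a common cheap baseline is random oversampling of the minority class. A minimal sketch (function names and toy data are mine, not the paper's):

```python
import random

def oversample_minority(samples, labels, target_label, seed=0):
    """Duplicate minority-class samples until the classes are balanced.

    A naive alternative to collecting more real hate speech: random
    oversampling simply repeats existing minority examples.
    """
    rng = random.Random(seed)
    minority = [s for s, l in zip(samples, labels) if l == target_label]
    majority = [s for s, l in zip(samples, labels) if l != target_label]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return samples + extra, labels + [target_label] * len(extra)

texts = ["hate1", "ok1", "ok2", "ok3"]
labels = ["hate", "ordinary", "ordinary", "ordinary"]
bs, bl = oversample_minority(texts, labels, "hate")
# bl now contains as many "hate" as "ordinary" labels
```

Note that duplicated samples add no new information, so this mainly rebalances the loss; it does not substitute for genuinely more diverse hate speech data.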
Question 1: In section 2.1, the referenced paper reports that LSTM+GBDT gives a better result than using an LSTM alone. But the author got the opposite result, i.e. the LSTM beat LSTM+GBDT. Why does the more complex model perform worse? Is that a problem of the model itself, or does it depend on the datasets used? A possible reason, I think, is that the model may be overfitting.
Question 2: In section 3.3, the author uses the word-appending method in adversarial training. It is certainly useful to add common words to hate speech samples to make the dataset more general. But on the other side, if you add hate words to ordinary sentences, the whole text effectively turns into the "hate speech" class. So does that make sense? What is the point of adding hate words to normal speech? I personally don't think it helps to get a better result.
Question 3: In section 8, the author mentions that we should focus more on the datasets instead of the models. What exactly should we do about that? Are there any concrete ideas?
Seeing as simple whitespace removal effectively kills word-based detection, could one say that word-based hate-speech detection is not really feasible, because it is easily tricked by just about any average user?
In the case of whitespace removal, I don't think it is always that easy for the reader to reconstruct the original content, as removing all whitespace can make the text very hard to read. So I don't think it is as easy a technique for an adversary to use as the authors want us to believe.
Appending "love" to a sentence indeed makes it less toxic/hateful for an ML algorithm, because from a purely numerical/statistical standpoint "love" is an anti-hate/anti-toxic word, so reducing the toxicity score for that sentence seems logical. Are there any current ideas on how to resolve such cases? Context becomes very important here; can ML handle this?
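The dilution effect described above can be reproduced with a toy lexicon scorer that averages per-word weights (the weights are invented for illustration; real models learn them from data):

```python
# Toy toxicity scorer: average of per-word weights, invented for illustration.
WEIGHTS = {"hate": 1.0, "stupid": 0.8, "love": -1.0}

def toxicity(text):
    """Mean word weight; unknown words count as neutral (0.0)."""
    words = text.lower().split()
    return sum(WEIGHTS.get(w, 0.0) for w in words) / len(words)

base = toxicity("i hate you")                     # clearly positive score
attacked = toxicity("i hate you love love love")  # diluted by benign words
```

Because the score is an average, every appended benign word pulls it down, which is exactly why context-insensitive models fall for the attack.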
The threat of false positives is very high, considering that these algorithms will sooner or later be used unsupervised in social networks (such as Facebook). The German NetzDG (Network Enforcement Act) forces these social networks to delete fake news and hate speech. If these algorithms are quite easy to attack, and deleting false positives could violate freedom of speech, is it really desirable to use algorithms for this?
Since character-based algorithms are much more resistant to attacks, why are word-based ones used at all? Is their performance that much better?
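To illustrate why character-based features resist the whitespace attack better than word-based ones, here is a toy comparison of feature-set overlap (helper names and the Jaccard measure are my choices, not the paper's):

```python
def word_tokens(text):
    """Word-level features: whitespace-split tokens."""
    return set(text.lower().split())

def char_ngrams(text, n=3):
    """Character-level features: overlapping character trigrams."""
    t = text.lower()
    return {t[i:i + n] for i in range(len(t) - n + 1)}

def jaccard(a, b):
    """Similarity between two feature sets."""
    return len(a & b) / len(a | b)

original = "you are stupid"
attacked = "youarestupid"  # whitespace-removal attack

# Word features share nothing with the attacked text...
word_overlap = jaccard(word_tokens(original), word_tokens(attacked))
# ...while many character trigrams survive the attack.
char_overlap = jaccard(char_ngrams(original), char_ngrams(attacked))
```

Here `word_overlap` is exactly zero, while a substantial fraction of the trigrams ("you", "stu", "tup", "upi", "pid", ...) survive, so a character-level model still sees familiar features.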
A lot of this paper revolves around the datasets. It was mentioned that the data used is labelled manually by different people. The authors believe there are great differences in a dataset depending on whether the data was labelled with two classes or with three. Wouldn't this assumption imply that all researchers would label the data the same way, given the same two- or three-class scheme? From my perspective, the impact of the decisions the labellers make (due to not having a strict definition of hate speech) should have a much higher impact on the datasets than the fact that some datasets divide the non-hate speech further.
Since words like "love" or the "F" word can strongly affect the prediction for a text, how can one train the classifier to differentiate between, for example, positively and negatively used swear words? Or rather, how can a model be taught to understand the connections between different words?
Training and testing the classifiers on the original datasets resulted in good predictions, but on different test sets the F1 scores deteriorated drastically. Language models usually have to be fine-tuned for specific datasets, maybe even per platform. Would it make sense to train on positive or negative words first and then adjust the model later?
Adversarial attacks can have a huge impact on a model's performance. I once did that with an image classifier by adding some noise to the pictures: to the human eye the images were still clearly recognizable, but the machine totally failed. Regarding language models, should errors such as spelling mistakes already be accounted for in the planning phase? How can one prevent such attacks?
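One common mitigation is to inject such perturbations into the training data. A minimal sketch of typo-style augmentation (adjacent-character swaps as a cheap stand-in for real spelling mistakes; names and details are mine, not the paper's):

```python
import random

def typo_variants(text, n=3, seed=0):
    """Generate n adversarial variants of a text by swapping one pair of
    adjacent characters each -- a cheap stand-in for spelling mistakes.
    Adding such variants to the training set (with the original label)
    is a basic form of adversarial training."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n):
        chars = list(text)
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        variants.append("".join(chars))
    return variants

# Augment a labelled sample with its perturbed copies, keeping the label.
augmented = [("you are stupid", "hate")]
augmented += [(v, "hate") for v in typo_variants("you are stupid")]
```

This only hardens the model against the perturbations you thought of; attacks outside the augmentation scheme (e.g. leetspeak or whitespace removal) need their own variants.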
Question 1: What characterises offensive speech? What are the boundaries between offensive and hate speech?
Question 2: How can we construct datasets comprising a large variety of hate speech variants from diverse sources? Also, how do we label the data, since offensive/hate speech is a subjective matter?
Question 3: In the introduction the authors claim that "hate speech detection is largely independent of model architecture." However, in section 4 they say that model selection influences model performance in terms of attack resilience. How can these two statements both be true?
1. How do we set clear boundaries between false positives and false negatives? 2. They mention that, given the asymmetry of hate speech classification, the solution might be to reintroduce more traditional keyword-based approaches. Wouldn't a keyword-based approach be too simple to detect hate speech effectively? If not, how would it work? Should the keyword-based approach be combined with other techniques?
The authors note that the models classify on prevalence and can therefore misclassify if sufficient text from the opposing class is added. Are there any current approaches that analyze the text in parts (maybe sentence by sentence) and classify the whole text as hate speech if one part is hate speech?
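A sentence-level scheme like the one asked about could be sketched as follows (the lexicon scorer and threshold are invented for illustration; a real system would use a trained classifier per part):

```python
def classify_by_parts(text, score_fn, threshold=0.3):
    """Split a text into sentences and flag it as hate speech if ANY part
    crosses the threshold -- rather than scoring the whole text, which
    lets appended benign filler dilute a hateful sentence."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    scores = [score_fn(s) for s in sentences]
    return max(scores) >= threshold, scores

# Toy scorer: fraction of words from a tiny hate lexicon (illustrative only).
LEXICON = {"hate", "stupid"}
def score(sentence):
    words = sentence.lower().split()
    return sum(w in LEXICON for w in words) / len(words)

flagged, scores = classify_by_parts("What a lovely day. I hate you.", score)
```

Here the first sentence scores 0, but the second alone crosses the threshold, so the text is flagged; scoring the full text as one unit would average the hateful sentence away.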
Misclassifications in either direction are quite common with these methods. Would it be more desirable to err on the side of classifying borderline content as hate speech and thus removing it, or is free speech more important, even if that means more hate speech goes unrecognized?
In the related work section the authors note that the appended "good words" problem is known from spam filters. How would adding such a spam filter approach as a pre-processing step affect the quality of the classification? And should detected text with “good words” appended automatically be classified as hate speech?
1. How were the performances of the models finally evaluated? By human beings cross-checking results?
2. One result of this study is that all models are somehow equally "good" at classifying hate speech. Therefore, according to the authors, the focus of future research should be on the datasets instead of the models. How would that work? If all models have difficulties classifying new content, how can you improve hate speech classification by improving the dataset?
3. To what extent could transfer learning help to make the models more efficient?
1. The paper raises the problem of properly distinguishing between the concepts of hate speech and offensive speech. Although a definition of the former is offered, a clear delimitation between the two concepts is not presented in the paper. It was also mentioned that, upon testing each two-class model on offensive (but not hateful) speech, they proved susceptible to false positives. What criteria are applied to distinguish between these two concepts? Can these criteria be considered cross-cultural? More examples of words that fall into each of the two groups would be helpful.
2. Regarding the classification of data into hate and offensive speech, it seems to me that this is still a rather subjective matter, judging by the different researchers' models used in the paper. How can data belonging to either of these two groups be labeled more reliably?
3. The paper showed that appending words like "love" or the "F" word to a text directly affects its toxicity score, leading to false positives or false predictions. How can a classifier be trained to correctly differentiate between the contexts in which such words ("love", "F", etc.) are used, so that, for example, non-hateful sentences containing an "F" word are not predicted to be hateful?
1. The authors showed that adversarial attack strategies are very efficient against hate speech classification models. They further discussed the positive impact of adversarial training to prevent misclassification of altered samples. Since one can only add adversarial training samples for known adversarial attack strategies, could you imagine other adversarial attacks than those described in the paper?
2. It was said that all models performed equally well when tested on data similar to what they were trained on. However, when the models were trained on one dataset (e.g. T1) and then tested on another (e.g. T2), the performance was massively reduced. Imagine one combined the models (all trained on different data) by applying them separately to an arbitrary dataset and implementing a majority vote for the final classification. Do you think this would result in better classification accuracy?
3. Word-based and character-based approaches differ completely in their structure. Which of the two do you think is the more promising strategy for the future, and why?
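The majority-vote combination proposed in question 2 could be sketched like this (the three lambdas are toy stand-ins for models trained on different datasets, not the paper's classifiers):

```python
from collections import Counter

def majority_vote(models, text):
    """Combine independently trained classifiers by majority vote.
    `models` is a list of callables text -> label; ties resolve to the
    label that reached the top count first (Counter insertion order)."""
    votes = [m(text) for m in models]
    return Counter(votes).most_common(1)[0][0]

# Three toy classifiers standing in for models trained on, say, T1/T2/T3.
m1 = lambda t: "hate" if "hate" in t else "ordinary"
m2 = lambda t: "hate" if "stupid" in t else "ordinary"
m3 = lambda t: "ordinary"

label = majority_vote([m1, m2, m3], "i hate you, stupid")
```

Whether this actually helps depends on the models' errors being somewhat independent; if all models share the same dataset biases, the ensemble inherits them.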
1) Could the performance drop on the different data types be the consequence of overfitting?
2) Are there any ways to deal with the described adversaries?
3) What's the main difference between the researchers’ solution and Google’s one?
1. Taking into account the design of a model, what could be the reason for character-level features to outperform word-level ones? 2. How were these seven models chosen? 3. What contributes more to the results: the data type or the labeling criteria?
1.1 Discussion questions
Write down 3 questions that came up while reading the paper that would be interesting to discuss in the next session. Post your questions on GitHub as comments under the assignment.
Time slots for project presentations: find a slot to present your project in session 13 (11.02.) or 14 (18.02.): https://docs.google.com/spreadsheets/d/1DdkST3KZV4x9D5nGsHgevIASmu_rFkK0Bx2r4AeBGPE/edit#gid=1895482106