nasifimtiazohi closed this 6 years ago
3- I remember being confused about this too. However, the tool definitely returns only a score between 0 and 1 (not a label), and the variable holding it is named `prob[polite]`. There is also `prob[impolite]`, which is simply `1 - prob[polite]`. The original paper also states:
> For new requests, we use class probability estimates obtained by fitting a logistic regression model to the output of the SVM (Witten and Frank, 2005) as predicted politeness scores (with values between 0 and 1; henceforth politeness, by abuse of language).
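Fitting a logistic regression model to the SVM output, as the quote describes, is what is usually called Platt scaling. Below is a minimal sketch of the idea, assuming made-up SVM decision values and gold labels. `fit_platt` and `politeness_probs` are hypothetical names, not the tool's API, and plain gradient descent stands in for whatever optimizer the original implementation used. It shows why the tool yields a score rather than a label: `prob[polite]` is the sigmoid of the SVM output, and `prob[impolite]` is just its complement.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_platt(decision_values, labels, lr=0.5, epochs=2000):
    """Fit p(polite | f) = sigmoid(a * f + b) by gradient descent on log-loss.

    decision_values: raw SVM outputs f (hypothetical toy values here).
    labels: 1 = polite, 0 = impolite.
    """
    a, b = 0.0, 0.0
    n = len(decision_values)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for f, y in zip(decision_values, labels):
            err = sigmoid(a * f + b) - y  # d(log-loss) / d(a*f + b)
            grad_a += err * f
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def politeness_probs(f, a, b):
    """Return (prob[polite], prob[impolite]) for one SVM output f."""
    p_polite = sigmoid(a * f + b)
    return p_polite, 1.0 - p_polite  # impolite is just the complement

# Made-up SVM outputs and gold labels (not from the Stanford corpus).
svm_outputs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
gold = [0, 0, 1, 0, 1, 1]

a, b = fit_platt(svm_outputs, gold)
for f in (-1.5, 0.0, 1.5):
    polite, impolite = politeness_probs(f, a, b)
    print(f"f={f:+.1f}  prob[polite]={polite:.3f}  prob[impolite]={impolite:.3f}")
```

Nothing in this output is a label; turning the score into polite/impolite requires choosing a threshold, which is exactly the step the paper leaves open.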
However, the paper does not mention a threshold, which is why we had to determine one ourselves.
The confusion arises because their website shows a label for each text together with a confidence rating. Some other paper may also describe it that way, but I don't remember clearly.
However, Jongeling's paper, which also uses this politeness tool, states:
> Given a textual fragment the Stanford politeness API returns a politeness score ranging between 0 (impolite) and 1 (polite) with 0.5 representing the “ideal neutrality”. To discretize the score into polite, neutral and impolite we apply the Stanford politeness API to the seven datasets above. It turns out that the politeness scores of the majority of comments are low: the median score is 0.314, the mean score is 0.361 and the third quartile (Q3) is 0.389. We use the latter value to determine the neutrality range. We say therefore that the comments scoring between 0.389 and 0.611 = 1 − 0.389 are neutral; comments scoring lower than 0.389 are impolite and comments scoring higher than 0.611 are polite.
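A minimal sketch of the three-way discretization described in the quote, assuming the score has already been obtained from the tool. `neutral_band` and `discretize` are hypothetical helper names, not Jongeling's code. The quartile method is also an assumption: `statistics.quantiles` defaults to the "exclusive" method, and other conventions can give a slightly different Q3 on the same data.

```python
import statistics

def neutral_band(scores):
    """Return (lower, upper): the third quartile Q3 of the scores and its mirror 1 - Q3.

    Assumes Q3 < 0.5 so that the band is non-empty, as in the quoted data.
    """
    q3 = statistics.quantiles(scores, n=4)[2]  # 'exclusive' method by default
    if q3 >= 0.5:
        raise ValueError("Q3 must be below 0.5 for a symmetric neutral band")
    return q3, 1.0 - q3

def discretize(score, lower, upper):
    """Map a politeness score in [0, 1] to impolite / neutral / polite."""
    if score < lower:
        return "impolite"
    if score > upper:
        return "polite"
    return "neutral"  # lower <= score <= upper

# Thresholds reported in the quote (Q3 = 0.389 over the seven datasets).
LOWER = 0.389
UPPER = 1.0 - LOWER  # about 0.611

for s in (0.25, 0.5, 0.75):
    print(s, discretize(s, LOWER, UPPER))
```

Mirroring Q3 around 0.5 keeps the neutral band centered on the "ideal neutrality" point while letting the data's skew toward low scores decide how wide the band is.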
Note: Jongeling's paper does not rate politeness manually.
So I'm fairly confident that we used the tool correctly in our paper. I would say the original paper is a bit confusing in this regard.
5- This is an obvious threat. However, as the original paper shows, the tool recognizes general patterns of politeness in written text. It is also already used in some SE research, so the tool is not completely irrelevant.
6- I don't see why training and testing need to be aligned. Having only two coders is definitely a threat, though.
R1