Derek-Jones / ESEUR-book

Issue handling for Evidence-based Software Engineering: based on the publicly available data
http://www.knosof.co.uk/ESEUR/

Sentiment analysis #13

Open aserebrenik opened 3 years ago

aserebrenik commented 3 years ago

researchers are starting to collate software engineering specific training data _1134_

Recently several articles and corresponding datasets have been released in this field:

Related work discusses confusion:

Derek-Jones commented 3 years ago

Thanks for posting details of these papers, and in particular the datasets. I'm always keen to see new data.

My experience with analysis of natural language is that it is very hard to do anything meaningful. It takes a lot of work to do even the simplest of tasks.

I was once excited by the possibility of sentiment analysis, then I tried to use the predictions (in a non-software engineering context). I relearned how important word order is to meaning, and that what people mean can be the opposite of the words they use. Building a software engineering sentiment analysis dataset is one thing, doing anything useful with it is another.
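As a concrete illustration of the word-order point, here is a minimal sketch (my own, not code from this issue, using an invented two-word lexicon) showing how a bag-of-words sentiment score assigns the same value to two sentences with opposite meanings:

```python
# Toy bag-of-words sentiment scorer with a hypothetical two-word lexicon,
# purely to illustrate why ignoring word order loses meaning.

POSITIVE = {"good"}   # hypothetical lexicon entries
NEGATIVE = {"bad"}

def bag_of_words_score(sentence: str) -> int:
    """Count positive minus negative lexicon hits; word order is ignored."""
    words = sentence.lower().replace(",", "").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Same bag of words, opposite meanings, identical score:
print(bag_of_words_score("this patch is not bad, it is good"))  # prints 0
print(bag_of_words_score("this patch is not good, it is bad"))  # prints 0
```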

"Anger and its direction": I once did some work for a charity trying to extract information on torture. The main problem we had was trying to figure out whether somebody was talking about being tortured or talking about others doing torture (e.g., torture was a bad thing). Again the problem was context, and the fact that the tools we had were too primitive (and those of us involved not being experts in linguistic processing).

"Confusion detection": The more interesting dataset would be the initial confusion/nonconfucion classification made by each annotator. This might be used to get some idea of what level of confusion exists about software senetences. If this exercise was rerun, I would expect the gold set produced to be very different from the one produced by this work. People really are very different.