Keraito closed this issue 7 years ago.
Maybe also something like the average number of capitalized (caps) characters per message, or the percentage of caps characters per message?
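Both caps features could be computed roughly as follows. This is a minimal sketch, not the project's code, and it rests on two assumptions. First, "percentage of caps" is measured over letters only. Second, messages with no letters at all (e.g. only digits or punctuation) are skipped for that feature. The function names are hypothetical.

```python
def caps_count(message: str) -> int:
    # Number of uppercase characters in a single chat message.
    return sum(1 for ch in message if ch.isupper())


def avg_caps_per_message(messages: list[str]) -> float:
    # Average number of uppercase characters per message (0.0 for an empty log).
    if not messages:
        return 0.0
    return sum(caps_count(m) for m in messages) / len(messages)


def pct_caps_per_message(messages: list[str]) -> float:
    # Mean, over messages, of the percentage of letters that are uppercase.
    # Assumption: messages without any letters are skipped rather than counted as 0%.
    ratios = []
    for m in messages:
        letters = [ch for ch in m if ch.isalpha()]
        if letters:
            ratios.append(100.0 * sum(ch.isupper() for ch in letters) / len(letters))
    return sum(ratios) / len(ratios) if ratios else 0.0


chat = ["HELLO chat", "gg", "WOW", "123"]
print(avg_caps_per_message(chat))            # 2.0
print(round(pct_caps_per_message(chat), 2))  # 51.85
```

The absolute count favours long messages. The percentage is closer to "how shouty" a chat is, independent of message length, so it may be worth extracting both.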
I would also like to look into representing each stream as a text document consisting of its chat messages. This would allow us to apply existing document similarity methods.
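One existing document similarity method is TF-IDF weighting combined with cosine similarity. The sketch below applies it to streams treated as documents. It is not a decided approach. It assumes simple lowercase whitespace tokenization and a smoothed IDF, and all function names are hypothetical.

```python
import math
from collections import Counter


def stream_to_document(messages: list[str]) -> list[str]:
    # Concatenate all chat messages of one stream into a single list of lowercase tokens.
    # Assumption: whitespace tokenization; real chat may need a better tokenizer.
    return [tok.lower() for m in messages for tok in m.split()]


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    # Build one sparse TF-IDF vector (token -> weight) per document.
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        # Smoothed IDF: terms present in every document still get a small weight.
        vectors.append({t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


streams = [
    ["gg wp", "nice play"],      # hypothetical gaming chat
    ["gg", "nice play gg"],      # another hypothetical gaming chat
    ["recipe tips", "add salt"], # hypothetical cooking chat
]
vecs = tfidf_vectors([stream_to_document(s) for s in streams])
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
```

The resulting pairwise similarities could serve directly as input to a similarity-based clustering method.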
As discussed in the meeting today, and building on top of https://github.com/clanghout/twitch-classification/issues/3#issuecomment-309755535, it might be interesting to represent the documents by the Twitch emojis (emotes) they contain. @mpasterkamp
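An emote-based document could be reduced to a bag-of-emotes vector: the relative frequency of each emote in a stream's chat. The sketch below assumes a small hypothetical emote vocabulary. A real list would have to come from Twitch or the channel itself. It also assumes case-sensitive, whitespace-separated matching of emote names.

```python
from collections import Counter

# Hypothetical emote vocabulary for illustration only.
EMOTES = ("Kappa", "LUL", "Kreygasm")


def emote_vector(messages: list[str], emotes: tuple[str, ...] = EMOTES) -> list[float]:
    # Count emote tokens (case-sensitive, whitespace-separated) across all messages.
    counts = Counter(tok for m in messages for tok in m.split() if tok in emotes)
    total = sum(counts.values())
    # Relative frequencies, so streams with different chat volume are comparable.
    return [counts[e] / total if total else 0.0 for e in emotes]


print(emote_vector(["Kappa Kappa", "LUL nice", "hello"]))
```

Normalizing by the total emote count means two streams with the same emote mix but very different activity end up with the same vector. Whether that is desirable depends on what the clustering should capture.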
The features are used in the clustering approaches: we took the four features mentioned here as the basis for the clustering algorithms.
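The four features are not listed in this part of the thread, so the sketch below uses four hypothetical per-stream feature values and plain k-means as one possible clustering approach. The features are z-score standardized first so that features on large scales (e.g. message counts) do not dominate small-scale ones (e.g. ratios). The initialization simply takes the first `k` points. That choice is an assumption for determinism; k-means++ is a common alternative.

```python
import math


def standardize(rows: list[list[float]]) -> list[list[float]]:
    # z-score each feature column; a constant column gets std 1.0 to avoid division by zero.
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def kmeans(points: list[list[float]], k: int, iters: int = 100) -> list[int]:
    # Plain Lloyd's k-means; returns one cluster label per point.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical feature vectors, one row per stream:
# [messages per minute, avg caps per message, emote share, avg message length]
streams = [
    [120, 3.0, 0.40, 8],
    [10, 0.5, 0.05, 40],
    [110, 2.5, 0.35, 7],
    [12, 0.4, 0.02, 35],
]
print(kmeans(standardize(streams), k=2))  # [0, 1, 0, 1]
```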
This is a master issue to define what kind of features we want to extract from our chat logs. Currently, the concrete ideas are:
Some features for the future:
More features can be mentioned and discussed here.