clanghout / twitch-classification

mmsr twitch genre classification based on chat

Features #3

Closed Keraito closed 7 years ago

Keraito commented 7 years ago

This is a master issue to define what kind of features we want to extract from our chat logs. Currently, the concrete ideas are:

Some features for the future:

More features can be mentioned and discussed here.

Keraito commented 7 years ago

Maybe also something like the average number of capital letters per message, or the percentage of capital letters per message?
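
A minimal sketch of how both variants could be computed, assuming the chat log of a stream is available as a list of message strings. The function name `caps_features` and the input format are assumptions for illustration, not something that exists in the repository.

```python
def caps_features(messages):
    """Return (average caps count per message, average caps fraction per message).

    `messages` is assumed to be a list of chat message strings for one stream.
    """
    if not messages:
        return 0.0, 0.0

    # Number of uppercase characters in each message.
    caps_counts = [sum(1 for ch in msg if ch.isupper()) for msg in messages]

    # Fraction of uppercase characters in each message; empty messages count as 0.
    caps_fractions = [
        count / len(msg) if msg else 0.0
        for count, msg in zip(caps_counts, messages)
    ]

    n = len(messages)
    return sum(caps_counts) / n, sum(caps_fractions) / n
```

Multiplying the second value by 100 gives the percentage variant. Note that the fraction is taken over all characters, including spaces and punctuation; dividing by the number of letters only would be an equally reasonable choice.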

mpasterkamp commented 7 years ago

I would also like to look into representing each stream as a text document consisting of its chat messages. This would allow us to look into existing document similarity methods.
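
As a rough illustration of the idea, the sketch below turns the chat messages of a stream into a bag-of-words document and compares two streams with cosine similarity. Cosine similarity over raw term counts is just one possible similarity method, chosen here for brevity; the function names and the whitespace tokenization are assumptions.

```python
import math
from collections import Counter


def stream_document(messages):
    """Represent a stream as term counts over all of its chat messages."""
    counts = Counter()
    for msg in messages:
        # Naive tokenization: lowercase and split on whitespace.
        counts.update(msg.lower().split())
    return counts


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two term-count documents (0.0 if either is empty)."""
    shared_terms = doc_a.keys() & doc_b.keys()
    dot = sum(doc_a[term] * doc_b[term] for term in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in doc_a.values()))
    norm_b = math.sqrt(sum(v * v for v in doc_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two streams with identical word distributions score 1.0 and streams with no words in common score 0.0. A weighting scheme such as TF-IDF could replace the raw counts without changing the overall structure.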

Keraito commented 7 years ago

As discussed in the meeting today, and building on https://github.com/clanghout/twitch-classification/issues/3#issuecomment-309755535, it might be interesting to try to represent the documents by the Twitch emojis (emotes) used in them. @mpasterkamp
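
One way this could look, as a sketch only: keep just the tokens that match a known emote name and count them per stream. The emote set below is a tiny illustrative sample, and `emote_document` is a hypothetical helper; a real list of emote names would have to be obtained separately.

```python
from collections import Counter

# Illustrative sample only; a complete emote list is assumed to come from elsewhere.
SAMPLE_EMOTES = {"Kappa", "PogChamp", "LUL"}


def emote_document(messages, emotes=SAMPLE_EMOTES):
    """Represent a stream by the counts of the emotes used in its chat messages."""
    counts = Counter()
    for msg in messages:
        # Emote names are case-sensitive, so tokens are matched exactly.
        counts.update(token for token in msg.split() if token in emotes)
    return counts
```

The resulting counts can be compared between streams in the same way as any other bag-of-words document.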

clanghout commented 7 years ago

The features are used in the clustering approaches. We took the four mentioned features as the basis for the clustering algorithms.