f4bD3v / humanitas

A price prediction toolset for developing countries
BSD 3-Clause "New" or "Revised" License

Tweet Processing: Cassandra, Spark, Shark - Storage issues #5

Closed mstefanro closed 10 years ago

mstefanro commented 10 years ago

NLP, Sentiment/emotion analysis, Clustering etc.

f4bD3v commented 10 years ago

Here is the methodological white paper from the UN Global Pulse research: http://www.slideshare.net/unglobalpulse/globalpulsecrimsonhexmethodspaper2011

A master's thesis on a topic-model approach to clustering tweets: "Clustering short status messages: A topic model based approach" http://ebiquity.umbc.edu/_file_directory_/papers/518.pdf

f4bD3v commented 10 years ago

Storing tweets with Cassandra:

Cassandra Query Language (CQL)

Cassandra Data Modelling Best Practices

https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/CassandraCQLTest.scala

https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/CassandraTest.scala
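
To make the data-modelling links concrete, here is a minimal sketch of what a tweet table could look like in CQL, wrapped in Python. The keyspace `humanitas`, the table `tweets_by_day`, the column names and the per-day partitioning are all assumptions for illustration and are not taken from the project code. `session` is assumed to be a session from the DataStax Python driver (`cassandra.cluster.Cluster(...).connect()`), but any object with an `execute(query, params)` method works.

```python
from datetime import datetime, timezone

# Hypothetical keyspace and table names, chosen only for this sketch.
CREATE_KEYSPACE = """
CREATE KEYSPACE IF NOT EXISTS humanitas
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
"""

# One partition per UTC day; rows inside a partition are ordered by tweet id.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS humanitas.tweets_by_day (
    day text,
    tweet_id bigint,
    user_id bigint,
    body text,
    PRIMARY KEY ((day), tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC)
"""

# The DataStax Python driver uses %s placeholders for simple statements.
INSERT_TWEET = (
    "INSERT INTO humanitas.tweets_by_day (day, tweet_id, user_id, body) "
    "VALUES (%s, %s, %s, %s)"
)


def day_bucket(created_at):
    """Return the UTC day string (YYYY-MM-DD) used as the partition key."""
    return created_at.astimezone(timezone.utc).strftime("%Y-%m-%d")


def store_tweet(session, tweet_id, user_id, body, created_at):
    """Insert one tweet; `created_at` must be a timezone-aware datetime."""
    params = (day_bucket(created_at), tweet_id, user_id, body)
    session.execute(INSERT_TWEET, params)
```

Bucketing by day keeps any single partition from growing without bound, at the cost of needing one query per day when reading a date range.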

f4bD3v commented 10 years ago

Issue with Cassandra: too many tweets are getting through the filter.
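
One possible way to reduce the volume before anything is written to Cassandra, as a rough sketch only: require a tweet to mention both a commodity term and a price term, and measure what fraction of a sample passes. The keyword sets below are hypothetical placeholders, since the actual filter terms are not listed in this thread.

```python
import re

# Hypothetical keyword sets; the real filter terms are not given in the thread.
COMMODITY_TERMS = {"rice", "wheat", "onion", "sugar"}
PRICE_TERMS = {"price", "prices", "cost", "expensive", "cheap"}

TOKEN_RE = re.compile(r"[a-z]+")


def passes_filter(text):
    """Keep a tweet only if it has a commodity term AND a price term."""
    tokens = set(TOKEN_RE.findall(text.lower()))
    return bool(tokens & COMMODITY_TERMS) and bool(tokens & PRICE_TERMS)


def pass_rate(texts):
    """Fraction of tweets in a sample that the filter lets through."""
    texts = list(texts)
    if not texts:
        return 0.0
    return sum(passes_filter(t) for t in texts) / len(texts)
```

Running `pass_rate` on a sample of the incoming stream gives a quick estimate of how much a stricter rule would cut storage before changing the pipeline.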