benchflow / data-transformers

Spark scripts used to transform data into BenchFlow's internal formats

Improve data storing on Cassandra #18

Open VincenzoFerme opened 8 years ago

VincenzoFerme commented 8 years ago

We should check that we take care of the different aspects of the Cassandra design.

As reported on http://www.ipponusa.com/blog/10-tips-and-tricks-for-cassandra/, we should:

Don’t use a PreparedStatement if you insert empty columns. If you have an empty column in your PreparedStatement, the CQL driver will in fact insert a null value in Cassandra, which will end up being a tombstone.
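To make the problem concrete, here is a minimal sketch assuming a hypothetical `metrics` table with the hypothetical columns below (none of these names come from the BenchFlow code). A single "full width" INSERT is prepared once and reused for every row; any column the row does not provide is bound as null, which is what turns into a tombstone.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical table layout, used only for illustration.
COLUMNS: Sequence[str] = ("trial_id", "experiment_id", "cpu", "ram", "notes")

# One INSERT covering every column, prepared once and reused for all rows.
FULL_INSERT = (
    f"INSERT INTO metrics ({', '.join(COLUMNS)}) "
    f"VALUES ({', '.join('?' for _ in COLUMNS)})"
)


def tombstone_columns(row: Dict[str, Any]) -> List[str]:
    """Return the columns that would be bound as null (and therefore
    written as tombstones) if `row` were inserted through FULL_INSERT."""
    return [c for c in COLUMNS if row.get(c) is None]


if __name__ == "__main__":
    sparse_row = {"trial_id": "t1", "experiment_id": "e1", "cpu": 0.5}
    print(FULL_INSERT)
    print(tombstone_columns(sparse_row))
```

For the sparse row above, `ram` and `notes` are reported: every insert of such a row through the full-width statement would leave two tombstones behind.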

This is a very bad behavior, as:

The only solution is to have one PreparedStatement per type of insert query, which can be annoying if you have a lot of empty columns! But if you have multiple empty columns, shouldn’t you have used a Map to store that data in the first place?
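The "one PreparedStatement per type of insert query" approach can be sketched as a small cache keyed by the set of non-null columns in a row. This is an illustrative sketch, not code from this repository: the class name `PerShapeInserter` is hypothetical, and `prepare` is injected as a plain callable so the example runs without a cluster (with the DataStax Python driver one would assume passing `session.prepare`).

```python
from typing import Any, Callable, Dict, Tuple


class PerShapeInserter:
    """Hypothetical helper: prepares one INSERT per distinct set of
    non-null columns, so null values are never bound (no tombstones)."""

    def __init__(self, table: str, prepare: Callable[[str], Any]) -> None:
        self._table = table
        self._prepare = prepare  # e.g. session.prepare (assumption)
        self._cache: Dict[Tuple[str, ...], Any] = {}

    def bind(self, row: Dict[str, Any]) -> Tuple[Any, Tuple[Any, ...]]:
        # Sorting makes rows with the same columns share one cache entry,
        # regardless of dict ordering.
        columns = tuple(sorted(c for c, v in row.items() if v is not None))
        if columns not in self._cache:
            placeholders = ", ".join("?" for _ in columns)
            cql = (
                f"INSERT INTO {self._table} ({', '.join(columns)}) "
                f"VALUES ({placeholders})"
            )
            self._cache[columns] = self._prepare(cql)
        # Values are returned in the same order as the columns in the CQL.
        values = tuple(row[c] for c in columns)
        return self._cache[columns], values


if __name__ == "__main__":
    inserter = PerShapeInserter("metrics", prepare=lambda cql: cql)
    print(inserter.bind({"trial_id": "t1", "cpu": 0.5, "ram": None}))
```

The trade-off is the one the quoted tip points out: the number of prepared statements grows with the number of distinct column combinations, which is why a Map column may be the better fit when many columns are optional.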

Some other interesting references: