Problem
The migration runs in three steps: collecting data from the existing CFs, projecting it, and loading it into the new CF. The projection step can produce duplicate records. Cassandra resolves these duplicates when the records are written, so they cause no operational problem. However, the cost estimation should take them into account.
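A minimal sketch of how projection can produce duplicates. The row layout, column names, and the `project` helper are hypothetical, and the sketch assumes the whole projected record acts as the key in the new CF, so identical records collapse into one on write.

```python
# Hypothetical rows in an existing CF: (user_id, item_id, city).
existing_rows = [
    (1, "a", "Tokyo"),
    (1, "b", "Tokyo"),
    (2, "c", "Osaka"),
]


def project(rows):
    """Keep only the columns the new CF needs: (user_id, city)."""
    return [(user_id, city) for user_id, _item_id, city in rows]


# The projection emits one record per existing row, so duplicates survive here.
projected = project(existing_rows)

# Assumption: records with the same key overwrite each other on write,
# which a set models for this illustration.
stored = set(projected)

print(len(projected), "records written,", len(stored), "records stored")
```

The migration writes every projected record, but the new CF ends up holding only the distinct ones.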
Solution
The cost of creating the CF during migration would be estimated more accurately by using the size
size = (existing CF entries) * (new CF record width)
Once the records have been written to Cassandra, the duplicates are removed, so other processes should use the CF size calculated by
size = (CF entries) * (CF record width)
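The two formulas can be sketched as follows. The function names and the sample numbers are illustrative assumptions, not values from an actual estimator or workload.

```python
def migration_write_size(existing_cf_entries: int, new_cf_record_width: int) -> int:
    """Size used for the CF creation cost: every projected record is written,
    duplicates included, so the entry count comes from the existing CF."""
    return existing_cf_entries * new_cf_record_width


def stored_cf_size(cf_entries: int, cf_record_width: int) -> int:
    """Size used by other processes once the CF exists: duplicates are gone,
    so the entry count is that of the new CF itself."""
    return cf_entries * cf_record_width


# Hypothetical numbers: 1,000,000 existing entries projected to 64-byte records,
# of which 400,000 remain distinct after loading.
write_size = migration_write_size(1_000_000, 64)
final_size = stored_cf_size(400_000, 64)

print(write_size, final_size)
```

With these sample numbers the migration writes more data than the new CF finally holds, which is the gap the cost estimation would otherwise miss.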