artmoskvin / real-time-recommender

Real-time collaborative filtering recommender system

Cassandra things #6

Open wahyudierwin opened 4 years ago

wahyudierwin commented 4 years ago

Hi @moscowart,

Thanks for the great real-time recommender system repository! I've been reading through it for weeks, and it is really impressive.

One thing about Cassandra is bothering me. In these lines,

case Success(Some(currentSimilarity)) =>
  storage.similarities.deleteRow(Similarity(firstItem, secondItem, currentSimilarity.similarity))
  storage.similarities.deleteRow(Similarity(secondItem, firstItem, currentSimilarity.similarity))
  storage.similarities.store(Similarity(firstItem, secondItem, similarity))
  storage.similarities.store(Similarity(secondItem, firstItem, similarity))
  storage.similaritiesIndex.store(SimilarityIndex(pairId, similarity))

we DELETE the old similarity score and then INSERT the new one. These DELETEs will be executed many times and will create many tombstones in Cassandra. Doesn't that pose a problem, especially in a production environment?

Thanks in advance.

artmoskvin commented 4 years ago

Hi @wahyudierwin. Thanks for pointing this out. Yes, this might be a problem. I'd suggest closely monitoring the number of tombstones and configuring compaction appropriately.
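
To get a feel for how quickly tombstones pile up with the delete-then-insert pattern from the question, here is a minimal sketch. It is a toy in-memory model, not Cassandra and not this repository's storage layer: `ToyTable`, `TombstoneSketch`, `updateSimilarity`, and `simulate` are hypothetical names made up for illustration, and the field names of `Similarity` are assumed. The only Cassandra behavior it models is that every DELETE writes a tombstone, which stays on disk until compaction can purge it once `gc_grace_seconds` has passed.

```scala
// Toy model only: all names below are hypothetical and not part of the repository.
final case class Similarity(item: String, anotherItem: String, similarity: Double)

final class ToyTable {
  private var live = Set.empty[Similarity]
  private var tombstoneCount = 0L

  def store(row: Similarity): Unit = live += row

  // Like a Cassandra DELETE, this records a marker without checking
  // whether the row actually exists.
  def deleteRow(row: Similarity): Unit = {
    live -= row
    tombstoneCount += 1
  }

  def tombstones: Long = tombstoneCount
  def liveRows: Int = live.size
}

object TombstoneSketch {
  // Mirrors the delete-then-insert sequence from the snippet in the question.
  def updateSimilarity(table: ToyTable, first: String, second: String,
                       old: Double, updated: Double): Unit = {
    table.deleteRow(Similarity(first, second, old))
    table.deleteRow(Similarity(second, first, old))
    table.store(Similarity(first, second, updated))
    table.store(Similarity(second, first, updated))
  }

  // Applies `updates` similarity changes to a single item pair.
  def simulate(updates: Int): ToyTable = {
    val table = new ToyTable
    var current = 0.0
    table.store(Similarity("a", "b", current))
    table.store(Similarity("b", "a", current))
    (1 to updates).foreach { i =>
      val next = i / (updates + 1.0) // distinct value on every step
      updateSimilarity(table, "a", "b", current, next)
      current = next
    }
    table
  }
}
```

In this model, `TombstoneSketch.simulate(1000)` ends with 2 live rows and 2000 tombstones for a single item pair. The live data stays constant while the tombstone count grows linearly with the number of updates, which is why the tombstone count is worth watching.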