akka / akka-persistence-cassandra

A replicated Akka Persistence journal backed by Apache Cassandra
https://doc.akka.io/docs/akka-persistence-cassandra/

Cleaning tagged events is slow and overloads Cassandra because deleteFromTagView overfetches data it does not need #1103

Open Valocop opened 1 month ago

Valocop commented 1 month ago

https://github.com/akka/akka-persistence-cassandra/blob/8006539934ff77bf72b1a8062a44478686112945/core/src/main/scala/akka/persistence/cassandra/reconciler/DeleteTagViewForPersistenceId.scala#L40

The problem is that when we clean up events by tag, the Cassandra journal runs a current-events-by-tag stream that overfetches data: it loads the event payload for every event. As a result, cleaning tag_views takes a long time. It would be better to use a stream that fetches events without the payload; that data would be enough for deleteFromTagView.

I ran into this problem when cleaning up a tag that has a lot of events (millions) with large payloads.

Can we fix it, please? Thanks

patriknw commented 4 weeks ago

Good point, I'll try to adjust that later this week, unless you want to fix it?

Valocop commented 3 weeks ago

I would be very grateful for your help! I can try, but I need to discuss how to implement it. Would it be better to add a new method, or to change the current currentEventsByTagInternal(...) and use a flag to choose between fetching all data and fetching without the event payload? @patriknw

Thanks

patriknw commented 3 weeks ago

I guess, since most things are the same, it would be easiest with a flag. In the end it's a different CQL query (prepared statement) and a change in deserializeEventsByTagRow. The payload in the PersistentRepr could be set to NotUsed for this case.
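
A minimal sketch of the flag-based idea described above, not the actual akka-persistence-cassandra code. Everything in it is a hypothetical stand-in chosen for illustration: the `Row` and `Repr` types, the `includePayload` flag name, the local `NotUsed` object (standing in for `akka.NotUsed` so the sketch has no dependencies), and the column list in the CQL string, which is an assumption about the tag_views schema.

```scala
import java.nio.charset.StandardCharsets

object TagViewSketch {
  // Stand-in for akka.NotUsed so the sketch compiles without Akka.
  case object NotUsed

  // Hypothetical, simplified view of one tag_views row.
  // `event` is None when the query did not select the payload column.
  final case class Row(
      persistenceId: String,
      sequenceNr: Long,
      tagPidSequenceNr: Long,
      event: Option[Array[Byte]])

  // Hypothetical result type; the real code builds a PersistentRepr.
  final case class Repr(
      persistenceId: String,
      sequenceNr: Long,
      tagPidSequenceNr: Long,
      payload: Any)

  // Assumed column names, for illustration only.
  private val metadataColumns =
    "persistence_id, sequence_nr, tag_pid_sequence_nr, timestamp"

  // One CQL string per flag value, so each can be prepared separately.
  def selectStatement(includePayload: Boolean): String = {
    val columns =
      if (includePayload) s"$metadataColumns, event" else metadataColumns
    s"SELECT $columns FROM tag_views " +
      "WHERE tag_name = ? AND timebucket = ? AND timestamp > ? AND timestamp < ?"
  }

  // Builds the result; the payload is NotUsed when it was not requested.
  def deserializeRow(row: Row, includePayload: Boolean): Repr = {
    val payload: Any =
      if (includePayload)
        row.event
          .map(bytes => new String(bytes, StandardCharsets.UTF_8))
          .getOrElse(throw new IllegalArgumentException(
            "payload requested but the event column was not selected"))
      else NotUsed
    Repr(row.persistenceId, row.sequenceNr, row.tagPidSequenceNr, payload)
  }
}
```

With this shape, the cleanup path would call both functions with `includePayload = false`, so the large event column is never read from Cassandra, while the regular query path keeps passing `true`.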

patriknw commented 2 weeks ago

@Valocop Are you working on this, or shall I give it a try?