eventuate-foundation / eventuate-cdc


CDC should purge Eventuate Tram message and received_messages tables #103

Open cer opened 3 years ago

cer commented 3 years ago

The Eventuate Tram message and received_messages tables accumulate messages that should be removed after a certain time period has elapsed.

The Eventuate Tram pipeline is perhaps a good place to implement this. It would have the following properties:

Perhaps the CDC could use its knowledge of the age of the messages it has processed to determine whether it's safe to delete old messages.
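A minimal sketch of that idea, not the CDC's actual logic: a message is treated as purgeable only if it is older than the configured max age and no newer than the newest message the CDC has already processed. The function name and the use of epoch milliseconds are assumptions made for illustration, and the sketch assumes creation time roughly tracks processing order.

```python
def safe_purge_cutoff_ms(now_ms, max_age_ms, last_processed_creation_time_ms):
    """Return the creation_time (epoch ms) below which rows may be purged.

    Returns None when the CDC has not processed anything yet, meaning
    nothing should be purged.
    """
    if last_processed_creation_time_ms is None:
        return None
    age_cutoff_ms = now_ms - max_age_ms
    # Never purge past what the CDC has already seen, even if it is old enough.
    return min(age_cutoff_ms, last_processed_creation_time_ms)


# The CDC is lagging: it has only processed messages created up to t=500.
lagging = safe_purge_cutoff_ms(now_ms=1_000, max_age_ms=100, last_processed_creation_time_ms=500)
# The CDC is caught up: the age limit is the binding constraint.
caught_up = safe_purge_cutoff_ms(now_ms=1_000, max_age_ms=100, last_processed_creation_time_ms=950)
```

Rows with a creation time strictly below the returned value would then be candidates for deletion.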

One issue: it assumes that every service has both tables. It would not work if a service only had a received_messages table.

Thoughts, @dartartem?

dartartem commented 3 years ago

Chris, I have some questions

The Eventuate Tram pipeline is perhaps a good place to implement this.

What do you mean by "pipeline" here? Also, do you mean that a Tram service (such as customer-service or order-service), rather than the CDC, should be responsible for the cleanup?

it assumes that every service has both tables. It would not work if a service only had a received_messages table.

Is this about view services, for example? I mean Tram services that only receive messages and never send them, and that use their own schema with a received_messages table for duplicate detection.

cer commented 3 years ago

Chris, I have some questions

The Eventuate Tram pipeline is perhaps a good place to implement this.

What do you mean by "pipeline" here? Also, do you mean that a Tram service (such as customer-service or order-service), rather than the CDC, should be responsible for the cleanup?

This issue exists in the CDC project because that's what should implement this mechanism.

The CDC has the concept of a pipeline:

https://github.com/microservices-patterns/ftgo-application/blob/a835e23bb0f3bc92dd712ff48a1510496ecb10fa/docker-compose.yml#L44-L46

A pipeline of type eventuate-tram knows that a schema has the Eventuate Tram message and received_messages tables, so it can periodically purge old messages.
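As a rough illustration of such a purge, here is a sketch that uses an in-memory SQLite database in place of the real MySQL or Postgres schema. The table layouts, the creation_time column holding epoch milliseconds, and the function name purge_older_than are assumptions for the example and may not match the actual Eventuate Tram schema.

```python
import sqlite3

# Assumed, simplified schema: both tables carry a creation_time column
# holding epoch milliseconds. Real Eventuate Tram schemas may differ.
PURGEABLE_TABLES = ("message", "received_messages")


def purge_older_than(conn, table, cutoff_ms):
    """Delete rows whose creation_time is before cutoff_ms; return the count."""
    if table not in PURGEABLE_TABLES:  # table names cannot be bound as parameters
        raise ValueError(f"unexpected table: {table}")
    cur = conn.execute(f"DELETE FROM {table} WHERE creation_time < ?", (cutoff_ms,))
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id TEXT PRIMARY KEY, creation_time INTEGER)")
conn.execute(
    "CREATE TABLE received_messages "
    "(consumer_id TEXT, message_id TEXT, creation_time INTEGER)"
)
conn.executemany("INSERT INTO message VALUES (?, ?)", [("m1", 1_000), ("m2", 9_000)])
conn.executemany(
    "INSERT INTO received_messages VALUES (?, ?, ?)",
    [("c1", "m1", 1_500), ("c1", "m2", 9_500)],
)

# Purge everything created before t=5000 ms from both tables.
deleted = {t: purge_older_than(conn, t, cutoff_ms=5_000) for t in PURGEABLE_TABLES}
```

A periodic task would call this once per purge interval with a cutoff of "now minus max age".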

it assumes that every service has both tables. It would not work if a service only had a received_messages table.

Is this about view services, for example? I mean Tram services that only receive messages and never send them, and that use their own schema with a received_messages table for duplicate detection.

The CDC would not know about services that receive but don't send. That could be such a view service.

dartartem commented 3 years ago

Chris, thank you for the clarification. I will read it more carefully on Monday.

dartartem commented 3 years ago

One issue: it assumes that every service has both tables. It would not work if a service only had a received_messages table.

@cer, could you please clarify this point? I just do not see the problem here.

cer commented 3 years ago

@cer, could you please clarify this point? I just do not see the problem here.

The CDC only knows about services that have the message table. If a service only consumes messages, i.e. it has only a received_messages table, the CDC cannot clean up that table.
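To illustrate the consume-only case, here is a sketch of a cleaner that is handed a database connection directly and purges whichever of the two tables it finds, so a missing message table is not an error. It uses SQLite's sqlite_master catalog for the lookup (MySQL or Postgres would need information_schema instead), and the function names and the creation_time column are assumptions.

```python
import sqlite3

TRAM_TABLES = ("message", "received_messages")


def existing_tram_tables(conn):
    """Return which of the Tram tables are present in this database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name IN (?, ?)",
        TRAM_TABLES,
    ).fetchall()
    return sorted(name for (name,) in rows)


def purge_present_tables(conn, cutoff_ms):
    """Purge old rows from whichever Tram tables exist; return counts per table."""
    counts = {}
    for table in existing_tram_tables(conn):  # names come from TRAM_TABLES only
        cur = conn.execute(f"DELETE FROM {table} WHERE creation_time < ?", (cutoff_ms,))
        counts[table] = cur.rowcount
    conn.commit()
    return counts


# A consume-only service: its schema has received_messages but no message table.
consumer_only = sqlite3.connect(":memory:")
consumer_only.execute(
    "CREATE TABLE received_messages "
    "(consumer_id TEXT, message_id TEXT, creation_time INTEGER)"
)
consumer_only.executemany(
    "INSERT INTO received_messages VALUES (?, ?, ?)",
    [("c1", "m1", 1_000), ("c1", "m2", 9_000)],
)
counts = purge_present_tables(consumer_only, cutoff_ms=5_000)
```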

dartartem commented 3 years ago

Chris, how about creating a fake pipeline that is configured only for cleanup?

cer commented 3 years ago

Chris, how about creating a fake pipeline that is configured only for cleanup?

Fake pipeline = Yuck. It's not a pipeline. It would have to be some other kind of concept, e.g. a cleaner configured with a JDBC URL.

Perhaps, this would be a better implementation:

dartartem commented 3 years ago

ok, will do

dartartem commented 3 years ago

Example configuration:

Default cleaner configuration with name "1" that uses datasource configuration from pipeline "1" (unified cdc):

export EVENTUATE_CDC_CLEANER_1_PIPELINE=1

Default cleaner configuration with name "1" that uses datasource configuration from default pipeline (default configuration without unified cdc):

export EVENTUATE_CDC_CLEANER_1_PIPELINE=default

Default cleaner configuration with name "1" that does not use datasource configuration from any pipeline

export EVENTUATE_CDC_CLEANER_1_DATASOURCEURL=jdbc:mysql://${DOCKER_HOST_IP:localhost}/eventuate
export EVENTUATE_CDC_CLEANER_1_DATASOURCEUSERNAME=mysqluser
export EVENTUATE_CDC_CLEANER_1_DATASOURCEPASSWORD=mysqlpw
export EVENTUATE_CDC_CLEANER_1_DATASOURCEDRIVERCLASSNAME=com.mysql.jdbc.Driver
export EVENTUATE_CDC_CLEANER_1_EVENTUATESCHEMA=someSchema

Custom cleaner properties:

export EVENTUATE_CDC_CLEANER_1_PURGE_PURGEMESSAGESENABLED=true #purge message table
export EVENTUATE_CDC_CLEANER_1_PURGE_PURGEMESSAGESMAXAGEINSECONDS=60 #max age 60 seconds
export EVENTUATE_CDC_CLEANER_1_PURGE_PURGERECEIVEDMESSAGESENABLED=true #purge received message table
export EVENTUATE_CDC_CLEANER_1_PURGE_PURGERECEIVEDMESSAGESMAXAGEINSECONDS=60 #max age 60 seconds
export EVENTUATE_CDC_CLEANER_1_PURGE_PURGEINTERVALINSECONDS=60 #run cleanup every 60 seconds

Note: purge is disabled by default
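For readers who want to see how these variables map onto settings, here is a hedged sketch of a loader. It is not the CDC's actual configuration code: the PurgeSettings class, load_purge_settings, and the 60-second default interval are assumptions. The disabled-by-default flags come from the note above, and the 48h max-age default comes from a later comment in this thread. The property names are the ones proposed here and may have been renamed since.

```python
from dataclasses import dataclass

DEFAULT_MAX_AGE_SECONDS = 48 * 60 * 60  # 48h, per a later comment in this thread
ASSUMED_DEFAULT_INTERVAL_SECONDS = 60   # assumption: the real default is not stated


@dataclass
class PurgeSettings:
    messages_enabled: bool = False  # purge is disabled by default
    messages_max_age_seconds: int = DEFAULT_MAX_AGE_SECONDS
    received_messages_enabled: bool = False
    received_messages_max_age_seconds: int = DEFAULT_MAX_AGE_SECONDS
    interval_seconds: int = ASSUMED_DEFAULT_INTERVAL_SECONDS


def load_purge_settings(env, cleaner_name):
    """Build PurgeSettings for one cleaner from a mapping of environment variables."""
    prefix = f"EVENTUATE_CDC_CLEANER_{cleaner_name}_PURGE_"
    d = PurgeSettings()

    def flag(key, default):
        raw = env.get(prefix + key)
        return default if raw is None else raw.strip().lower() == "true"

    def seconds(key, default):
        raw = env.get(prefix + key)
        return default if raw is None else int(raw)

    return PurgeSettings(
        messages_enabled=flag("PURGEMESSAGESENABLED", d.messages_enabled),
        messages_max_age_seconds=seconds(
            "PURGEMESSAGESMAXAGEINSECONDS", d.messages_max_age_seconds),
        received_messages_enabled=flag(
            "PURGERECEIVEDMESSAGESENABLED", d.received_messages_enabled),
        received_messages_max_age_seconds=seconds(
            "PURGERECEIVEDMESSAGESMAXAGEINSECONDS", d.received_messages_max_age_seconds),
        interval_seconds=seconds("PURGEINTERVALINSECONDS", d.interval_seconds),
    )


example_env = {
    "EVENTUATE_CDC_CLEANER_1_PURGE_PURGEMESSAGESENABLED": "true",
    "EVENTUATE_CDC_CLEANER_1_PURGE_PURGEMESSAGESMAXAGEINSECONDS": "86400",
}
settings = load_purge_settings(example_env, "1")
```

In real use the mapping passed in would be os.environ rather than a hand-built dictionary.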

cer commented 3 years ago

Thanks. However, ...PURGE_PURGE... doesn't seem sensible. Also, what are the default values of the properties, such as EVENTUATE_CDC_CLEANER_1_PURGE_PURGEMESSAGESMAXAGEINSECONDS?

BTW 60 seconds is too short for max age. I'd pick a value like 24 hours (whatever that is in seconds).
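For reference, the conversion to seconds can be checked with a few lines of Python. The helper name hours_to_seconds is just for illustration.

```python
from datetime import timedelta


def hours_to_seconds(hours):
    """Convert a duration in hours to whole seconds."""
    return int(timedelta(hours=hours).total_seconds())


seconds_in_24h = hours_to_seconds(24)  # 86400
seconds_in_48h = hours_to_seconds(48)  # 172800
```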

dartartem commented 3 years ago

Chris,

However, ...PURGE_PURGE... doesn't seem sensible.

Yes, I wanted to choose a better name but postponed it because I cannot run the tests.

EVENTUATE_CDC_CLEANER_1_PURGE_PURGEMESSAGESMAXAGEINSECONDS

BTW 60 seconds is too short for max age. I'd pick a value like 24 hours (whatever that is in seconds).

The default is 48h. Should I change it to 24h?

cer commented 3 years ago

The default is 48h. Should I change it to 24h?

No. 48h is fine.

cer commented 3 years ago

Regarding

Probably it would be useful to have the PostgresWal and MysqlBinlog readers mark processed messages as published, so that the message cleaner would purge only messages already processed by the CDC.
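A sketch of what that quoted suggestion could look like on the cleaner side, assuming the message table has a published flag that the readers set to 1 once a message has been processed. The column names, the SQLite stand-in database, and the function name are assumptions, and this is only one possible strategy.

```python
import sqlite3


def purge_published_older_than(conn, cutoff_ms):
    """Delete only rows that are both marked published and older than cutoff_ms."""
    cur = conn.execute(
        "DELETE FROM message WHERE published = 1 AND creation_time < ?",
        (cutoff_ms,),
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message "
    "(id TEXT PRIMARY KEY, published INTEGER DEFAULT 0, creation_time INTEGER)"
)
conn.executemany(
    "INSERT INTO message VALUES (?, ?, ?)",
    [
        ("old-published", 1, 1_000),    # old and processed: purged
        ("old-unpublished", 0, 1_000),  # old but not yet processed: kept
        ("new-published", 1, 9_000),    # processed but too recent: kept
    ],
)

deleted = purge_published_older_than(conn, cutoff_ms=5_000)
remaining = [row[0] for row in conn.execute("SELECT id FROM message ORDER BY id")]
```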

It's unclear what strategy to use to ensure that the CDC has processed messages before they are deleted:

isfong commented 2 years ago

Hello @cer! Is this functionality implemented now? My message table now has a backlog of millions of rows. How do I configure the cleanup task?