Readme: Link to advice on how to protect against/recover from WAL slot deletion failure

mk-pmb commented 6 years ago

This project sounds interesting. However, the current readme's warning:

WARNING: PostgreSQL will hold onto the WAL logs until all logical replication slots have consumed their data. This means that if you try out Lapidus and fail to delete your slot that you'll likely run out of disk space on your system.

sounds like my server could easily go on a rampage if the process using lapidus crashes or is killed for whatever reason. The readme should either explain how that's a wrong interpretation, or give a link to advice on how to prevent this scenario (e.g. monitor and auto-fix). The link to the manual for general WAL slot management doesn't currently look like it answers that particular concern.

jmealo commented 6 years ago

Sorry for the delay. Lapidus uses slots that are intended for database replication so this behavior is desireable. If you're using PostgreSQL 10 you can use a temporary slot which will be deleted when/if Lapidus disconnects (which will not hold onto WAL). Lapidus is a wrapper around pg_recvlogical which has proper reconnection behavior (you can review the code here: https://github.com/postgres/postgres/blob/master/src/bin/pg_basebackup/pg_recvlogical.c). I think I'd need to add support for this as the next time the slot was auto-recreated it would not be a temporary slot.

Expected behavior: Lapidus properly respawns pg_recvlogical if it crashes and pg_recvlogical will be killed if Lapidus dies. pg_recvlogical handles reconnection properly.

Unfortunately, Lapidus can't do much to help you out in a case when Lapidus crashed or otherwise stopped working, so handling "out of space" situations is left to the user*.

*This is a very important condition to monitor for anyway, as monitoring for a replication slot causing an out of space scenario leaves you vulnerable to many other causes of running out of disk space.

I'd be happy to provide some examples of how to do this if you let me know a little more about your use case.

mk-pmb commented 6 years ago

Thanks for the reply! Using a temporary slot with PostgreSQL sounds like a good idea. My use case is just theory atm. I considered to try and use lapidus to upgrade my long polling AJAX chat to a websocket chat. It's currently backed by MySQL anyway so the scary warning doesn't apply yet, I just hope I can migrate the site to PG some day maybe next year.

JarvusInnovations / lapidus

Readme: Link to advice on how to protect against/recover from WAL slot deletion failure #12