ExchangeUnion / xud

Exchange Union Daemon 🔁 ⚡️
https://exchangeunion.com
GNU Affero General Public License v3.0
115 stars 49 forks source link

feat: lock file #2011

Closed sangaman closed 2 years ago

sangaman commented 3 years ago

This introduces a lock file for xud that is created in the xud data directory any time xud starts up. Anytime a lock file already exists, an error message is logged and xud exits. This prevents multiple xud processes from trying to use the same node key or directory at the same time.

The lock files are unique to each network, so running multiple processes with different networks (e.g. simnet and mainnet) is allowed. They are deleted when xud shuts down.

Closes #1989.

sangaman commented 3 years ago
* [ ]  does not work for xud-docker env

Steps:

1. `bash xud.sh -b lock`

2. initialize the wallet and unlock it

3. docker rm -f xud_simnet_1

4. `bash xud.sh -b lock`

Actual result: .lock file still exists Screenshot from 2020-12-02 16-13-44 Screenshot from 2020-12-02 16-13-49 Screenshot from 2020-12-02 16-13-46

Hm this is likely due to the sudden/ungraceful way of stopping the container, not giving it a chance to delete the lock file. Does "downing" the container give the same behavior? I can try testing that myself. I wonder if there's anything else to do from the xud perspective to delete the lock file with ungraceful shutdowns, but if the process is just killed abruptly then I'm pretty sure nothing can be done. I could maybe put in a prompt on the docker side to delete the lock file if one is encountered.

michael1011 commented 3 years ago

I wonder if there's anything else to do from the xud perspective to delete the lock file with ungraceful shutdowns, but if the process is just killed abruptly then I'm pretty sure nothing can be done.

How does Bitcoin Core handle this? The also have lockfiles for the database files and somehow managed to deal with them gracefully even when you force kill the Bitcoin Core process. But researching that is low prio.

I could maybe put in a prompt on the docker side to delete the lock file if one is encountered.

In xud-docker we could simply check whether an xud container is running and if not, automatically remove the lockfile.

kilrau commented 3 years ago

This could also happen on other ungraceful shutdowns and I believe we should handle it, otherwise we'll have to explain to our users how to delete lockfiles :/ I never had lockfile issues with any other software, even though I killed processes/containers, so there must be a trick to it.

Probably a stupid question, but can a lockfile depend on a process? https://stackoverflow.com/questions/1599459/optimal-lock-file-method/ and https://perl.plover.com/yak/flock/ recommend using flock() and the process actually tries to obtain the lock on the file, not judging by pure existence of the lock file as far as I understand it.

sangaman commented 3 years ago

I'm used to deleting lock files with other applications that crash suddenly (due to power loss, etc). I believe I've had it happen with bitcoind too, and there are plenty of results for people getting cannot obtain a lock on data directory with bitcoin core. Conceptually I can't think of any you could delete a file inside a process that is suddenly killed.

Making it depend on a process could work potentially, but I don't know how we would play nice with running multiple processes on one machine which is a valid use case if they're using different networks or perhaps even different data directories.

sangaman commented 3 years ago

I can confirm that down or stop xud in xud-docker handle the lock file gracefully. So I think from docker's perspective it makes sense to prompt the user if it encounters a lock file. I'll look into that separately.

I'll fix this PR to update the xud custom patch.

kilrau commented 3 years ago

Thanks again for throwing this up @sangaman , we definitely want and need an advisory lock like this. I still have the following concerns though:

I'm used to deleting lock files with other applications that crash suddenly (due to power loss, etc).

I am not tbh, and I killed my bitcoind/lnd multiple times already including sudden power loss. I am just not convinced we are handling things right if a simple container kill, leaves the .lock file on disk and then xud as well-behaved process refuses to start on this data dir because the .lock file exists. I am not challenging the .lock file approach in general, just how it's currently handled in this PR by deleting the file on graceful shutdown and judging .lock file exists = data dir locked.

I definitely would want to avoid explaining to users how to delete a .lock file vs. protection in a case where multiple xud's try to run on the same data dir.

So I did a little test with my bitcoind:

So it looks like bitcoind does not delete the .lock file when shutting down, and when coming up judging the lock somehow differently than by pure existence of the file. Reading through https://github.com/bitcoin/bitcoin/issues/19167#issuecomment-698594254 it looks like bitcoind is using fcntl which offers advanced file locks and locks are placed on the .lock file itself without ever changing/deleting it. Since the author proposed a switch to flock() and that's what I previously found too, can we look into using it?

michael1011 commented 3 years ago

One thing to keep in mind though when talking about low level file system calls @kilrau:

We are using Node. Which means we need to have native C++ bindings to be able to call those things which does not only add a dependency but also might not be compatible with every OS and file system. The first Node flock library I found when googling for such things was incompatible with BSD for example

kilrau commented 3 years ago

Maybe fcntl is the better choice then. But working on this PR is definitely P3 and we should only pick it up after MPP.

michael1011 commented 3 years ago

Maybe fcntl is the better choice then

May I remind you: we cannot use native file system calls without a native C++ (or Rust or whatever) dependency which might not be compatible with all file systems, operating systems and architectures

kilrau commented 3 years ago

Ok. Reading between the lines, you'd vouch for sticking with the current lockfile exists = locked approach?

michael1011 commented 3 years ago

I don't know what to suggest tbh. Both approaches have their up and downsides and we should consider carefully

kilrau commented 3 years ago

I don't know what to suggest tbh. Both approaches have their up and downsides and we should consider carefully

Agreed, it's not clear what the best way is right now and at least I have concerns of potentially making things worse with this PR (few will benefit from what the lockfile prevents, more might run into troubles not being able to start xud because lockfile exists).

I converted this PR to draft, we can pick it up again once a good solution evolves.

kilrau commented 3 years ago

Just dropping this here as an observation:

~$ sudo apt update
Reading package lists... Done
E: Could not get lock /var/lib/apt/lists/lock. It is held by process 2328 (packagekitd)
N: Be aware that removing the lock file is not a solution and may break your system.
E: Unable to lock directory /var/lib/apt/lists/
michael1011 commented 3 years ago

One thing that just came to my mind: what if we write the PID of the XUD process in the lockfile and check if the PID is still running if the lockfile exists?

kilrau commented 3 years ago

Hmm yeah that could work. Also should work on windows as per your link