atuinsh / atuin

✨ Magical shell history
https://atuin.sh
MIT License

Pool timed out while waiting for an open connection, ZFS #952

Closed happenslol closed 1 year ago

happenslol commented 1 year ago

Edit:

This has become the canonical issue for Atuin/ZFS issues

If you're using ZFS with Atuin, you have likely noticed an error such as the following:

Error: pool timed out while waiting for an open connection

Location:
    /home/runner/work/atuin/atuin/crates/atuin-client/src/record/sqlite_store.rs:48:20

This is due to an issue with ZFS and SQLite. See: https://github.com/openzfs/zfs/issues/14290

There are two workarounds:

  1. Use the Atuin daemon

This has not yet been released as stable, but it works mostly without issue. The daemon takes all SQLite writes off the hot path, thereby avoiding the issue.

Follow the steps here: https://github.com/atuinsh/atuin/issues/952#issuecomment-2121671620

  2. Create an ext4 zvol for Atuin

Follow the steps here: https://github.com/atuinsh/atuin/issues/952#issuecomment-1902164562


I've just begun using atuin, and I absolutely love it so far. However, there's been a recurring issue for me, which I've found hard to diagnose:

My prompt regularly blocks for between 500ms and 5s whenever I run a command. I've narrowed this down to the _atuin_preexec function by manually importing the shell hook generated by atuin init zsh and annotating it with logging and time calls. Here's a sample time output from an occasion when it hung:

Running pre-exec for cd ~

0.00user 0.00system 0:04.93elapsed 0%CPU (0avgtext+0avgdata 8192maxresident)k
52036inputs+1064outputs (15major+512minor)pagefaults 0swaps

Pre-exec done for cd ~

Here's how I modified the hook to get the result:

_atuin_preexec() {
    log "Running pre-exec for $1\n" >> /tmp/atuin.log
    local id
    id=$(/usr/bin/time -a -o /tmp/atuin.log atuin history start -- "$1")
    export ATUIN_HISTORY_ID="$id"
    echo "\nPre-exec done for $1" >> /tmp/atuin.log
}

I've tried to replicate the behavior from the CLI, outside of the hook, using hyperfine, and was successful:

» hyperfine -r 1000 "atuin search --limit 5"
Benchmark 1: atuin search --limit 5
  Time (mean ± σ):      18.3 ms ± 114.8 ms    [User: 4.9 ms, System: 8.2 ms]
  Range (min … max):    12.5 ms … 2587.9 ms    1000 runs

This does not happen on every benchmark, even with 1000 runs. My initial thought was that this has to be contention on the database file, but I saw that you're already using WAL, so concurrent writes/reads should not be a problem. I can also trigger the delay by repeatedly opening the search widget, which should not even be doing writes to the database, which confuses me even more.

Do you have any idea on how I could gather further data on this?

Mic92 commented 4 months ago

The documentation reads to me as though atuin will at some point be able to autostart itself as a daemon, so you wouldn't need to start daemons yourself anymore. You can probably already put this in your shell configuration today. With autostart, it would be nice if atuin could also autostop itself. This might be a bit tricky to do without race conditions. The best way I know is socket activation, as supported by systemd on Linux and launchd on macOS. Here is how it works:

  1. The supervisor* binds to the atuin socket
  2. When we have a client, the supervisor will start the atuin daemon
  3. The atuin daemon will stop after a timeout when there are no more requests coming in

The crucial thing is that the supervisor keeps the socket open at all times and connection attempts are queued until the daemon responds, so no client will ever hit a socket that is not responsive. Autostopping also gives us a nice way to upgrade atuin when a new version is available.

*systemd or launchd
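The queuing behaviour described above can be sketched in a single Python process, as a rough illustration rather than how systemd, launchd, or Atuin implement it. The socket path, the message contents, and the `serve_one` helper are all made up for the demo. With real systemd socket activation the daemon would receive the already-listening socket as an inherited file descriptor instead of sharing a Python object.

```python
import os
import socket
import tempfile

# Hypothetical socket path for the demo (not Atuin's real socket).
sock_path = os.path.join(tempfile.mkdtemp(), "demo.sock")

# 1. The "supervisor" binds and listens, but never calls accept() itself.
listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
listener.bind(sock_path)
listener.listen(8)

# 2. A client connects and sends a request before any daemon is running.
#    The kernel completes the connection and queues it in the listen backlog.
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(sock_path)
client.sendall(b"history start")


# 3. The "daemon" starts late, takes over the listening socket,
#    and serves the connection that was waiting in the queue.
def serve_one(listening_socket):
    conn, _ = listening_socket.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"ack:" + request)
    return request


handled = serve_one(listener)
reply = client.recv(1024)

client.close()
listener.close()
os.unlink(sock_path)

print("daemon handled:", handled)
print("client received:", reply)
```

Because the listening socket exists the whole time, the client never sees a connection refused error. It only waits a little longer for the reply while the daemon starts.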

UPDATE: Nice, systemd socket activation is already implemented: https://github.com/atuinsh/atuin/pull/2039

ellie commented 3 months ago

@hvisage I'm afraid that your requirements aren't really something we can support. If OpenZFS fixes their bug, then all should be fine - but until that point, some sort of workaround is required.

so it does become a challenge with the daemon popping up everywhere now

Could you elaborate a bit here? We technically already run a short-lived background process on every single command, the daemon is just formalizing that a little. I'm not too sure how it's now "popping up everywhere".

If there is a way to force a PRAGMA=asynchronous/unsafe type to not have the synchronous writes (and here I will contend that shell history is an assistance, not a requirement; besides, the idea for me is much more to get the sync to a server going, where it'll make sense and be simple to have ext4 on it)

If blocking for 5s, running the daemon, or creating a zvol is unacceptable - you could patch the code.

It's a one- or two-line change, with many examples in this thread. I'm not willing to have any combination of options that leads to corrupt databases and data loss, though, so you will have to maintain that patch yourself. Note that the risk there isn't that the database vanishes and resyncs from time to time, but rather that it will just halt and error out until you manually delete it.

boozedog commented 3 months ago

Excited for the new daemon so I can start using Atuin again! 👍

It's out now as part of v18.3.0! Still experimental, but safe to use

Docs: https://docs.atuin.sh/reference/daemon/

Testing it now and it's working great so far!

BTW for others who are using NixOS I found this forum post very helpful in setting up the daemon: https://forum.atuin.sh/t/getting-the-daemon-working-on-nixos/334/3

And then I enabled the atuin daemon setting in home-manager like this:

programs.atuin.settings = {
  daemon = {
    enabled = true;
  };
};

hvisage commented 3 months ago

@hvisage I'm afraid that your requirements aren't really something we can support. If OpenZFS fixes their bug, then all should be fine - but until that point, some sort of workaround is required.

;(

so it does become a challenge with the daemon popping up everywhere now

Could you elaborate a bit here? We technically already run a short-lived background process on every single command, the daemon is just formalizing that a little. I'm not too sure how it's now "popping up everywhere".

I would need a daemon for each user using it, and then also for each service account (like root/mendix/postgres/etc.) when I switch users. Yes, it has been a great help for things in the various user and service accounts... it's just that I use ZFS everywhere (other than my desktop/laptop, which runs macOS) ;(

If blocking for 5s, running the daemon, or creating a zvol is unacceptable - you could patch the code.

Every time I type, and then in between searches, it's actually becoming unusable where I don't expect it to be unusable ;(

Okay.

Let me just say: Ellie, thank you, it is a GREAT TOOL! (where there is no ZFS)

I'll have to find a way to handle this challenge when I have time available ;(

Thank you otherwise for attempting to find workarounds. I might have a more unique setup/style, so I will not bother you further on this issue.

Finkregh commented 3 months ago

Sorta unrelated to this issue: YES! The quality of communication and willingness to look at people's interesting setups is really something.

atuin-bot commented 3 months ago

This issue has been mentioned on Atuin Community. There might be relevant details there:

https://forum.atuin.sh/t/sync-v2-testing/124/35

Dronakurl commented 1 month ago

I have moved the db_path in the config (see the Atuin website) to my ramdisk. Another workaround would be a separate non-ZFS partition. Question: when I shut down my computer, the ramdisk is deleted. Will I lose my long-term history or just the last hour?