elves / elvish

Powerful scripting language & versatile interactive shell
https://elv.sh/
BSD 2-Clause "Simplified" License

Race condition when starting multiple instances simultaneously #1806

Open hanche opened 1 month ago

hanche commented 1 month ago

What happened, and what did you expect to happen?

I had to reboot my computer (a MacBook Pro) today. I had iTerm running with an unusually high number (more than ten) of open sessions, all running elvish. After the reboot, all but two of the elvish shells had no access to command history. Further examination revealed that two of the shells were running a daemon subprocess.

This indicates that there is a race condition: Each shell needs to decide whether a daemon is already running, and if not, start one itself. When many shells open at nearly the same time, several of them can conclude that no daemon is running yet and each start one of their own.

This has never happened to me before, despite frequently rebooting without quitting iTerm first. However, after a cursory look at pkg/daemon/activate.go, I don't see any obvious (to me) code for avoiding a race condition. Perhaps something could be implemented using a file lock?

(Edit: Minor misspelling.)

Output of "elvish -version"

0.21.0-dev.0.20240320152034-dfe675a0b467


krader1961 commented 1 month ago

This situation happened to me many times when I started using Elvish. I had to add a randomized "sleep" command to the Elvish program that starts Elvish in my Tmux sessions to reduce the chance of hitting this race to start the Elvish daemon. This scenario is one of the reasons I am working on replacing the existing BoltDB history mechanism with a simple flat-file store. A flat-file history store has its own race conditions, but none of them involve launching a new process. That makes its races easier to reason about, and easier to avoid, than the one described in the problem statement above.

krader1961 commented 1 month ago

To clarify my previous comment: I (re)wrote an Elvish program that initializes multiple Tmux sessions in parallel. The program was originally written as a POSIX script, but that isn't really relevant. What is relevant is that both programs launched multiple interactive Elvish shells on the same system more or less simultaneously, which caused those shells to race to create the Elvish daemon.