alda-lang / alda

A music programming language for musicians. :notes:
https://alda.io
Eclipse Public License 2.0
5.62k stars 289 forks source link

player processes get stuck #411

Closed fleimgruber closed 2 years ago

fleimgruber commented 2 years ago

player processes get stuck in "starting" state and no further playback is possible

๐Ÿž Bug report ๐Ÿž

Description

After one or two alda play -c ... the processes get stuck in the starting state and no further playback is possible.

Steps to Reproduce

  1. alda -v2 play -c "piano: c12 e g > c4"
  2. alda -v2 play -c "piano: c12 e g > c4"
  3. alda -v2 play -c "piano: c12 e g > c4"
  4. ...

Yes, but the number of stuck processes varies from 2-3.

Exact score does not matter, I tried with alda -v2 play -c "piano: c12 e g > c4" and alda play -c "midi-bass-pop: c6 d12 e6 g12~4"

Expected Behavior

Sound plays consistently after each call to alda play ...

Actual Behavior

$ alda ps
id      port    state   expiry  type
itk     44295   starting        6 minutes from now      player
ypb     37647   starting        7 minutes from now      player
zxc     41825   starting        5 minutes from now      player
$ alda -v2 play -c "piano: c12 e g > c4"
Mar 20 14:25:36 INF parser/parser.go:791 > Parsed input. filepath= took="373.503ยตs"
Mar 20 14:25:36 INF cmd/play.go:243 > Constructed score. took="45.406ยตs" updates=6
Starting player processes...
It looks like Alda is having trouble starting player processes in the
background. This could happen for a number of reasons.

To troubleshoot:

  โ€ข Run alda doctor and see if any of the health checks fail.

  โ€ข Run alda ps to see the state of any currently running player
    processes.

  โ€ข Look for error messages in:
      /home/fps/.cache/alda/logs/alda-player.log

  โ€ข Try running a player process in the foreground:
      alda-player -v run -p 27278

  โ€ข Try to make it play something:
      alda -v2 play -p 27278 -c "piano: c12 e g > c4"

Environment

Operating system and version:

GNU/Linux NixOS 21.11

Alda version:

$ alda version
alda 2.0.6
$ alda-player info
alda-player 2.0.6
log path: /home/fps/.cache/alda/logs

Health check:

$ alda doctor
OK  Parse source code
OK  Generate score model
OK  Ensure that there are no stale player processes
OK  Find an open port
OK  Send and receive OSC messages
OK  Locate alda-player executable on PATH
OK  Check alda-player version
OK  Spawn a player process
OK  Ping player process
OK  Play score
OK  Export score as MIDI
OK  Locate player logs
OK  Player logs show the ping was received
OK  Shut down player process
OK  Spawn a player on an unknown port
OK  Discover the player
OK  Ping the player
OK  Shut the player down
OK  Start a REPL server
nREPL server started on port 41669 on host localhost - nrepl://localhost:41669
OK  Find the REPL server
OK  Interact with the REPL server
OK  Shut down the REPL server

Logs:

 104   โ”‚ clp INFO  2022-03-20 14:29:20 Main.run:77 - Starting receiver, listening on port 38477...
 105   โ”‚ clp INFO  2022-03-20 14:29:20 MidiEngine.<init>:276 - Initializing MIDI sequencer...
 106   โ”‚ clp INFO  2022-03-20 14:29:20 MidiEngine.<init>:281 - Initializing MIDI synthesizer...
 107   โ”‚ clp INFO  2022-03-20 14:29:23 Main.run:90 - Starting player...
 108   โ”‚ clp DEBUG 2022-03-20 14:29:23 Parser.parse:238 - received ping
 109   โ”‚ clp DEBUG 2022-03-20 14:29:23 Player.invoke:346 - TRACK 1: startOffset is 0
 110   โ”‚ clp DEBUG 2022-03-20 14:29:23 Player.invoke:349 - eraBefore: 0; eraAfter: 0
 111   โ”‚ clp INFO  2022-03-20 14:29:23 Main.invoke:95 - Stopping receiver...
 112   โ”‚ tta INFO  2022-03-20 14:29:27 StateManager.cleanUpStaleStateFiles:72 - Cleaning up stale files in /home/fps/.cache/alda/state/players...
 113   โ”‚ tta INFO  2022-03-20 14:29:27 StateManager.cleanUpStaleStateFiles:72 - Cleaning up stale files in /home/fps/.cache/alda/state/repl-servers...
 114   โ”‚ tta INFO  2022-03-20 14:29:27 Main.run:77 - Starting receiver, listening on port 46593...
 115   โ”‚ tta INFO  2022-03-20 14:29:27 MidiEngine.<init>:276 - Initializing MIDI sequencer...
 116   โ”‚ tta INFO  2022-03-20 14:29:27 MidiEngine.<init>:281 - Initializing MIDI synthesizer...
 117   โ”‚ tta INFO  2022-03-20 14:29:30 Main.run:90 - Starting player...
 118   โ”‚ tta DEBUG 2022-03-20 14:29:30 Parser.parse:238 - received ping
 119   โ”‚ tta INFO  2022-03-20 14:29:30 Main.invoke:95 - Stopping receiver...
 120   โ”‚ tta INFO  2022-03-20 14:29:30 Main.invoke:97 - Stopping player...
daveyarwood commented 2 years ago

Hi @fleimgruber , thanks for reporting this!

From the logs you provided, it looks like a handful of player processes (itk, ypb and zxc) recorded that they were in the starting state and then they apparently died before they were able to log anything. :thinking: In the log, there are only entries for 2 player processes, clp and tta, and no error messages for those 2 processes.

I do think that Alda can self-correct to recover from a situation like this, but it takes 2 minutes. Every time you run an alda command, it looks for "stale" player state files (which means any files in ~/.cache/alda/state/players/<VERSION> that were last modified more than 2 minutes ago) and deletes them. Alda will then start new player processes to replace the stale/dead ones.

Would you mind running watch -n 0.25 alda ps and keep an eye on the output, and observe what happens when your player processes get stuck? If you wait 2 minutes, do the starting processes disappear from the alda ps output?

Another thing that might be helpful is to monitor the actual processes on your system by running watch -n 0.25 pgrep -af alda-player. If the number of player processes reported by alda ps is more than the number of java processes in the pgrep output, that would confirm my theory above about the player processes dying without cleaning up their state files.

fleimgruber commented 2 years ago

Thanks for taking a look at this!

watch -n 0.25 alda ps

Every 0.2s: alda ps

id      port    state   expiry  type
itk     44295   starting        4 hours ago     player
ypb     37647   starting        4 hours ago     player
zxc     41825   starting        4 hours ago     player

alda play --code "midi-bass-pop: c6 d12 e6 g12~4" does not change anything in alda ps and prints the mentioned

Starting player processes...
It looks like Alda is having trouble starting player processes in the
background. This could happen for a number of reasons.

To troubleshoot:

...

watch -n 0.25 pgrep -af alda-player

9411 java -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -Xmx1024m -Xms256m -DlogPath=tmplog -jar /nix/store/04p5x5y0lrs4ppz9
9clnhx5ddkhvvv9q-alda-2.0.6/bin/.alda-player-wrapped run
9497 java -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -Xmx1024m -Xms256m -DlogPath=tmplog -jar /nix/store/04p5x5y0lrs4ppz9
9clnhx5ddkhvvv9q-alda-2.0.6/bin/.alda-player-wrapped run
9499 java -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -Xmx1024m -Xms256m -DlogPath=tmplog -jar /nix/store/04p5x5y0lrs4ppz9
9clnhx5ddkhvvv9q-alda-2.0.6/bin/.alda-player-wrapped run
fleimgruber commented 2 years ago

it looks for "stale" player state files (which means any files in ~/.cache/alda/state/players/ that were last modified more than 2 minutes ago) and deletes them.

The associated player state files seem to get updated somehow and thus not deleted by the logic you mentioned:

ls -la ~/.cache/alda/state/players/2.0.6/itk.json 
-rw-r--r-- 1 fps users 64 Mar 20 19:23 /home/fps/.cache/alda/state/players/2.0.6/itk.json
daveyarwood commented 2 years ago

Interesting! So it looks like those 3 processes are not dying, rather, they are recording that they're in the starting state and then apparently not getting past that point.

I'm not sure at the moment how that could happen :thinking:

Just to confirm: the first few times you ran alda play ..., the playback worked?

Another thing worth trying: run alda shutdown to send a "shutdown" message to each of the stuck player processes. I'm curious if that works. If it does, it means that the player processes are actually running, but they aren't updating their state files.

fleimgruber commented 2 years ago

Just to confirm: the first few times you ran alda play ..., the playback worked?

It is not reliable, sometimes the playback only works for one or two of those and subsequent ones are silent. Sometimes they work a couple of times (usually if I leave ample time between them).

Another thing worth trying: run alda shutdown to send a "shutdown" message to each of the stuck player processes.

alda shutdown does not shut down the "starting" processes.

It might have to do with NixOS, the way it wraps alda or some weird sound system interaction? See also https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/interpreters/alda/default.nix. If we don't find a lead or a problem with alda itself, I will open an issue there as well just to confirm it is not a NixOS bug...

daveyarwood commented 2 years ago

alda shutdown does not shut down the "starting" processes.

Is there any error or warning message? Can you try it with alda -v3 shutdown?

It might have to do with NixOS, the way it wraps alda or some weird sound system interaction? See also https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/interpreters/alda/default.nix. If we don't find a lead or a problem with alda itself, I will open an issue there as well just to confirm it is not a NixOS bug...

I don't really understand NixOS. It could very well be the case that something weird is happening involving the way the Alda Nix package is set up. I figure it wouldn't hurt to at least open an issue with Nix, link to this issue and see if they have any ideas.

fleimgruber commented 2 years ago

Is there any error or warning message? Can you try it with alda -v3 shutdown?

$ alda -v3 shutdown
Shutting down player processes.
Mar 21 18:42:26 INF cmd/shutdown.go:68 > Sent "shutdown" message to player process. player={"Expiry":1647885032161,"ID":"naj","Port":40575,"ReadError":null,"State":"starting"}
Mar 21 18:42:26 DBG cmd/root.go:53 > Waiting for background activity to complete. activity="clean up renamed executables"
$ alda ps
id      port    state   expiry  type
naj     40575   starting        7 minutes from now      player
daveyarwood commented 2 years ago

OK, so that looks like the client successfully sent the player the "shutdown" message, which means the player was listening and received the message. :thinking:

But then the player didn't shut down, and it remained in the starting state?

fleimgruber commented 2 years ago

But then the player didn't shut down, and it remained in the starting state?

Yep:

$ alda ps
id      port    state   expiry  type
naj     40575   starting        5 minutes from now      player

Might this give a hint activity="clean up renamed executables" combined with NixOS handling of binary paths (basically symlinks to derivations, which roughly correspond to a system installation state, comparable to a git commit)?

daveyarwood commented 2 years ago

That part is unrelated. It's an implementation detail of the way that alda update works -- it renames the current executable (old version) of Alda and downloads the new version in its place. Then the next time you run Alda, it deletes the old, renamed executables.

fleimgruber commented 2 years ago

I noticed another thing now: When I let it sit for a while, the timer counts backwards and beyond, for example:

$ alda ps
id      port    state   expiry  type
naj     40575   starting        50 seconds ago      player

when a new shutdown is sent afterwards via alda -v3 shutdown (the command output is the same) then the timer is reset to 8 minutes:

$ alda ps
id      port    state   expiry  type
naj     40575   starting        8 minutes from now      player

maybe this gives an indication of which code path was hit.

daveyarwood commented 2 years ago

Good find! Every time a player process receives any message, it starts the timer over. This is an inactivity timer designed to prevent player processes from running forever in the background and wasting your CPU.

Once the timer reaches 0, the process should shut itself down and when you run alda ps again, you shouldn't see it there.

It seems like for some reason, the player process is having trouble getting past the starting state and is not responding to "shutdown" messages. :thinking:

fleimgruber commented 2 years ago

I tried updating to 2.2.0 and can't reproduce the issue there. I sent a PR to NixOS upstream so this might have been solved in newer alda along the way already.

daveyarwood commented 2 years ago

Weird, but I'm happy that the issue is fixed!