dqsjysgs commented 3 years ago

🐞 Bug report 🐞

Description

I ran alda doctor to check whether Alda works, but it failed with this error: dial tcp 127.0.0.1:31726: connectex: No connection could be made because the target machine actively refused it. It happens only some of the time, not always, which is confusing. I think it might be a bug.

Environment

Operating system and version: Microsoft Windows [version 10.0.18362.1016]

Alda version:

PS C:\Users\zbl> alda version
alda 2.0.1

PS C:\Users\zbl> alda-player info
alda-player 2.0.1
log path: C:\Users\zbl\AppData\Local\alda\cache\logs

Health check:

OK  Parse source code
OK  Generate score model
OK  Find an open port
OK  Send and receive OSC messages
OK  Locate alda-player executable on PATH
OK  Check alda-player version
OK  Spawn a player process
OK  Ping player process
OK  Play score
OK  Export score as MIDI
OK  Locate player logs
OK  Player logs show the ping was received
OK  Shut down player process
OK  Spawn a player on an unknown port
OK  Discover the player
ERR Ping the player

---

Oops! Something went wrong:
  dial tcp 127.0.0.1:31726: connectex: No connection could be made because the target machine actively refused it.

This might be a bug. For help, consider filing an issue at:
  https://github.com/alda-lang/alda/issues/new/choose
Or come chat with us on Slack:
  https://slack.alda.io

daveyarwood commented 3 years ago

Hi @dqsjysgs, thanks for reporting this. I'm not really sure what the issue is yet, but it seems related to issues one or two other people have reported where alda doctor results in an error towards the end, sometimes sporadically.

I'm wondering if maybe we could do a better job of handling unexpected failure to connect with player processes. Maybe we could attempt to connect to a different available player process if it fails the first time, or something like that. I'll think about this for a while and see if I can improve the experience here.

UlyssesZh commented 3 years ago

I encountered the same bug on my Windows 10. Not sure how to reproduce.

UlyssesZh commented 3 years ago

Hi @daveyarwood, could you tell us how to work around this bug? I cannot get rid of it unless I restart my computer.

daveyarwood commented 3 years ago

What problems are you running into, exactly?

If the issue is that a player process gets into a "bad state" and you can't connect to it, one thing you can do is run alda shutdown, which causes all of the player processes to exit. Then, the next time you run any alda command after that (e.g. alda, alda --help, alda play, etc.), new player processes will be spawned.

UlyssesZh commented 3 years ago

If the issue is that a player process gets into a "bad state" and you can't connect to it, one thing you can do is run alda shutdown...

When this bug occurs, all of the following will produce the same error:

alda play etc.
alda shutdown
alda doctor

Fortunately, I found my workaround:

Run alda-player run in a separate terminal and remember the port;
Do not run alda doctor, which will shut down the player process that we have just run;
Use --port to specify the port of the player process, and everything becomes fine.

daveyarwood commented 3 years ago

That sounds like a good workaround.

Another thing you can do, if you're on Mac or Linux, is to use ps output to find the player processes and forcibly kill them:

$ ps aux | grep alda-player | grep -v grep
dave     1608025  127  3.8 5731260 602708 pts/0  Sl   10:05   0:13 java -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -Xmx1024m -Xms256m -DlogPath=tmplog -jar /home/dave/bin/alda-player run
dave     1608026  134  3.7 5864392 592860 pts/0  Sl   10:05   0:14 java -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -Xmx1024m -Xms256m -DlogPath=tmplog -jar /home/dave/bin/alda-player run
dave     1608027  124  3.7 5731256 583376 pts/0  Sl   10:05   0:13 java -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -Xmx1024m -Xms256m -DlogPath=tmplog -jar /home/dave/bin/alda-player run

$ kill -9 1608025 1608026 1608027

With the caveat that if you do this, it puts your alda ps into a weird state for a while. It will eventually correct itself, but if you want to fix it right away, you can blow away the cache directory by running rm -rf ~/.cache/alda/state and Alda will reconstruct it correctly on the next run.

That last comment was really helpful for me. I was able to reproduce part of the issue by doing the kill -9 thing above and getting my cache directory into a weird state, then running either alda play or alda shutdown. Both commands produce an error like this:

$ alda shutdown
Oops! Something went wrong:
  dial tcp 127.0.0.1:46313: connect: connection refused

This might be a bug. For help, consider filing an issue at:
  https://github.com/alda-lang/alda/issues/new/choose

Or come chat with us on Slack:
  https://slack.alda.io

At a minimum, we should improve the error message to include useful information about why this might have happened and what you can do in order to proceed. In the case of alda shutdown, I think we should just print a short warning about each player process that we weren't able to connect to and explain that the player process may already have stopped, and if it hasn't, it will eventually stop itself due to inactivity.
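As a rough illustration of the "warn and continue" behavior proposed for alda shutdown, here is a minimal Python sketch. It is not Alda's implementation: `shutdown_players` and the `send_shutdown` callback are hypothetical names, and the sketch only assumes that an unreachable player surfaces as a connection-refused error, as in the output above.

```python
def shutdown_players(ports, send_shutdown):
    """Ask each player to shut down; warn instead of aborting when one is unreachable.

    `send_shutdown` is a hypothetical callback that sends the shutdown message
    to the player on the given port and raises ConnectionRefusedError when
    nothing is listening there.
    """
    stopped = []
    warnings = []
    for port in ports:
        try:
            send_shutdown(port)
            stopped.append(port)
        except ConnectionRefusedError:
            # The player is probably gone already; report it and keep going.
            warnings.append(
                f"warning: could not reach the player on port {port}. "
                "It may have already stopped; if not, it will stop itself "
                "after a period of inactivity."
            )
    return stopped, warnings
```

The point of the design is that one dead player no longer prevents the remaining players from being shut down.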

I will also look into having all of these commands (anything that involves sending messages to a player process) recover better from player processes that cannot be reached by e.g. spawning new player processes and trying again.

PeiMu commented 3 years ago

I also ran into this problem with Java 11 on Windows 10. Everything was fine again after I switched back to Java 8.

Not sure whether this info is helpful.

dqsjysgs commented 3 years ago

If the issue is that a player process gets into a "bad state" and you can't connect to it, one thing you can do is run alda shutdown...

When this bug occurs, all of the following will produce the same error:

alda play etc.

alda shutdown

alda doctor

Fortunately, I found my workaround:

Run alda-player run in a separate terminal and remember the port;

Do not run alda doctor, which will shut down the player process that we have just run;

Use --port to specify the port of the player process, and everything becomes fine.

I found a solution to this problem on my computer. Just delete all .json files under C:\Users\xxx\AppData\Local\alda\cache\state\players\2.0.1 and then all of the following will work again:

alda shutdown
alda doctor
alda repl

Obviously I don't know much, but it works for me, and I hope it's useful to you, too.
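For anyone who wants to script this manual cleanup, a minimal Python sketch follows. The function name `remove_player_state_files` is made up for illustration; the only thing taken from the comment above is that the player state lives in .json files inside a per-version directory.

```python
from pathlib import Path


def remove_player_state_files(state_dir):
    """Delete every .json file directly inside state_dir and return the removed paths."""
    state_dir = Path(state_dir)
    removed = []
    if not state_dir.is_dir():
        # Nothing to clean up, e.g. Alda has not created the directory yet.
        return removed
    for path in sorted(state_dir.glob("*.json")):
        path.unlink()
        removed.append(path)
    return removed
```

It would be called with the directory mentioned above (with your own user name and Alda version substituted), for example `remove_player_state_files(Path.home() / "AppData/Local/alda/cache/state/players/2.0.1")`. Other files in the directory are left alone.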

daveyarwood commented 3 years ago

That workaround is in line with my understanding of the problem. Normally, player processes delete their own state files (those .json files that were just mentioned) when they exit, but if they die unexpectedly, those (now stale) files can hang around for a long time and cause problems.

Player processes also clean up any stale state files when they start. However, if the alda client thinks that there are already a handful of good player processes available, then it won't start any more players, and so you end up stuck in a situation where there are no player processes available, but the state in the cache directory incorrectly says that there are, and nothing ever fixes the stale state in the cache directory.

My plan is to make it so that the alda client also cleans up stale player state .json files when it starts, to ensure that the information in the state cache directory is correct before we attempt to use a player process.

daveyarwood commented 3 years ago

@dqsjysgs @ds1231h @UlyssesZh I've just released Alda 2.0.2, which I believe fixes this issue. After running alda update, could you please give it a try and let me know if it seems any better or worse?

dqsjysgs commented 3 years ago

@daveyarwood Not too bad, but not too good either. There is no problem in normal use. However, if I start alda repl in a terminal, deliberately close that terminal, and then open a new terminal to use alda again, an error occurs. The alda doctor command fails:

OK  Parse source code
OK  Generate score model
OK  Find an open port
OK  Send and receive OSC messages
OK  Locate alda-player executable on PATH
OK  Check alda-player version
OK  Spawn a player process
OK  Ping player process
OK  Play score
OK  Export score as MIDI
OK  Locate player logs
OK  Player logs show the ping was received
OK  Shut down player process
OK  Spawn a player on an unknown port
OK  Discover the player
ERR Ping the player

---

Oops! Something went wrong:
  dial tcp 127.0.0.1:1126: connectex: No connection could be made because the target machine actively refused it.

This might be a bug. For help, consider filing an issue at:
  https://github.com/alda-lang/alda/issues/new/choose

Or come chat with us on Slack:
  https://slack.alda.io

At this point, the alda shutdown command doesn't work either.

Oops! Something went wrong:
  dial tcp 127.0.0.1:1126: connectex: No connection could be made because the target machine actively refused it.

This might be a bug. For help, consider filing an issue at:
  https://github.com/alda-lang/alda/issues/new/choose

Or come chat with us on Slack:
  https://slack.alda.io

alda repl sometimes works and sometimes doesn't, which is a little strange. When I open Task Manager, I can see that the java process has not exited, so I have to end it by hand. After I delete all the .json files under C:\Users\zbl\AppData\Local\alda\cache\state\players\2.0.2, everything is OK.

daveyarwood commented 3 years ago

OK, I think I still have a pretty good idea about what is happening.

In Alda 2.0.2, I made it so that Alda cleans up stale files in the cache/state/players directory more aggressively, and I'm defining "stale" as "at least 2 minutes old." (Prior to that release, it was 10 minutes.)

I could set the threshold lower than 2 minutes, but I'm worried that there might be bad behavior if I set it too low.
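To make the age-based definition of "stale" concrete, here is a minimal Python sketch. It is an illustration only, not the code in Alda: it assumes that a state file's age is judged by its modification time, and `find_stale_state_files` and `STALE_AFTER_SECONDS` are hypothetical names.

```python
import time
from pathlib import Path

STALE_AFTER_SECONDS = 120  # the 2-minute threshold described above


def find_stale_state_files(state_dir, now=None, max_age=STALE_AFTER_SECONDS):
    """Return the .json files in state_dir whose modification time is at least max_age seconds old."""
    now = time.time() if now is None else now
    return [
        path
        for path in sorted(Path(state_dir).glob("*.json"))
        if now - path.stat().st_mtime >= max_age
    ]
```

The sketch also shows the trade-off in picking the threshold: with a very small `max_age`, a file that belongs to a healthy player could be classified as stale simply because it has not been touched recently.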

Another thing we can do here is to make it so that if we can't connect to a player process (e.g. because the state file is old and the player isn't running anymore), we retry with a different player process. I'll try that next.

@dqsjysgs Do you see anything interesting (error messages, stacktraces, etc.) in the player logs when this happens? From your initial post above, it looks like the directory to check in is C:\Users\zbl\AppData\Local\alda\cache\logs.

daveyarwood commented 3 years ago

I've just released Alda 2.0.4, with some improvements to the way player processes are located and used. When there is a situation like what we described above, where the Alda client cannot reach an old player process (e.g. one that mysteriously died, leaving behind a stale state file that hasn't been cleaned up yet), it will now clean up the stale state file and move on to try a different player process, spawning new players if needed.
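The "clean up and move on" behavior described above can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not the Alda client's actual code: `pick_player`, `can_connect`, and the `spawn_player` callback are hypothetical, and each candidate is assumed to be a (state file, port) pair that the caller has already read from the state directory.

```python
import socket
from pathlib import Path


def can_connect(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" as well as timeouts.
        return False


def pick_player(candidates, spawn_player, is_reachable=can_connect):
    """Return the port of the first reachable player.

    candidates: iterable of (state_file, port) pairs.
    spawn_player: hypothetical callback that starts a new player and returns its port.
    """
    for state_file, port in candidates:
        if is_reachable(port):
            return port
        # The player is unreachable, so its state file is stale: remove it
        # and try the next candidate instead of failing outright.
        Path(state_file).unlink(missing_ok=True)
    # No usable player was found; fall back to starting a new one.
    return spawn_player()
```

With this flow, a single stale state file costs one failed connection attempt rather than an "actively refused" error for the whole command.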

Give the new version a try when you have a chance, and let me know what you think!

dqsjysgs commented 3 years ago

@daveyarwood Thanks a lot for your efforts! It seems everything works well and it may be the time to close the issue.

daveyarwood commented 3 years ago

That's great to hear! I'll close this issue, but do let me know (either here or by opening a new issue) if anyone continues to have issues like the ones above!

alda-lang / alda

No connection could be made because the target machine actively refused it. #369
