bugy / script-server

Web UI for your scripts with execution management

Why does "Stop"-Button not work? #624

Open Obaq-web opened 1 year ago

Obaq-web commented 1 year ago

Stopping a script with the "Stop" button does not stop my scripts. The terminal says "Stopped by User", but the script keeps running. I see a "Kill" button where the "Stop" button was before, but the "Kill" button is gray and cannot be clicked. Can anyone help?

bugy commented 1 year ago

Hi @Obaq-web, your script ignores the SIGTERM signal. Stop just sends a signal to the process, asking it to finish gracefully. If that doesn't work, you can use Kill. It should become available after 5-10 seconds.
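The stop-then-kill behaviour described above can be sketched with Python's `subprocess` module. This is only an illustration, not script-server's actual code: the names `stop_then_kill` and `spawn_stubborn_child` and the default grace period are assumptions, and the "ignored SIGTERM" part reflects POSIX behaviour (on Windows, `terminate()` ends the process immediately).

```python
import subprocess
import sys

# A child script that ignores SIGTERM, like the scripts discussed in this issue.
STUBBORN_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def spawn_stubborn_child():
    """Start the SIGTERM-ignoring child and wait until its handler is installed."""
    process = subprocess.Popen(
        [sys.executable, "-c", STUBBORN_CHILD], stdout=subprocess.PIPE
    )
    process.stdout.readline()  # blocks until the child prints 'ready'
    return process


def stop_then_kill(process, grace_seconds=5.0):
    """Ask the process to exit; force-kill it if it is still alive afterwards.

    Returns "stopped" if the process exited within the grace period,
    otherwise "killed".
    """
    process.terminate()  # SIGTERM on POSIX
    try:
        process.wait(timeout=grace_seconds)
        return "stopped"
    except subprocess.TimeoutExpired:
        process.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        process.wait()
        return "killed"
```

With a child that ignores SIGTERM, `stop_then_kill(spawn_stubborn_child(), grace_seconds=0.5)` takes the forced path, which corresponds to "Stop" printing its message while the script keeps running until "Kill" is used.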

Obaq-web commented 1 year ago

Hello bugy, I don't mind stopping my scripts with "kill". However the button stays gray, no matter how long I wait. Any ideas how I can fix this?

bugy commented 1 year ago

It's probably a bug in a new version. Which version are you using?

Obaq-web commented 1 year ago

Version: 1.17.1

I'll test this issue with an older version soon.

Stjefan commented 1 year ago

I have a similar problem (the button stays gray), but in the log I can see that the corresponding kill POST request is sent. However, the task is not killed. The problem in my case is that my script runs a subtask. As I am using Windows, it must be killed via the corresponding kill command. I guess the problem is that the check in ExecutionService.py:

if execution_id in self._executors:
    self._executors[execution_id].kill()

is false (which is correct for the main task but not for the subtask).

Maybe someone knows a workaround for that windows specific problem.

bugy commented 1 year ago

Hi @Stjefan the button shouldn't stay gray, that's the main issue.

The problem with Windows is that there is no easy way in Python to gracefully kill child processes. However, a forceful kill (which should be available if the button is enabled) should work on Windows too and kill child processes as well.

Could you try running a script on Windows and killing it with this command: taskkill /T /PID process_pid

And check whether the children are killed as well?
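For anyone who wants to run the same check from Python instead of a console, here is a small sketch. The helper names `build_taskkill_command` and `kill_process_tree` are made up for this example and are not part of script-server. `/T` ends the process together with its child processes, and the optional `/F` forces termination.

```python
import subprocess


def build_taskkill_command(pid, force=False):
    """Build the taskkill argument list for a process tree (Windows only)."""
    command = ["taskkill", "/T", "/PID", str(pid)]
    if force:
        command.insert(1, "/F")  # forceful termination
    return command


def kill_process_tree(pid, force=False):
    """Run taskkill on Windows and return its exit code (0 means success)."""
    result = subprocess.run(
        build_taskkill_command(pid, force), capture_output=True, text=True
    )
    return result.returncode
```

`build_taskkill_command(1234)` yields the same command as above with the PID filled in. `kill_process_tree` only makes sense on Windows, where `taskkill` exists.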

bugy commented 1 year ago

Hi @Obaq-web @Stjefan, I checked on the latest dev version, and the kill button works fine for me. It is indeed grey, but it's clickable. I will change the button's color to make it more obvious that it's active. But could you confirm that it's not clickable for you? Also, after you click "Stop", do you see a timer on the button?

Stjefan commented 1 year ago

I changed my machine for a different reason and now it's working fine. I will check it on the other machine soon. FYI: when it was not working, "Stop" was clickable, then the timer appeared, and then the "Kill" button appeared. "Kill" was clickable (I saw the POST request in the backend log), but there was no visible response to the click in the GUI. taskkill also worked fine, but somehow the server falsely thought that it had already stopped the script and did not execute the taskkill command. In the script I wrote timestamps to a file, and they kept being written after clicking Stop, so I verified that the task was still running.

bugy commented 1 year ago

Hi @Stjefan, could you share a script that I could use to reproduce the issue on my machine? A demo script not related to your work would be more than enough for me.

Stjefan commented 1 year ago

Sure, it's a very simple script:

from time import sleep
from datetime import datetime
import sys

print("This is the name of the script: ", sys.argv[0])
print("Number of arguments: ", len(sys.argv))
print("The arguments are: " , sys.argv)

sleep(1)

if __name__ == '__main__':

  while True:
    sleep(2)
    print("doing something in a loop ...")
    with open('somefile.txt', 'a') as the_file:
        the_file.write(f'{datetime.now()}\n')

  print("End of the program. I was killed gracefully :)")

I finally found the difference between my two machines. On my previous machine I started the script via its path and let Windows decide how to interpret the .py file. In that case stopping does not work. On my current machine, I start the script via 'py path/2/file'. Then stopping works fine. When I use 'py path/2/file' on the previous machine, stopping works as well. So it should be a problem with the default program that's used to start .py files and not your great code :).
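The workaround above amounts to naming the interpreter explicitly instead of relying on the Windows file association. A hedged sketch of that idea in Python follows. `build_launch_command` is a hypothetical helper, and `sys.executable` (the interpreter running the current process) stands in for the `py` launcher. The thread does not establish why the file-association route fails, so this only illustrates the workaround, not a diagnosis.

```python
import sys
from pathlib import Path


def build_launch_command(script_path, args=()):
    """Return an argv list that names the interpreter explicitly for .py files."""
    path = Path(script_path)
    if path.suffix.lower() == ".py":
        # Launch the interpreter directly instead of letting the OS
        # pick a default program for the .py extension.
        return [sys.executable, str(path), *args]
    return [str(path), *args]
```

The resulting list can be passed to `subprocess.Popen`, so the handle the caller holds refers to the interpreter that runs the script.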

bugy commented 1 year ago

Hi @Stjefan, thanks a lot. I think it could still be improved in script server. If the stop button is there, it should work in all cases :)

By the way, regarding:

taskkill also worked fine but somehow the server falsely thought that it already stopped the script and did not execute the taskkill command.

So you executed taskkill manually, but the real process didn't stop, right? However, script server considered it stopped. Is my understanding correct? The /T flag was supposed to kill all the child processes as well :( according to their docs.

Stjefan commented 1 year ago

No, taskkill worked as expected, and /T killed the subtasks as well. But script-server checks the following before running taskkill: if execution_id in self._executors: (see ExecutionService.py). And this condition is false in my case, so the taskkill command is never invoked.

bugy commented 1 year ago

Hi @Stjefan

if execution_id in self._executors: should always return true, because elements are never removed from self._executors (only on server restart)

jost-balent commented 1 year ago

Hi everybody, I have a similar problem with the STOP button on v1.16.0. I am running this bash script:

#!/bin/bash

container_id=$(docker run -d --rm alpine sh -c 'for i in $(seq 1 100); do echo "$i"; sleep 1; done')
docker attach "$container_id"

If I run this script manually and press Ctrl+C (SIGINT), the docker container is also stopped and everything terminates as expected. If I run the same script via script server, pressing the STOP button simply outputs >> STOPPED BY USER, and the script keeps running and its output is still displayed. After a countdown of 5s I can press the KILL button (which is normal for script server). The script's output stops and the text " >> KILLED " appears in the output, as expected. The problem is that the child process (in this case docker run) is still running on the server. It seems that the KILL button properly terminates the triggering script but does not kill its child processes. This is the output with annotations: (image)

Thank you for your time and consideration, help would be greatly appreciated. Best regards,

Jost

bugy commented 11 months ago

Hi @jost-balent sorry, I missed your question :( I'll leave an answer anyway. There are multiple things here:

To sum it up: this could be fixed in script server by sending SIGINT instead of SIGTERM. However, I think this problem is quite a rare use case, and it could be worked around by using different docker commands.

If there are more people experiencing this problem, please let me know
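To illustrate the SIGINT idea mentioned above: pressing Ctrl+C in a terminal delivers SIGINT to the whole foreground process group, not just to one process. The POSIX-only sketch below imitates that from Python. It is not how script-server behaves today, the name `send_sigint_to_group` is invented for this example, and a small Python child stands in for the real script.

```python
import os
import signal
import subprocess
import sys

# Stand-in for a user script that reacts to Ctrl+C.
CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGINT, signal.default_int_handler)\n"
    "try:\n"
    "    print('ready', flush=True)\n"
    "    time.sleep(60)\n"
    "except KeyboardInterrupt:\n"
    "    print('interrupted', flush=True)\n"
)


def send_sigint_to_group():
    """Start the child in its own process group and interrupt the whole group."""
    process = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        text=True,
        start_new_session=True,  # child becomes leader of a new process group
    )
    process.stdout.readline()  # wait for 'ready'
    # Signal every process in the child's group, as a terminal does on Ctrl+C.
    os.killpg(os.getpgid(process.pid), signal.SIGINT)
    remaining_output, _ = process.communicate()
    return remaining_output.strip(), process.returncode
```

Because the signal goes to the process group, children started by the script would receive it too. Whether that is enough to stop a detached docker container depends on how the child command handles the signal.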