fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com
Other
2.92k stars 405 forks source link

Bug: Script timeout is currently not respected by Orbit which in turn does not respect the modified timeout. #20248

Open sharon-fdm opened 1 month ago

sharon-fdm commented 1 month ago

Fleet version: <!-- Copy this from the "My account" page in the Fleet UI, or run fleetctl --version -->

Web browser and operating system:


💥  Actual behavior

On macOS a script timeout doesn't take into effect until the existing command in the script completes. For example, running a script including sleep 10 with a timeout of 5s will timeout after 10s, not after 5s. To note, this does not occur on Windows devices, as the implementation there is slightly different.

The timeout partially works, as if the script execution time exceeds the timeout time, no futher commands in the script will execute. However the expected behavior is that orbit will kill the subcommand in progress as well.

🧑‍💻  Steps to reproduce

on macOS

🕯️ More info (optional)

lukeheath commented 1 month ago

@sharon-fdm Reminder to use bug template and populate as much information as possible in the body section.

noahtalerman commented 1 month ago

Hey @zayhanlon heads up that a fix for this bug is not targeted to ship in the next Fleet release (4.46)

After sprint planning today, we decided to prioritize the "Support Zero Trust workflow w/ live queries: 6 queries on 13k hosts" story (#17379) instead.

cc @sharon-fdm

getvictor commented 3 weeks ago

Waiting for @mostlikelee to provide details

sharon-fdm commented 1 week ago

@mostlikelee this is something we identified when working with the scripts. Could you please fill in the details on this. (I can't recall which OSs have this issue and in what conditions.)

mostlikelee commented 1 week ago

We may need some product input here, but the issue is that the script timeout doesn't take into effect until the existing command in the script completes. For example, running a script including sleep 10 with a timeout of 5s will timeout after 10s, not after 5s. This may be desired behavior, as killing a command before completion could render the device in an unstable state. However this does not occur on Windows devices, as the implementation there is slightly different. My findings were on macOS, and I suspect this also occurs on Linux.

@noahtalerman curious on your thoughts here as to when a script timeout should take effect.

mostlikelee commented 1 week ago

@nonpunctual @spokanemac curious your thought on this

spokanemac commented 1 week ago

@mostlikelee, I have no idea how this actually works. How I expect this to work is to capture the PID of the process, log a timestamp, and fork a bg process that sleep for X timeout. At the end of the timeout, see if PID exists, and kill it.

nonpunctual commented 1 week ago

@mostlikelee

As an admin, I should be able to:

That's all. I know that's not a direct answer but I think this is what Fleet admins who upload scripts expect. Thanks.

mostlikelee commented 6 days ago

@spokanemac @nonpunctual thanks for the feedback. I believe the script timeout config exists primarily to protect admins in case they accidentally write something in their scripts that take too long or never exits, like writing an infinite loop in their scripts or running a command that never ends. The current implementation should take care of the infinite loop, but will not protect against a command that never ends. Maybe that's a corner case we shouldn't be worrying about at the moment.

If we change the behavior to kill in progress commands, it has the possibility of being a footgun if the timeout is set to a low value and kills something important, like an OS update in progress. I'm guessing killing an OS update at the wrong time may render the device in a bad state.

mostlikelee commented 6 days ago

After speaking with @nonpunctual we believe the current behavior is indeed a bug. The bug details have been updated.