Open yuanmich2 opened 2 days ago
Thanks for reporting your issue with the worker agent.
It sounds like the --user argument of install-deadline-worker was misunderstood to be the user account that jobs will run as. In reality, the --user argument specifies the user that the worker agent runs as. When you run install-deadline-worker --help on a Windows host today, you will see:
--user USER The username of the AWS Deadline Cloud Worker Agent user. Defaults to "deadline-worker".
Perhaps there is room in that help text to elaborate further or make readers aware that queue users are the recommended setup.
While the worker agent user CAN also be used for running jobs, it is not recommended (see security best-practice documentation in the Deadline Cloud user guide):
- Don’t set the queue jobRunAsUser to the name of the OS user that the worker agent runs as.
The intended setup is documented in the "Worker host setup" topic of the Deadline Cloud user guide:
The Deadline Cloud worker agent should run as a dedicated agent-specific user on the host. You should configure the jobRunAsUser property of Deadline Cloud queues so that workers will run the queue jobs as a specific operating system user and group.
These are just best-practices and we recognize that there may be situations where people may choose to deviate from best-practices for business/technical reasons. We believe it is important to provide escape hatches for these use cases.
As you have noted, the security considerations are different on Windows where the worker agent user account must be an administrator. For this reason, we made this configuration only possible when the person setting up a Deadline Cloud queue takes explicit action to configure the queue to run as the agent user.
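As a rough illustration of the recommended separation, the sketch below builds a jobRunAsUser payload for a queue and refuses to reuse the agent's account. The field names (runAs, windows.user, windows.passwordArn) follow the Deadline Cloud queue API as I understand it; the helper build_job_run_as_user, the account names, and the ARN are placeholders made up for this example.

```python
def build_job_run_as_user(queue_user: str, password_arn: str, agent_user: str) -> dict:
    """Return a jobRunAsUser payload that runs jobs as a dedicated queue user.

    Hypothetical helper: it only builds a dict and does not call AWS.
    """
    # Windows account names are case-insensitive, so compare case-folded.
    if queue_user.casefold() == agent_user.casefold():
        raise ValueError(
            f"Queue user {queue_user!r} is the worker agent user; "
            "use a separate, least-privileged account for jobs."
        )
    return {
        "runAs": "QUEUE_CONFIGURED_USER",
        "windows": {"user": queue_user, "passwordArn": password_arn},
    }


payload = build_job_run_as_user(
    queue_user="job-user",  # placeholder account name
    password_arn="arn:aws:secretsmanager:us-west-2:111122223333:secret:example",  # placeholder
    agent_user="deadline-worker",  # installer default, per the help text above
)
print(payload["runAs"])
```

A payload like this would be passed as the queue's jobRunAsUser value when the queue is created or updated.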
Today, there are two ways to set up the worker agent on a Windows host that can result in jobs being run as an administrator user. I've provided some explanation of how the worker agent and Deadline Cloud must be configured to achieve this:

1. The operating system user that the worker agent program runs as must be an administrator. This is because Windows will only permit the creation of subprocesses that run as a different user if the originating process is running as an admin user. The worker agent makes it difficult (but not impossible) to set up the worker to run jobs as this same agent user. The worker agent guards against the following cases:
   - the run_jobs_as_agent_user configuration setting is true on Windows
   - the queue jobRunAsUser → runAs is set to WORKER_AGENT_USER (ref docs) on Windows
   - the windows_job_user configuration points to the same user that the worker agent is running as (code ref)

   The only way to set up a farm to run jobs as the agent user is to configure the queue jobRunAsUser → windows (ref docs) to refer to the same operating system user that the agent runs as. The user guide provides security best-practice documentation advising against this, but we wanted to provide an escape hatch for those who choose to accept this risk and need to run jobs as this same user.
2. Customers can configure their queue jobRunAsUser → windows to point to a different operating system user that is an administrator. While this is less secure than a non-administrator user, we still want to leave an escape hatch for customers to accomplish their goals. The "Worker hosts" section of the "Security Best Practices for Deadline Cloud" topic of the user guide states the best practice here:
Grant queue users least-privileged OS permissions required for the intended queue workloads. Ensure that they don't have filesystem write permissions to work agent program files or other shared software.
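To make the two routes concrete, here is a small sketch that classifies a queue's Windows user against the agent user and a list of known administrators. The function name, the account names, and the idea of passing in a list of administrators are assumptions for illustration; this is not how the worker agent itself is implemented, and it ignores details such as domain-qualified account names.

```python
from typing import Iterable, Optional


def admin_job_route(queue_windows_user: str, agent_user: str,
                    admin_users: Iterable[str]) -> Optional[str]:
    """Name the route by which jobs would run as an administrator, or None."""
    # Windows account names are case-insensitive, so compare case-folded.
    job_user = queue_windows_user.casefold()
    if job_user == agent_user.casefold():
        # The agent user must be an administrator on Windows.
        return "queue user is the worker agent user"
    if job_user in {name.casefold() for name in admin_users}:
        return "queue user is a different administrator"
    return None


admins = ["deadline-worker", "render-admin"]  # placeholder account names
for user in ("deadline-worker", "render-admin", "job-user"):
    print(user, "->", admin_job_route(user, "deadline-worker", admins))
```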
With all of this context in mind, I'm wondering if you have any suggestions on what could be improved here. Is there a point in the onboarding journey (documentation / installer / console) where we could best intervene or guide people towards the proper setup? Keeping in mind we do want to keep an escape hatch for those who understand and accept the risks and have a justified need to run their jobs as an administrator.
Make the escape hatch obvious.
You actually did a good job of outlining a bunch of reasons why I think the solution is not obvious right now. For example, the run_jobs_as_agent_user config option doesn't actually allow you to run as the agent user; it just makes your job fail on Windows, which isn't very helpful to anyone. That option causes the worker to raise an error about admin permissions, but paradoxically, if you had just specified an admin user with the --user flag, you get no error. In my original post, I opened the escape hatch without knowing I was opening an escape hatch because I had agent and queue users confused.
A config option named something like "allow_admin_users_to_render" would go a long way. We could include it in the auto-generated config file with a big comment that explains the risk very clearly.
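A minimal sketch of how such an opt-in could behave, assuming a boolean option named allow_admin_users_to_render that defaults to false. Nothing here exists in the worker agent today; the exception class, the function, and the messages are all hypothetical.

```python
class AdminJobUserError(RuntimeError):
    """Hypothetical error for jobs that would run with administrator rights."""


def check_job_user(job_user: str, agent_user: str, job_user_is_admin: bool,
                   allow_admin_users_to_render: bool = False) -> None:
    """Raise unless the job user is non-admin or the operator has opted in."""
    same_as_agent = job_user.casefold() == agent_user.casefold()
    # On Windows the agent user must be an administrator, so reusing it
    # for jobs always means running jobs with admin rights.
    is_admin = job_user_is_admin or same_as_agent
    if not is_admin or allow_admin_users_to_render:
        return
    if same_as_agent:
        raise AdminJobUserError(
            f"Queue user {job_user!r} is the worker agent user. "
            "Note that --user sets the agent user, not the queue user."
        )
    raise AdminJobUserError(
        f"Queue user {job_user!r} is an administrator. Set "
        "allow_admin_users_to_render to accept the risk."
    )


check_job_user("job-user", "deadline-worker", job_user_is_admin=False)  # passes silently
try:
    check_job_user("deadline-worker", "deadline-worker", job_user_is_admin=True)
except AdminJobUserError as err:
    print(err)
```

The point of the design is that the failure message names the root cause (agent and queue user being the same) separately from the general "admin queue user" case, and that nothing runs with admin rights unless someone flips the option on purpose.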
It might also be worth changing the flag from --user to --agent-user or something to prevent the confusion I had between agent and queue users in the future, but I think as long as such a confusion doesn't open up security holes, it's not that big of a deal.
Describe Behaviour
Right now, you can specify an admin queue user, which is a security risk. I fell into this trap when I accidentally specified the queue user as the agent user by passing it to the --user flag when running the install-deadline-worker command.
Expected Behaviour
You should have to deliberately want to render with admin privileges and accept the security risk before being allowed to do so.
Renders should throw an error if trying to run as an admin user. And if the queue user is the same as the agent user, the error should point out that this is the root cause. The agent user is always forced to be admin, so even if you weren't trying to run your render with admin privileges, misunderstanding the --user flag means you'd end up setting the agent and queue user to be the same user and running with admin privileges.
Current Behaviour
You just render with a bad security posture because you have admin privileges.
Reproduction Steps
Set up a worker. Pass the queue user to the --user flag, together with --grant-required-permissions, when running install-deadline-worker.
Possible Solution
Add better warning and error messages to prevent accidentally setting it up this way like I did.
Have a config option that explicitly enables rendering as admin. We can put a comment in the config file that outlines the potential risk.
Package Version
deadline-worker-agent 0.27.4
Language Version
python 3.12
Dependencies
No response
Operating System
Windows
Other information
No response