Closed taylor-curran closed 9 months ago
Would love to see this implemented quickly if possible! Adding an example where this surfaces in case it's helpful:
Our kubernetes-based workflow consists of:
(a) a deployment that spins up a parent_flow
in a k8s pod, where
(b) the parent_flow
represents one batch of work, and kicks off multiple instances of child_flow
and waits for them to complete processing a set of entities
.
We set things up in this way in order to reach our desired configuration that allows the system to scale performantly while best utilizing resources (we efficiently use the resources allocated to the pods, and don't need to scale the nodes in the cluster an unreasonable amount to avoid kubernetes pod-per-node limits).
We avoided sending sensitive data in the parent_flow
parameters (by passing in essentially pointers to the sensitive data), to avoid things going over the wire when triggering the deployment. However, since each parent_flow
is responsible for triggering a child_flow
(that executes within the same pod), we decided that since the pod is local to our ecosystem, it would be safe to pass in resolved values (which includes sensitive strings) to child_flow
. To illustrate:
@flow
async def parent_flow(s3_path, entity_start, entity_end):
subflows = [
child_flow(
entity_id=s3_path[current_entity].get("id") # we do this to minimize s3 lookups
for current_entity in range(entity_start, entity_end)
]
await asyncio.gather(*subflows)
@flow
def child_flow(entity_id): # this is now getting visualized in the UI
do_some_work_with.submit(entity_id)
As shown in @taylor-curran's example above, we used to see entity_id
as **********
, and now see the sensitive value being unmasked. Changing the implementation here would actually have a bit of a drastic effect, since we would likely pay a significant overhead to use the pointer approach in the subflow.
Thanks for the issue @taylor-curran and the extra detail @jpidugu. Our strong recommendation is that any secret values should use a secret block. Although obfuscation may be a convenience, we do not want to give a false sense of security for something that does not offer the same level of security as a secret block does.
First check
Prefect Version
Describe the current behavior
Currently, in Prefect 2 with the upgrade to pydantic v2, SecretStr parameters are not being masked in the flow run parameters within the UI. This leads to the exposure of sensitive data in the UI, which can be a significant risk for users handling confidential information.
Currently only the subflow secret parameters are obscured.
Describe the proposed behavior
The proposed behavior is to implement a feature where SecretStr parameters are automatically masked in the flow run parameters UI. This masking should apply consistently across both pydantic v1 and v2. The goal is to prevent accidental exposure of sensitive data in the UI, enhancing the security and privacy of flow run parameters.
Reproduction steps to see the change of behavior from Pydantic V1 to V2:
With Pydantic V1
my_flow.py
Flow Run URL Parent Flow Sub Flow ❗ Value Obscured ✅
With Pydantic V2
my_flow.py
Flow Run URL Parent Flow
Sub Flow ❗ Value NOT obscured
Example Use
An example use case would be when a user is running a flow that requires sensitive information. In the current behavior, these sensitive details are displayed in plain text in the UI. The proposed change would mask these details, similar to how passwords are masked in form inputs, ensuring that sensitive information is not exposed to anyone viewing the flow run details.
Additional context
This masking feature for SecretStr parameters was observed as an unintended behavior in earlier versions but was found to be beneficial by many users.
Related PR:
11656