Closed nathan5280 closed 1 year ago
I made a stab at implementing this. https://github.com/PrefectHQ/prefect/pull/2048
I made some good progress on this throughout the week. I think we have a pretty good working prototype now. See pull request: https://github.com/PrefectHQ/prefect/pull/2129
In the end I decided to add an additional handler to CloudHandler. This handler is called at the beginning of CloudHandler.emit() and checks whether the log record comes from the prefect.Task logger. Those are the interesting messages; all the others are just status updates on state changes and don't contain anything of interest. The example handler writes the log message to the local file system and then replaces record.message with the URI of that log file. When CloudHandler picks back up, it has a safe log record to process as usual.
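A minimal sketch of the kind of handler described above, for readers who haven't opened the pull request. The class name LocalFileHandler, the file naming, and the attribute handling are assumptions made for illustration; this is not the code from the PR.

```python
import logging
import re
from pathlib import Path


class LocalFileHandler(logging.Handler):
    """Hypothetical pre-cloud handler; not the class from the pull request."""

    def __init__(self, root_dir):
        super().__init__()
        self.root_dir = Path(root_dir)

    def emit(self, record):
        # Status updates from other loggers pass through untouched.
        if not record.name.startswith("prefect.Task"):
            return
        # Build a file name that is safe on common file systems.
        safe_name = re.sub(r"[^A-Za-z0-9_.-]", "_", record.name)
        path = self.root_dir / f"{record.created:.6f}.{safe_name}.log"
        path.parent.mkdir(parents=True, exist_ok=True)
        # Persist the fully formatted message locally.
        path.write_text(record.getMessage(), encoding="utf-8")
        # Mutate the record so that only the file URI travels further.
        record.msg = path.resolve().as_uri()
        record.args = None
        record.message = record.msg
```

Because the handler rewrites record.msg and clears record.args, any later call to record.getMessage() returns the URI rather than the original text.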
Working towards the requirements we have been discussing:
A global setting can obfuscate all task logs and exception traces from Cloud; existing prefect code can be instantly configured for compliance without changing any logger.* calls
This is pretty well covered in this prototype. Because the local log handler runs only when the message is going to Cloud, and runs before CloudHandler does anything else, it has full control over how the record is processed, transformed, and cleaned. Developers can put any tagging they want in record.message or the extras and process it however they see fit. The handler is a standard subclass of logging.Handler, except that it mutates the log record. Currently, CloudHandler makes a deep copy of the record before it calls the local handler so that downstream handlers aren't affected.
It isn't quite one config to rule them all, but it fits OK for the prototype, as follows:
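The copy-before-mutate step can be illustrated with the standard library alone. Everything named here (Redactor, CloudLikeHandler, Capture, the outbox list) is hypothetical and stands in for the real CloudHandler. The sketch also uses a shallow copy.copy rather than the deep copy the prototype makes, which is enough here only because the stand-in redactor rebinds attributes instead of modifying them in place.

```python
import copy
import logging


class Redactor(logging.Handler):
    """Stand-in local handler that mutates the record it is given."""

    def emit(self, record):
        record.msg, record.args = "<stored locally>", None


class CloudLikeHandler(logging.Handler):
    """Stand-in for CloudHandler; collects messages instead of sending them."""

    def __init__(self, local_handler):
        super().__init__()
        self.local_handler = local_handler
        self.outbox = []

    def emit(self, record):
        # Work on a copy so other handlers on the logger see the original.
        safe = copy.copy(record)
        self.local_handler.handle(safe)
        self.outbox.append(safe.getMessage())


class Capture(logging.Handler):
    """A downstream handler, e.g. a local console or file handler."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def emit(self, record):
        self.seen.append(record.getMessage())


logger = logging.getLogger("prefect.Task.copy_demo")
logger.propagate = False
logger.setLevel(logging.INFO)
cloud, capture = CloudLikeHandler(Redactor()), Capture()
logger.addHandler(cloud)
logger.addHandler(capture)
logger.info("Phone: %s", "555-555-1212")
```

After the call, cloud.outbox holds only the placeholder while capture.seen still holds the original message, which is the isolation the copy is meant to provide.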
[logging]
[logging.local]
# Pre-cloud logging handler
handler_class = "prefect.utilities.logging_local.LocalHandler"
root_dir = "/tmp/logs"
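One way a dotted handler_class setting like the one above could be turned into a handler instance is with importlib. The load_handler function and the convention of passing the remaining keys as constructor arguments are assumptions for illustration, not how the prototype necessarily does it. The usage line loads a standard-library handler because prefect.utilities.logging_local exists only in the pull request.

```python
import importlib
import logging


def load_handler(section):
    """Instantiate the handler named by section["handler_class"] (hypothetical helper)."""
    module_path, _, class_name = section["handler_class"].rpartition(".")
    handler_cls = getattr(importlib.import_module(module_path), class_name)
    # Assumption: every other key in the section is a constructor argument.
    kwargs = {k: v for k, v in section.items() if k != "handler_class"}
    handler = handler_cls(**kwargs)
    if not isinstance(handler, logging.Handler):
        raise TypeError(f"{section['handler_class']} is not a logging.Handler")
    return handler


# Usage with a stdlib class standing in for the configured LocalHandler.
demo = load_handler({"handler_class": "logging.handlers.MemoryHandler", "capacity": 10})
```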
Partial log messages or exception traces can be individually configured to emit unobfuscated to Cloud; users can pick and choose parts of a single log or exception that are safe to send to Cloud
The local handler fully processes the log record before CloudHandler does anything significant with it. The prototype only processes and transforms the prefect.Task logger's messages, which lets all the status messages flow through unchanged to Prefect Cloud. The filter could instead be based on anything available in the log record's context and data.
I think this is now covered as the local logger is configured when the CloudHandler is configured. This should work in all deployment environments.
The persisted logs are in a structure that makes discovery and clean up easy, for example following a naming convention like
The prototype uses:
<root-dir>/<flow-name>/<schedule-start-time>/<date>.<full-task-name>.log
This makes it easy to see all runs associated with a flow, sorted by time. It also makes it easy to delete old flows.
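A sketch of building a path in that layout. The function name log_path and the timestamp formatting (ISO 8601 with ":" and "+" replaced by ".") are assumptions inferred from the example URI later in this comment; note that the example URI joins the date and task name with "-" while the stated convention uses ".", and this sketch follows the stated convention.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def log_path(root_dir, flow_name, scheduled_start, logged_at, task_name):
    """Return <root-dir>/<flow-name>/<schedule-start-time>/<date>.<full-task-name>.log."""

    def stamp(dt):
        # Make the ISO timestamp safe to use as a file or directory name.
        return dt.isoformat().replace(":", ".").replace("+", ".")

    file_name = f"{stamp(logged_at)}.{task_name}.log"
    return PurePosixPath(root_dir) / flow_name / stamp(scheduled_start) / file_name


start = datetime(2020, 3, 8, 19, 41, 40, 503021, tzinfo=timezone.utc)
logged = datetime(2020, 3, 8, 19, 41, 48, 925210, tzinfo=timezone.utc)
path = log_path("/tmp/logs", "local-logging", start, logged, "local_log_1")
```

Since each run gets its own directory named by scheduled start time, a plain lexicographic sort of the directories under a flow is also a chronological sort.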
Whenever a log message or exception is configured to be obfuscated, Cloud/Cloud UI stores and renders only a reference to where the log is persisted (which is likely a filename or URI)
Yes
The technical solution utilizes the logging subsystem's abstractions as much as possible
Yes, with the exception of the local handler mutating the log record. Otherwise it is a standard handler, with whatever filters and formatters the developer requires.
The error log tile functionality in Cloud UI is not broken; it can still report number of errors and link to obfuscated error logs in Cloud UI like it does today
Yes
Authenticated UI users can view the unobfuscated logs in the UI if they can access the same service the logs are hosted on.
Some work to do here on the Prefect Cloud side.
Example of what is showing up in the Prefect Cloud Log:
March 8th 2020 at 1:41:50pm MDT | prefect.CloudTaskRunner
DEBUG
Task 'local_log_1': Calling task.run() method...
March 8th 2020 at 1:41:50pm MDT | prefect.Task: local_log_1
INFO
file:///tmp/logs/local-logging/2020-03-08T19.41.40.503021.00.00/2020-03-08T19.41.48.925210.00.00-local_log_1.log
Example of local log file:
SSN: 123-45-6789, Phone: 555-555-1212
Files of interest:
Use Case
Businesses that deal with customer, medical or otherwise sensitive data are concerned about sending that data outside of their controlled environment. This includes log messages being sent to Prefect Cloud.
Solution
Similar to pluggable ResultHandlers, it would be helpful to have pluggable LogHandlers that work in conjunction with the CloudHandler to store the private parts of the log message in the tenant's preferred local storage and replace the log message sent to Prefect Cloud with a link to the local data.
Alternatives
Latest Thinking 2020/02/16 AM