PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0
15.78k stars 1.54k forks source link

Suppress "PID ..." log lines? #13079

Open sckott opened 1 year ago

sckott commented 1 year ago

Expectation / Proposal

We're using the ShellOperation class from this library for all of our Prefect tasks. It works great. However, some of our team would like to suppress the log lines like PID <some pid> stream output:. Is there any way to suppress those lines in prefect logs? See screenshot below.

Perhaps there’s an env var that can be set? Perhaps it can be achieved via logging config?

If this is the wrong place for this ticket let me know!

Traceback / Example

Screen Shot 2023-03-01 at 9 48 40 AM

zanieb commented 1 year ago

@sckott do you just mean toggling stream_output to False? https://github.com/PrefectHQ/prefect-shell/blob/main/prefect_shell/commands.py#L234

sckott commented 1 year ago

Thanks @madkinsz - That does suppress the "PID ..." lines, but also all logs I think from the task. Here's I think a reproducible example:

test.py

from prefect import flow, task
from prefect_shell import ShellOperation

@task
def my_task(x, stream=True):
    return ShellOperation(
        commands=[
            f"echo the number is {x}",
            f"echo I have {x} apples",
        ],
        stream_output=stream
    ).run()

@flow(name="Test")
def test():
    a = my_task(x=5, stream=True)
    b = my_task(x=7, stream=False)

    return a, b

if __name__ == "__main__":
    test()

Running python3.11 -m test gives me:

08:59:12.165 | INFO    | prefect.engine - Created flow run 'ultra-crab' for flow 'Test'
08:59:12.288 | INFO    | Flow run 'ultra-crab' - Created task run 'my_task-0' for task 'my_task'
08:59:12.289 | INFO    | Flow run 'ultra-crab' - Executing 'my_task-0' immediately...
08:59:12.310 | INFO    | Task run 'my_task-0' - PID 92133 triggered with 2 commands running inside the '.' directory.
08:59:12.313 | INFO    | Task run 'my_task-0' - PID 92133 stream output:
the number is 5
I have 5 apples
08:59:12.313 | INFO    | Task run 'my_task-0' - PID 92133 completed with return code 0.
08:59:12.324 | INFO    | Task run 'my_task-0' - Finished in state Completed()
08:59:12.337 | INFO    | Flow run 'ultra-crab' - Created task run 'my_task-1' for task 'my_task'
08:59:12.338 | INFO    | Flow run 'ultra-crab' - Executing 'my_task-1' immediately...
08:59:12.356 | INFO    | Task run 'my_task-1' - PID 92134 triggered with 2 commands running inside the '.' directory.
08:59:12.358 | INFO    | Task run 'my_task-1' - PID 92134 completed with return code 0.
08:59:12.368 | INFO    | Task run 'my_task-1' - Finished in state Completed()
08:59:12.377 | INFO    | Flow run 'ultra-crab' - Finished in state Completed()

The behavior I'm looking for is to see the logs from the echo statements still, like:

08:59:12.165 | INFO    | prefect.engine - Created flow run 'ultra-crab' for flow 'Test'
08:59:12.288 | INFO    | Flow run 'ultra-crab' - Created task run 'my_task-0' for task 'my_task'
08:59:12.289 | INFO    | Flow run 'ultra-crab' - Executing 'my_task-0' immediately...
08:59:12.310 | INFO    | Task run 'my_task-0' - PID 92133 triggered with 2 commands running inside the '.' directory.
08:59:12.313 | INFO    | Task run 'my_task-0' - PID 92133 stream output:
the number is 5
I have 5 apples
08:59:12.313 | INFO    | Task run 'my_task-0' - PID 92133 completed with return code 0.
08:59:12.324 | INFO    | Task run 'my_task-0' - Finished in state Completed()
08:59:12.337 | INFO    | Flow run 'ultra-crab' - Created task run 'my_task-1' for task 'my_task'
08:59:12.338 | INFO    | Flow run 'ultra-crab' - Executing 'my_task-1' immediately...
08:59:12.356 | INFO    | Task run 'my_task-1' - PID 92134 triggered with 2 commands running inside the '.' directory.
the number is 7
I have 7 apples
08:59:12.358 | INFO    | Task run 'my_task-1' - PID 92134 completed with return code 0.
08:59:12.368 | INFO    | Task run 'my_task-1' - Finished in state Completed()
08:59:12.377 | INFO    | Flow run 'ultra-crab' - Finished in state Completed()
zanieb commented 1 year ago

I see so you're just interested in dropping the "PID ... stream output:" prefix from the logs?

sckott commented 1 year ago

Yes, just to make the logs less "noisy". Perhaps it can be done via the logging config file?

zanieb commented 1 year ago

It's hard-coded at https://github.com/PrefectHQ/prefect-shell/blob/main/prefect_shell/commands.py#L160

I'd review a pull request making that prefix optional. Perhaps stream_output_include_pid: bool = True. Another option is to get a child logger for each stream output operation that includes the PID e.g.

from logging import getLogger

logger = getLogger("test")
pid = 123
child = logger.getChild(str(pid))
print(child.name)

I think that would be a cleaner solution; this kind of information probably doesn't belong in the message itself?

sckott commented 1 year ago

I don't follow what you mean by the child logger. Is the idea there to make a custom logger which then we can strip the pid line from?

zanieb commented 1 year ago

@sckott if we use a child logger with the PID in the name (in the code I linked) we can drop the prefix entirely as the information would be encoded in the logger name.

sckott commented 1 year ago

Are you proposing this as something a user like me can do in my own code, or a change one could make to prefect-shell?

So whatever change you are suggesting, what would some example logs look like after the change?

zanieb commented 1 year ago

A change in prefect-shell. The logs would just include the output from the command. The pid would be included in the logger name.

sckott commented 1 year ago

So then the PID would not appear in the logs, but would be available via getLogger?

Perhaps this should be toggleable, though I realize that's more complexity to maintain. I think it's likely to be useful to have the PID in the logs for debugging, though I haven't used it that way yet. My worklow thus far has been finding the docker container or process by the name of the scripts that we call via ShellOperation, but it would be faster to know the exact PID

zanieb commented 1 year ago

The PID would appear in local logs, ~but the logger name is not exposed in the UI at this time~ we're adding logger names in the UI in https://github.com/PrefectHQ/prefect-ui-library/pull/1203

znicholasbrown commented 1 year ago

Should go out in the next release @madkinsz @sckott (sooner for Cloud folks)

desertaxle commented 1 year ago

Hey @sckott! I've implemented using a child logger to track PIDs per @madkinsz's suggestion in PrefectHQ/prefect-shell#77. Can you install that branch and run some flows to see if it resolves your issue?

sckott commented 1 year ago

Thanks @desertaxle !

Can you install that branch and run some flows to see if it resolves your issue?

I assume I'll only be able to control what I see in local logs, yes? Just checking.

desertaxle commented 1 year ago

There isn't any extra control, but the process PID has been moved to a child logger. You should be able to still see the PID without it cluttering your log output.

sckott commented 1 year ago

Using the example task/flow file above https://github.com/PrefectHQ/prefect/issues/13079

pip install git+https://github.com/PrefectHQ/prefect-shell.git@pid-child-logger
python3.11 -m test2
File "/opt/homebrew/lib/python3.11/site-packages/prefect_shell/commands.py", line 402, in run
   pid_logger = self.logger.getChild(str(process.pid))

AttributeError: 'PrefectLogAdapter' object has no attribute 'getChild'

Go back to the version of prefect-shell on pypi

pip uninstall prefect-shell
pip install prefect-shell
python3.11 -m test2
10:25:56.785 | INFO    | prefect.engine - Created flow run 'diamond-asp' for flow 'Test'
10:25:56.907 | INFO    | Flow run 'diamond-asp' - Created task run 'my_task-0' for task 'my_task'
10:25:56.908 | INFO    | Flow run 'diamond-asp' - Executing 'my_task-0' immediately...
10:25:56.928 | INFO    | Task run 'my_task-0' - PID 27982 triggered with 2 commands running inside the '.' directory.
10:25:56.932 | INFO    | Task run 'my_task-0' - PID 27982 stream output:
the number is 5
I have 5 apples
10:25:56.933 | INFO    | Task run 'my_task-0' - PID 27982 completed with return code 0.
10:25:56.947 | INFO    | Task run 'my_task-0' - Finished in state Completed()
10:25:56.959 | INFO    | Flow run 'diamond-asp' - Created task run 'my_task-1' for task 'my_task'
10:25:56.959 | INFO    | Flow run 'diamond-asp' - Executing 'my_task-1' immediately...
10:25:56.980 | INFO    | Task run 'my_task-1' - PID 27983 triggered with 2 commands running inside the '.' directory.
10:25:56.981 | INFO    | Task run 'my_task-1' - PID 27983 completed with return code 0.
10:25:56.992 | INFO    | Task run 'my_task-1' - Finished in state Completed()
10:25:57.002 | INFO    | Flow run 'diamond-asp' - Finished in state Completed()

I'm running prefect v2.8.3