Open gjoseph92 opened 2 years ago
This would also make logs more verbose. Not necessarily enough reason not to do this, but it's worth considering. Mostly I want us to make sure that we want to do this before we decide how we could accomplish it.
On Thu, Jul 14, 2022, 1:30 PM Gabe Joseph @.***> wrote:
When viewing logs in which multiple workers (and/or nannies) are mixed into the same stream (for example, all tests in CI), it's often not possible to tell which worker produced a message.
Including the address of the current server in all log messages would make debugging easier in these cases.
This is probably rather tricky because:
- our loggers are all per-file global variables, not associated with a particular instance
- getting the current instance in an async context is broken: #4959 https://github.com/dask/distributed/issues/4959
One of those two things would probably need to be refactored to make this happen.
— Reply to this email directly, view it on GitHub https://github.com/dask/distributed/issues/6725, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACKZTFSMNIX2RNNBIYN2V3VUBMCPANCNFSM53TE57ZQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Instead of
2022-07-14 19:24:27,055 - distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:54646
2022-07-14 19:24:27,065 - distributed.worker - INFO - Connection to scheduler broken. Closing without reporting. ID: Worker-d0c85403-35f8-42b3-82d6-d309a44eff3b Address tcp://127.0.0.1:54646 Status: Status.closing
2022-07-14 19:24:27,077 - distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:54638
2022-07-14 19:24:27,075 - distributed.nanny - INFO - Nanny asking worker to close
2022-07-14 19:24:27,065 - distributed.worker - INFO - Connection to scheduler broken. Closing without reporting. ID: Worker-d0c85403-35f8-42b3-82d6-d309a44eff3b Address tcp://127.0.0.1:54646 Status: Status.closing
2022-07-14 19:24:27,076 - distributed.nanny - INFO - Nanny asking worker to close
2022-07-14 19:24:27,076 - distributed.nanny - INFO - Nanny asking worker to close
you'd have
2022-07-14 19:24:27,055 - distributed.worker tcp://127.0.0.1:54646 - INFO - Stopping worker at tcp://127.0.0.1:54646
2022-07-14 19:24:27,065 - distributed.worker tcp://127.0.0.1:54646 - INFO - Connection to scheduler broken. Closing without reporting. ID: Worker-d0c85403-35f8-42b3-82d6-d309a44eff3b Address tcp://127.0.0.1:54646 Status: Status.closing
2022-07-14 19:24:27,077 - distributed.worker tcp://127.0.0.1:54638 - INFO - Stopping worker at tcp://127.0.0.1:54638
2022-07-14 19:24:27,075 - distributed.nanny tcp://127.0.0.1:54646- INFO - Nanny asking worker to close
2022-07-14 19:24:27,065 - distributed.worker tcp://127.0.0.1:54638 - INFO - Connection to scheduler broken. Closing without reporting. ID: Worker-d0c85403-35f8-42b3-82d6-d309a44eff3b Address tcp://127.0.0.1:54646 Status: Status.closing
2022-07-14 19:24:27,076 - distributed.nanny tcp://127.0.0.1:54646 - INFO - Nanny asking worker to close
2022-07-14 19:24:27,076 - distributed.nanny tcp://127.0.0.1:54638 - INFO - Nanny asking worker to close
which is a pretty minimal increase in verbosity, but much more meaningful when debugging.
If log line length is a concern, I'd suggest removing the prefix distributed
from every worker/nanny/scheduler message.
I don't think that it's minimal. I think that it's meaningful enough to warrant consideration (but not necessarily stop things).
If log line length is a concern, I'd suggest removing the prefix distributed from every worker/nanny/scheduler message.
Yeah, that seems sensible to consider as well I think.
On Thu, Jul 14, 2022 at 3:06 PM Gabe Joseph @.***> wrote:
Instead of
2022-07-14 19:24:27,055 - distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:54646 2022-07-14 19:24:27,065 - distributed.worker - INFO - Connection to scheduler broken. Closing without reporting. ID: Worker-d0c85403-35f8-42b3-82d6-d309a44eff3b Address tcp://127.0.0.1:54646 Status: Status.closing 2022-07-14 19:24:27,077 - distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:54638 2022-07-14 19:24:27,075 - distributed.nanny - INFO - Nanny asking worker to close 2022-07-14 19:24:27,065 - distributed.worker - INFO - Connection to scheduler broken. Closing without reporting. ID: Worker-d0c85403-35f8-42b3-82d6-d309a44eff3b Address tcp://127.0.0.1:54646 Status: Status.closing 2022-07-14 19:24:27,076 - distributed.nanny - INFO - Nanny asking worker to close 2022-07-14 19:24:27,076 - distributed.nanny - INFO - Nanny asking worker to close
you'd have
2022-07-14 19:24:27,055 - distributed.worker tcp://127.0.0.1:54646 - INFO - Stopping worker at tcp://127.0.0.1:54646 2022-07-14 19:24:27,065 - distributed.worker tcp://127.0.0.1:54646 - INFO - Connection to scheduler broken. Closing without reporting. ID: Worker-d0c85403-35f8-42b3-82d6-d309a44eff3b Address tcp://127.0.0.1:54646 Status: Status.closing 2022-07-14 19:24:27,077 - distributed.worker tcp://127.0.0.1:54638 - INFO - Stopping worker at tcp://127.0.0.1:54638 2022-07-14 19:24:27,075 - distributed.nanny tcp://127.0.0.1:54646- INFO - Nanny asking worker to close 2022-07-14 19:24:27,065 - distributed.worker tcp://127.0.0.1:54638 - INFO - Connection to scheduler broken. Closing without reporting. ID: Worker-d0c85403-35f8-42b3-82d6-d309a44eff3b Address tcp://127.0.0.1:54646 Status: Status.closing 2022-07-14 19:24:27,076 - distributed.nanny tcp://127.0.0.1:54646 - INFO - Nanny asking worker to close 2022-07-14 19:24:27,076 - distributed.nanny tcp://127.0.0.1:54638 - INFO - Nanny asking worker to close
which is a pretty minimal increase in verbosity, but much more meaningful when debugging.
If log line length is a concern, I'd suggest removing the prefix distributed from every worker/nanny/scheduler message.
— Reply to this email directly, view it on GitHub https://github.com/dask/distributed/issues/6725#issuecomment-1184849205, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACKZTCHY5L6T7KN6XAPRUTVUBXN7ANCNFSM53TE57ZQ . You are receiving this because you commented.Message ID: @.***>
I also don't need this to be the default when deployed (since typically different machines will aggregate logs separately) (though it's still nice to be able to interleave them sometimes). I just want it in CI.
When viewing logs in which multiple workers (and/or nannies) are mixed into the same stream (for example, all tests in CI), it's often not possible to tell which worker produced a message.
Including the address of the current server in all log messages would make debugging easier in these cases.
This is probably rather tricky because:
One of those two things would probably need to be refactored to make this happen.