JuliaLang / Distributed.jl

Create and control multiple Julia processes remotely for distributed computing. Ships as a Julia stdlib.
https://docs.julialang.org/en/v1/stdlib/Distributed/
MIT License

Need better control over worker logging #51

Open quinnj opened 6 years ago

quinnj commented 6 years ago

Currently, even if you define your own AbstractLogger and set Base.global_logger(mylogger), worker log lines still arrive prepended with "      From worker X:".
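
For example (a minimal sketch; ConsoleLogger here just stands in for any custom logger):

using Distributed, Logging
addprocs(1)

# Install a custom logger on every process, master and worker alike.
@everywhere using Logging
@everywhere Base.global_logger(ConsoleLogger(stderr))

# The record prints on the worker, whose stderr is routed through the
# master, so it arrives with the "From worker 2:" prefix attached.
remotecall_wait(() -> @info("hello"), 2)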

My current workaround is:

using Distributed

# Override the output redirection so worker lines are echoed verbatim,
# without the "From worker X:" prefix.
function Distributed.redirect_worker_output(ident, stream)
    @async while !eof(stream)
        println(readline(stream))
    end
end

using MyPkg
MyPkg.run()

So hurray for JuliaLang/julia#265 and all, but we really need better controls here.

newptcai commented 5 years ago

Yeah, this would be very convenient!

newptcai commented 5 years ago

Also, the workaround did not work for me at first.

I found you need to define

using Distributed

function Distributed.redirect_worker_output(ident, stream)
    @async while !eof(stream)
        println(readline(stream))
    end
end

before calling addprocs.

Also, if julia is started with the -p option, this does not work either.
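
So the working order looks roughly like this (a minimal sketch):

using Distributed

# Define the override first, so it is already in effect when the
# workers connect.
function Distributed.redirect_worker_output(ident, stream)
    @async while !eof(stream)
        println(readline(stream))
    end
end

# Only now spawn the workers. Starting julia with -p spawns them before
# any user code runs, which is why the override never takes effect there.
addprocs(4)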

c42f commented 5 years ago

I think the ideal fix here would need both the Distributed workers and the master to be aware of the standard logging framework. I have something like the following in mind:
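
Roughly: each worker installs a logger that filters locally and forwards the surviving records to a channel owned by the master, which sinks them through its own logger. This is a hypothetical sketch rather than a settled design; the ForwardLogger type, LOG_CHAN, and the record tuple layout are made up for illustration.

using Distributed
addprocs(2)

@everywhere using Distributed, Logging

# Hypothetical logger that forwards records to a channel on the master.
@everywhere struct ForwardLogger <: Logging.AbstractLogger
    chan::RemoteChannel{Channel{Any}}
    min_level::Logging.LogLevel
end

@everywhere begin
    Logging.min_enabled_level(l::ForwardLogger) = l.min_level
    Logging.shouldlog(l::ForwardLogger, level, _module, group, id) = true
    Logging.catch_exceptions(l::ForwardLogger) = true
    function Logging.handle_message(l::ForwardLogger, level, message, _module,
                                    group, id, file, line; kwargs...)
        # min_enabled_level already filtered on the worker, so only
        # records at or above min_level ever cross the wire.
        put!(l.chan, (myid(), level, string(message)))
    end
end

# One channel on the master, and a task that sinks whatever arrives.
const LOG_CHAN = RemoteChannel(() -> Channel{Any}(1000))
@async while true
    wid, level, msg = take!(LOG_CHAN)
    @logmsg level "(worker $wid) $msg"
end

# Install the forwarding logger as the global logger on every worker.
for w in workers()
    remotecall_wait(c -> global_logger(ForwardLogger(c, Logging.Info)), w, LOG_CHAN)
end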

For this to be reasonably scalable, some element of log filtering will be required on the worker nodes, and it might also be necessary to designate a worker for log aggregation and sinking rather than using the master.

simonbyrne commented 3 years ago

Note that you don't see this unless you manually set the logger, due to JuliaLang/julia#26798 (that doesn't fix the issue, though).

vchuravy commented 3 years ago

One thing I have done recently for distributed logging:

@everywhere begin
  import Dates
  using Logging, LoggingExtras
  const date_format = "HH:MM:SS"

  function dagger_logger(logger)
    # Drop everything below Info on the worker itself.
    logger = MinLevelLogger(logger, Logging.Info)
    # Prefix each message with a timestamp and the originating worker id.
    logger = TransformerLogger(logger) do log
      merge(log, (; message = "$(Dates.format(Dates.now(), date_format)) ($(myid())) $(log.message)"))
    end
    return logger
  end

  # Set the global logger: ConsoleLogger when stderr is a terminal or
  # pipe, a flushed FileLogger when stderr was redirected to a file.
  if !(stderr isa IOStream)
    ConsoleLogger(stderr)
  else
    FileLogger(stderr, always_flush=true)
  end |> dagger_logger |> global_logger
end
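
Here the stderr isa IOStream check distinguishes a stderr that has been redirected to an actual file (as under most batch schedulers) from a terminal or pipe; in the file case, FileLogger with always_flush=true makes sure lines reach disk even if a worker dies mid-run. Note the @everywhere block has to run after addprocs, so the workers exist to receive it.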

but I agree that @c42f's ideas are probably worth exploring.

c42f commented 3 years ago

Yeah I still think it would be nice to have logging "just work" with Distributed by default, in a similar way to the stdout handling.

However, it's also clear that redirecting logging to the master node will fall over if there are many nodes or a high log volume. So for serious HPC work some distributed solution also seems necessary... such as dumping to a distributed filesystem, if you have one. @kpamnany wrote an interesting comment on that at https://github.com/CliMA/ClimateMachine.jl/issues/134#issuecomment-552491185
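
In the meantime, a simple version of that is to have every process write its own file, so no log traffic funnels through the master at all. A minimal sketch, assuming LoggingExtras is available and that logs/ lives on a filesystem visible from every node:

using Distributed
addprocs(4)

@everywhere using Logging, LoggingExtras

# Every process (master included) logs to its own file, keyed by
# process id; nothing is forwarded between processes.
@everywhere begin
  mkpath("logs")
  global_logger(MinLevelLogger(
    FileLogger("logs/worker-$(myid()).log"; always_flush=true),
    Logging.Info))
end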