digitalearthpacific / dep-tools

Processing tools for Digital Earth Pacific
MIT License

Log to stdout and add more informative progress logs #9

Open alexgleith opened 12 months ago

alexgleith commented 12 months ago

See: https://github.com/digitalearthpacific/dep-mangroves/issues/3

But basically, running the data processes now provides no feedback on where the tasks are at.

An example of where this would have been helpful was the tiles that were loading a ring around the world. If I could have seen in the logs that the process was stuck on finding STAC Items, that would have been a clue.

Logs just need to go to standard out, so that they are picked up by the normal Docker tooling we have running.
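For reference, the simplest version of this with the standard library is a one-call setup; this is just a sketch, and the format string and logger name are illustrative rather than anything in dep-tools:

```python
# Minimal sketch: route everything to stdout so Docker/Argo capture it.
import logging
import sys

logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

logging.getLogger("dep_tools").info("Starting task processing")
```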

jessjaco commented 12 months ago

A few things here (sorry I missed the prior issue): all the logging is done in https://github.com/digitalearthpacific/dep-tools/blob/main/dep_tools/runner.py. You can see that what is logged there is somewhat terse, but one of the goals was to have machine-readable logging, so there is just one row per "task". (I also recognize the iterative running is not being utilized in the Argo workflow, but that is a separate issue.)

I think what I'd like to do is enable more explicit logging by calling self.logger.info in the runner with whatever info is desired (sketched below). Then the logger could choose whether to log it or not.
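As a purely hypothetical sketch of what that could look like in the runner's task loop (everything except the self.logger.info calls is a placeholder, not the actual dep_tools/runner.py code):

```python
# Hypothetical sketch of more explicit progress logging in a runner task loop;
# class structure and the task methods are placeholders for illustration.
class Runner:
    def __init__(self, logger):
        self.logger = logger

    def run(self, tasks):
        for task in tasks:
            self.logger.info(f"Starting task {task.id}")
            self.logger.info(f"Searching for STAC Items for task {task.id}")
            items = task.find_stac_items()  # placeholder method
            self.logger.info(f"Found {len(items)} STAC Items for task {task.id}")
            task.process(items)  # placeholder method
            self.logger.info(f"Finished task {task.id}")
```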

Then we could either 1) replace the existing logger with e.g. a stdout logger that logs at the info level, or 2) combine the existing logger with a stdout logger and log to both. Ideally the existing logger would capture everything from the debug level up, while info-level messages would also be sent to stdout.

I prefer the second method, as removing the existing logger would make some of the existing filtering code (to prevent redos) unusable. I actually had started working on this solution in Suva (see https://github.com/jessjaco/azure-logger/blob/b23c9c71b15d9d714a7bde75f66d18fd246b2650/azure_logger/__init__.py#L75), but didn't get it working. For it to work, though, we just need to make sure that the log level can be set separately for each handler (or find a workaround); see the sketch below.
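Per-handler levels are supported directly by the standard library, so option 2 should be workable along these lines; this is a minimal sketch, and the FileHandler is only a stand-in for the existing Azure logger:

```python
import logging
import sys

# Sketch of option 2: one logger, two handlers, each with its own level.
logger = logging.getLogger("dep_tools")
logger.setLevel(logging.DEBUG)  # the logger itself must pass everything through

# Handler 1: stdout, for Docker/Argo to pick up
stdout_handler = logging.StreamHandler(sys.stdout)
stdout_handler.setLevel(logging.INFO)

# Handler 2: stand-in for the existing machine-readable logger; a FileHandler
# is used here purely for illustration
archive_handler = logging.FileHandler("tasks.log")
archive_handler.setLevel(logging.DEBUG)
archive_handler.setFormatter(
    logging.Formatter("%(asctime)s,%(levelname)s,%(message)s")
)

logger.addHandler(stdout_handler)
logger.addHandler(archive_handler)

logger.info("Loading STAC Items")      # goes to both handlers
logger.debug("Raw query parameters")   # goes to the archive handler only
```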

alexgleith commented 12 months ago

I'm a bit cautious about the external state. I don't want you to change it, but it is a complication.

The way I've managed whether or not work is needed is to have a success flag file, and I like using the STAC document as that. Each task starts up, checks whether the STAC exists, and skips the task, unless the overwrite flag is set, in which case it runs anyway... Since the STAC is small and it's the last thing written, it's the perfect success-state file.
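A rough sketch of that pattern; the path layout and function names here are made up for illustration, not dep-tools code:

```python
from pathlib import Path

def should_run(stac_path: Path, overwrite: bool = False) -> bool:
    """Run the task unless its STAC document already exists (and overwrite is off)."""
    return overwrite or not stac_path.exists()

def run_task(task_id: str, overwrite: bool = False) -> None:
    stac_path = Path(f"output/{task_id}/stac-item.json")  # hypothetical layout
    if not should_run(stac_path, overwrite):
        print(f"Skipping {task_id}: STAC item already exists")
        return
    # ... produce and write the data ...
    # Write the STAC item last, so it only exists if everything else succeeded.
    stac_path.parent.mkdir(parents=True, exist_ok=True)
    stac_path.write_text("{}")  # placeholder for the real STAC document
```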

For managing what tasks need to be run, we used a queue (AWS's SQS or RabbitMQ) and put tasks on it, so containers have an iterator in them and pull tasks off the queue (sketched below). Anyhow, as I said, I don't want you to change the central logging thing. But we must have logs in Argo and in a local dev environment, and I really don't mind if they're fairly verbose.
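A sketch of that queue-driven pattern against SQS; the queue URL and message contents are assumptions for illustration, and the same shape works with RabbitMQ:

```python
import boto3

def tasks_from_queue(queue_url: str):
    """Yield task ids from an SQS queue until it is drained."""
    sqs = boto3.client("sqs")
    while True:
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,  # long poll
        )
        messages = resp.get("Messages", [])
        if not messages:
            break  # queue drained; let the container exit
        message = messages[0]
        yield message["Body"]  # e.g. a tile id
        # Delete only after the consumer has processed the yielded task,
        # so failed tasks return to the queue after the visibility timeout.
        sqs.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )

for task_id in tasks_from_queue("https://sqs.example/dep-tasks"):  # made-up URL
    print(f"Running task {task_id}")
```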

jessjaco commented 12 months ago

I'm not wedded to the existing logger, and I can see how testing the STAC file is a little cleaner. I do like how the existing setup gives a central place to view all the outputs for a year. It also logs errors (well, most errors), so I can just look at the file to see which outputs are missing and why. I am also concerned about the stdout logs not being archived anywhere, and the need to e.g. copy and paste an error into an issue.

For now I'll look into adding the stdout logger to the existing one, as I laid out above, and also using the STAC files as flags, at least optionally, to explore the other pathway for monitoring tasks.

alexgleith commented 12 months ago

> I am also concerned about the stdout logs not being archived anywhere

I'm still working on our logging solution. We do have Loki up and running, but if I look at Argo, there are no logs from the workflows (which I think is because the workflows aren't logging at all!).

Permanently storing those logs is something I can work on too, but requires a bit more infrastructure on the Azure side. We'll get there.

The Loki logs stick around for weeks currently. The UI for Argo Workflows gets logs from the container, so when the container goes, the logs go too... but that's separate from Loki.

alexgleith commented 9 months ago

I'm going to work on this eventually, so I've self-assigned it.