CADWRDeltaModeling / dms_datastore

Data download and management tools for continuous data for Pandas. See documentation https://cadwrdeltamodeling.github.io/dms_datastore/
MIT License

logging is not multi-processor safe #35

Open water-e opened 6 months ago

water-e commented 6 months ago

Python logging is thread-safe but not multiprocess-safe. There is also a file contention issue: I can't remember which two things I tried to run at the same time, but the second launch failed because the logging file was busy.

This is not a new issue, just a growing one. Introducing logging into the older "by agency" multiprocessing created the possibility, and I'm sure printed messages were even worse. But two things are changing:

  1. Contention is no longer low.
  2. There is now logging in low-level functions. For instance, there are extensive logging calls in download_ncro, where heavy multiprocessing is in play while the same old logger is used.

As an aside, the use of a single logger potentially limits how much of the library can be used at once. I know I use download_xxx for other things (e.g. download of mission-critical items for SCHISM). This is an incentive to allow more loggers rather than a module-level logger.

The topic of multi-processor safe writing is covered in a couple places, though I can't vouch for the info: https://docs.python.org/dev/howto/logging-cookbook.html#logging-to-a-single-file-from-multiple-processes

If we split up the logging and don't log inside multi-processed functions, we could also create multiple loggers or a pool of them. But the logging plan should probably precede any more work with multiprocessing. This probably affects reformat as well.
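For reference, the cookbook's queue-based pattern looks roughly like this. This is a minimal sketch using only the standard library; `worker` and `run_demo` are illustrative names, not dms_datastore functions:

```python
# Multiprocess-safe logging via a queue, per the Python logging cookbook:
# workers put log records on a shared queue, and a single listener in the
# parent process owns the file handle, so processes never contend for it.
import logging
import logging.handlers
import multiprocessing


def worker(queue, n):
    # Each worker process sends records to the queue instead of a file.
    logger = logging.getLogger(f"worker-{n}")
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.QueueHandler(queue))
    logger.info("download %d complete", n)


def run_demo(logfile):
    queue = multiprocessing.Queue(-1)
    file_handler = logging.FileHandler(logfile)
    file_handler.setFormatter(logging.Formatter("%(name)s %(message)s"))
    # The listener drains the queue on a background thread and writes
    # everything through the one file handler.
    listener = logging.handlers.QueueListener(queue, file_handler)
    listener.start()
    procs = [multiprocessing.Process(target=worker, args=(queue, i))
             for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    listener.stop()  # flushes remaining records before shutdown
    file_handler.close()


if __name__ == "__main__":
    run_demo("demo.log")
```

The same pattern works with a `multiprocessing.Pool`, which is presumably closer to how the per-agency downloads are dispatched.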

dwr-psandhu commented 6 months ago

I like the idea of a logger per agency, and also routing different loggers to different files to avoid file contention. I don't want to go down the route of a logging service, which can handle multiple processes but requires a running service (Azure and other cloud providers offer this).

water-e commented 6 months ago

I can create context based logs (populate_cdec.log, reformat_usbr.log). That will take care of the across-agency issue which is the one that was already there.
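A context-based logger could be as simple as the sketch below; `get_context_logger` is a hypothetical helper, not existing dms_datastore API:

```python
# One logger and one log file per context (e.g. populate_cdec.log,
# reformat_usbr.log), so concurrent tasks never share a file handle.
import logging


def get_context_logger(context):
    """Return a logger writing to <context>.log, configured on first use."""
    logger = logging.getLogger(context)
    if not logger.handlers:
        handler = logging.FileHandler(f"{context}.log")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False  # keep messages out of any shared root log
    return logger


cdec = get_context_logger("populate_cdec")
usbr = get_context_logger("reformat_usbr")
cdec.info("CDEC download starting")
usbr.info("USBR reformat starting")
```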

On the per-row multithreading, there are two ways to go:

  1. Roll it back. I think it is a speedup, but the scaling isn't great. That may be agency-specific, and to some extent it seems to differ between you and me. NOAA predictions are my worst offenders and I already took them out. Assuming you find it compelling and we don't roll it back ...
  2. Log non-fatal messages to a list and return that list along with fails etc., in which case I might create a dictionary or a class called DownloadResult to encapsulate the various lists (fails, log messages, etc.).
  3. Total failure would get caught at the outer level by populate().
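The DownloadResult idea in option 2 might look like the following; the class and its fields are hypothetical, drawn only from the description above:

```python
# Sketch of a DownloadResult container: workers collect their non-fatal
# messages and failures locally and return them, instead of logging from
# inside multi-processed functions.
from dataclasses import dataclass, field
from typing import List


@dataclass
class DownloadResult:
    """Per-worker outcomes, merged by the caller (e.g. populate())."""
    fails: List[str] = field(default_factory=list)
    messages: List[str] = field(default_factory=list)  # non-fatal log lines

    def merge(self, other):
        """Fold a worker's result into an aggregate result."""
        self.fails.extend(other.fails)
        self.messages.extend(other.messages)


# The parent merges results and does all logging itself, single-process.
worker_result = DownloadResult(fails=["FPT"], messages=["retried SJR once"])
total = DownloadResult()
total.merge(worker_result)
```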

I agree solving the issue of logging in a fine grain multiprocessor context sounds awfully complicated.

The other thing is that these downloaders and reformatters have some utility outside of the populate_repo setting. The lighter weight they are the better.

