databio / pypiper

Python toolkit for building restartable pipelines
http://pypiper.databio.org
BSD 2-Clause "Simplified" License
45 stars 9 forks source link

Pipeline log file could collide with filepath given to logmuse #212

Open vreuter opened 6 months ago

vreuter commented 6 months ago

From consideration of #210 , it becomes apparent that a user could specify a path to a logfile for a pipeline manager's logger to write to that would collide with the path the manager will generate for its own log file. This would almost surely be inadvertent and would lead to unexpected behavior like log file overwrites, duplicate messages, and possibly a race condition on file writes.

To protect against this, I think that in any pipeline manager constructor, we should...

  1. check that there's not this collision
  2. if there is a collision, it should be an exception

It should be an exception rather than a warning because... a) It's truly exceptional b) It's almost surely unintended c) It would lead to undesirable behavior d) it would be at the start of almost any pipeline, so raising an exception wouldn't risk wasting a lot of compute progress e) it's unrecoverable -- there's not really a suitable automatic adjustment to make; the hypothetical filepaths in question (logmuse logfile and pypiper log file) may be expected to be present with particular values, e.g. for consumption by other tools, and so doing something like injecting some distinguishing identifier into the filename suffix doesn't seem suitable

@nsheff would you accept (in principle, code review TBD ;) ) a pull request to this effect?

vreuter commented 6 months ago

Where logmuse opens the file in (over)write mode: https://github.com/databio/logmuse/blob/00f8e9f5fd769f7b10fc51dd0ee514f851707eb5/logmuse/est.py#L260-L261

Where a pipeline manager would pass a logfile to logmuse: https://github.com/databio/pypiper/blob/master/pypiper/manager.py#L213-L232 (currently only in the if not args branch of code, but should be in both -- see #211 )

Where a manager sets its own log file path: https://github.com/databio/pypiper/blob/master/pypiper/manager.py#L281

Where a manager opens its own log file path for appending: https://github.com/databio/pypiper/blob/3a8465a900f15a023afc2dd0b73e30cbf2e7eb24/pypiper/manager.py#L527-L531

nsheff commented 6 months ago

yes, I agree with everything you state. I think basically nobody uses the logmuse to set the logfile path, so it hasn't been a problem. But you're right it could be.