Closed stijn-uva closed 6 months ago
The Slack webhook handler in logger.py attempted to load the file a crash was reported from via importlib's SourceFileLoader. This tried to re-instantiate the module under its same (canonical) name within that worker. The webhook could then load metadata from the module for a more helpful Slack alert (specifically, including the __maintainer__
of the file in the alert).
This made no real difference in most cases, since the module is loaded from the same file, so it would functionally be the same module, et cetera. But because Python shares its module memory between threads, the pathlib
module was now canonically in a different location in the memory after the forced (re)loading with importlib, for all workers current and future; however, Path objects instantiated before this happened still retain a reference to the old pathlib instance, which is not garbage collected.
This includes various configuration paths which are instantiated as Path
objects in the configuration loader all the way at the beginning when 4CAT is first loaded. These are often part of a newly constructed path because dataset files are made/read relative to these configured paths.
Hence, for these "old" objects, comparing their class (from the old pathlib) to the canonical class (from the new importlib) evaluated to them not being the same. Even if they were both made via Path(string)
, the Path
class had a different id()
in both cases, since id()
is based on the module's location in memory. Whenever a Path()
constructor was called with such an old path object as the argument, this meant a flavour was no longer assigned to the newly instantiated object. So the following type of code made this issue occur:
some_path = Path(old_instantiated_path).joinpath("filename")
# or
some_path = Path("some_path").joinpath(old_instantiated_path)
In other words this bug only occurred:
FileNotFoundError
or other exception raised in pathlib Moral of the story: be careful with importlib
Wow. That must have taken some quality detective work! Bravo.
Describe the bug Workers sometimes start to crash with the following exception:
This crash can occur on any attempt to instantiate a
Path
object, which is done in many places where files need to be accessed.To Reproduce The crash seems to start occurring after a
FileNotFoundError
has been raised in a worker while using paths:This can happen if a file a worker is working with is deleted while the worker is running, for example by another worker or by user action. Afterwards, it looks like
pathlib
gets reloaded, as the following check inPath.__new__()
no longer evaluates toTrue
(even in workers started after the one that originally crashed!)cls is Path
in practice meansid(cls) == id(Path)
, andid(Path)
is different before the raising ofFileNotFoundError
compared to after it, whileid(cls)
remains the same. As a result, the instantiated object is not cast to an OS-specific version, and thus it has no_flavour
attribute, which trips up later operations that expect one.Why raising a
FileNotFoundError
results in a reload of the library (or something like it) is unclear.Possible solutions
oslo.concurrency
)Path
subclass that ensures that the object always has a flavourid(Path)
have a different result after theFileNotFoundError
and prevent it from happening