digitalmethodsinitiative / 4cat

The 4CAT Capture and Analysis Toolkit provides modular data capture & analysis for a variety of social media platforms.
Other
246 stars 59 forks source link

Workers start crashing due to a lack of Path flavour #422

Closed stijn-uva closed 6 months ago

stijn-uva commented 6 months ago

Describe the bug Workers sometimes start to crash with the following exception:

ERROR:4cat-backend:Processor image-downloader raised AttributeError while processing dataset ... (via ...) in download_images.py:312->pathlib.py:1076->pathlib.py:699->pathlib.py:692:
   type object 'Path' has no attribute '_flavour'

This crash can occur on any attempt to instantiate a Path object, which is done in many places where files need to be accessed.

To Reproduce The crash seems to start occurring after a FileNotFoundError has been raised in a worker while using paths:

ERROR:4cat-backend:Processor vision-label-network raised FileNotFoundError while processing dataset ... (via ...) in google_vision_network.py:75->dataset.py:304->pathlib.py:1246->pathlib.py:1114:

This can happen if a file a worker is working with is deleted while the worker is running, for example by another worker or by user action. Afterwards, it looks like pathlib gets reloaded, as the following check in Path.__new__() no longer evaluates to True (even in workers started after the one that originally crashed!)

if cls is Path:
            cls = WindowsPath if os.name == 'nt' else PosixPath

cls is Path in practice means id(cls) == id(Path), and id(Path) is different before the raising of FileNotFoundError compared to after it, while id(cls) remains the same. As a result, the instantiated object is not cast to an OS-specific version, and thus it has no _flavour attribute, which trips up later operations that expect one.

Why raising a FileNotFoundError results in a reload of the library (or something like it) is unclear.

Possible solutions

stijn-uva commented 6 months ago

The Slack webhook handler in logger.py attempted to load the file a crash was reported from via importlib's SourceFileLoader. This tried to re-instantiate the module under its same (canonical) name within that worker. The webhook could then load metadata from the module for a more helpful Slack alert (specifically, including the __maintainer__ of the file in the alert).

This made no real difference in most cases, since the module is loaded from the same file, so it would functionally be the same module, et cetera. But because Python shares its module memory between threads, the pathlib module was now canonically in a different location in the memory after the forced (re)loading with importlib, for all workers current and future; however, Path objects instantiated before this happened still retain a reference to the old pathlib instance, which is not garbage collected.

This includes various configuration paths which are instantiated as Path objects in the configuration loader all the way at the beginning when 4CAT is first loaded. These are often part of a newly constructed path because dataset files are made/read relative to these configured paths.

Hence, for these "old" objects, comparing their class (from the old pathlib) to the canonical class (from the new importlib) evaluated to them not being the same. Even if they were both made via Path(string), the Path class had a different id() in both cases, since id() is based on the module's location in memory. Whenever a Path() constructor was called with such an old path object as the argument, this meant a flavour was no longer assigned to the newly instantiated object. So the following type of code made this issue occur:

some_path = Path(old_instantiated_path).joinpath("filename")
# or
some_path = Path("some_path").joinpath(old_instantiated_path)

In other words this bug only occurred:

  1. when for some reason a worker crashed because of a FileNotFoundError or other exception raised in pathlib
  2. a Slack webhook was configured
  3. the exception was passed to the Slack webhook.

Moral of the story: be careful with importlib

dale-wahl commented 5 months ago

Wow. That must have taken some quality detective work! Bravo.