Traceback (most recent call last):
File "libcst/codemod/_cli.py", line 285, in _execute_transform
output_tree = transformer.transform_module(input_tree)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "libcst/codemod/_command.py", line 71, in transform_module
tree = super().transform_module(tree)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "libcst/codemod/_codemod.py", line 108, in transform_module
return self.transform_module_impl(tree_with_metadata)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "libcst/codemod/_visitor.py", line 32, in transform_module_impl
return tree.visit(self)
^^^^^^^^^^^^^^^^
File "libcst/_nodes/module.py", line 89, in visit
result = super(Module, self).visit(visitor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "libcst/_nodes/base.py", line 227, in visit
_CSTNodeSelfT, self._visit_and_replace_children(visitor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "libcst/_nodes/statement.py", line 1814, in _visit_and_replace_children
body=visit_required(self, "body", self.body, visitor),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "libcst/_nodes/internal.py", line 81, in visit_required
result = node.visit(visitor)
^^^^^^^^^^^^^^^^^^^
File "libcst/_nodes/base.py", line 236, in visit
leave_result = visitor.on_leave(self, with_updated_children)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "libcst/matchers/_visitors.py", line 515, in on_leave
if _should_allow_visit(
^^^^^^^^^^^^^^^^^^^^
File "libcst/matchers/_visitors.py", line 425, in _should_allow_visit
return _all_positive_matchers_true(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "libcst/matchers/_visitors.py", line 404, in _all_positive_matchers_true
if all_matchers[matcher] is None:
~~~~~~~~~~~~^^^^^^^^^
KeyError: FunctionDef
Root cause
The matchers are dataclasses with __hash__ method returning their id. Their id changes in every process, so we can't compare their instances created in different processes.
Even though the same matcher was added to all_matchers, this was when transformer/visitor initialized in the main process. The check causing exception happens in the child process, which has its own copy of all matchers with another set of ids.
This might work on Linux because it uses fork() for multiprocessing by default.
But Windows and MacOS use spawn() with a fresh Python interpreter in each child process.
And it will break even on Linux in Python 3.14+ when Python changes the default way to start child processes in Linux.
The issue does not happen with leave/visit matcher decorators, because they are only used with matches, never compared directly.
Potential solutions
_cli.py can re-initialize _matchers of the MatcherDecoratable transformer/visitor in each child process. Most straightforward way and easy to implement. MatcherDecoratable class can have a function just for that. But every other caller must know to do that in the multiprocessing context.
Make MatcherDecoratableTransformer/MatcherDecoratableVisitor multiprocessing-safe, but how? Maybe delayed initialization of _matchers? But it could still be accessed before the spawn. Or maybe it could save the pid with which matchers were initialized and re-initialize them if pid changes? Seems complicated.
Another way that I don't see? Open for discussion.
I decided to go with the second approach (delayed initialization). Matchers are populated right before we need them, so they will end up in the same process.
This issue has been reported before (https://github.com/Instagram/LibCST/issues/848, https://github.com/Instagram/LibCST/issues/629, https://github.com/Instagram/LibCST/issues/435), but I debugged it and found the root cause, so decided to create another one with more information.
Example traceback:
Root cause
The matchers are dataclasses with
__hash__
method returning theirid
. Theirid
changes in every process, so we can't compare their instances created in different processes.Even though the same matcher was added to
all_matchers
, this was when transformer/visitor initialized in the main process. The check causing exception happens in the child process, which has its own copy of all matchers with another set of ids.This might work on Linux because it uses
fork()
for multiprocessing by default. But Windows and MacOS usespawn()
with a fresh Python interpreter in each child process. And it will break even on Linux in Python 3.14+ when Python changes the default way to start child processes in Linux.The issue does not happen with
leave
/visit
matcher decorators, because they are only used withmatches
, never compared directly.Potential solutions
_cli.py
can re-initialize_matchers
of theMatcherDecoratable
transformer/visitor in each child process. Most straightforward way and easy to implement. MatcherDecoratable class can have a function just for that. But every other caller must know to do that in the multiprocessing context.Make
MatcherDecoratableTransformer
/MatcherDecoratableVisitor
multiprocessing-safe, but how? Maybe delayed initialization of_matchers
? But it could still be accessed before the spawn. Or maybe it could save the pid with which matchers were initialized and re-initialize them if pid changes? Seems complicated.Another way that I don't see? Open for discussion.