Instagram / LibCST

A concrete syntax tree parser and serializer library for Python that preserves many aspects of Python's abstract syntax tree
https://libcst.readthedocs.io/
Other
1.55k stars 191 forks source link

MatchIfTrue lambdas used in decorator matchers in VisitorBasedCodemodCommand hit pickle errors on large codebase #798

Open jmhodges-color opened 2 years ago

jmhodges-color commented 2 years ago

In a subclass of VisitorBasedCodemodCommand, I have code that creates a @matchers.visit with a MatchIfTrue lambda that throws a lambda pickling error when the codemod is run on a sufficiently large codebase. Running it on a file at a time is fine, because the multiprocessing work doesn't kick in.

Probably it deserves a call out in the docs of MatchIfTrue to use named functions instead of lambdas. I'm not sure what other solutions are available.

The command in question:

def func_def_has_request_param_matcher() -> m.FunctionDef:
    return m.FunctionDef(
        params=m.Parameters(
            params=m.MatchIfTrue(
                lambda params: any(p.name.value == "request" for p in params)
            ),
        )
    )

class ReproCommand(VisitorBasedCodemodCommand):
    DESCRIPTION = (
        "Do cool thing."
    )

    @staticmethod
    def add_args(_arg_parser: argparse.ArgumentParser) -> None:
        return

    @m.visit(func_def_has_request_param_matcher())
    def _mark_func_def_includes_request_param(self, node: cst.FunctionDef) -> None:
        self.context.scratch["cool-scratch"] = node
The error in question $ python -m libcst.tool codemod commands.repro.ReproCommand src Calculating full-repo metadata... Executing codemod... Traceback (most recent call last): File "/Users/jeff.hodges/.asdf/installs/python/3.8.13/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/Users/jeff.hodges/.asdf/installs/python/3.8.13/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/Users/jeff.hodges/src/github.com/color/color/local/virtualenv3/lib/python3.8/site-packages/libcst/tool.py", line 839, in main(os.environ.get("LIBCST_TOOL_COMMAND_NAME", "libcst.tool"), sys.argv[1:]) File "/Users/jeff.hodges/src/github.com/color/color/local/virtualenv3/lib/python3.8/site-packages/libcst/tool.py", line 834, in main return lookup.get(args.action or None, _invalid_command)(proc_name, command_args) File "/Users/jeff.hodges/src/github.com/color/color/local/virtualenv3/lib/python3.8/site-packages/libcst/tool.py", line 572, in _codemod_impl result = parallel_exec_transform_with_prettyprint( File "/Users/jeff.hodges/src/github.com/color/color/local/virtualenv3/lib/python3.8/site-packages/libcst/codemod/_cli.py", line 638, in parallel_exec_transform_with_prettyprint for result in p.imap_unordered( File "/Users/jeff.hodges/.asdf/installs/python/3.8.13/lib/python3.8/multiprocessing/pool.py", line 448, in return (item for chunk in result for item in chunk) File "/Users/jeff.hodges/.asdf/installs/python/3.8.13/lib/python3.8/multiprocessing/pool.py", line 868, in next raise value File "/Users/jeff.hodges/.asdf/installs/python/3.8.13/lib/python3.8/multiprocessing/pool.py", line 537, in _handle_tasks put(task) File "/Users/jeff.hodges/.asdf/installs/python/3.8.13/lib/python3.8/multiprocessing/connection.py", line 206, in send self._send_bytes(_ForkingPickler.dumps(obj)) File "/Users/jeff.hodges/.asdf/installs/python/3.8.13/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps cls(buf, protocol).dump(obj) AttributeError: Can't pickle local object 'func_def_has_request_param_matcher..'
Inlining the matcher creation gets a slightly different error $ python -m libcst.tool codemod commands.repro.ReproCommand src Calculating full-repo metadata... Executing codemod... Traceback (most recent call last): File "/Users/jeff.hodges/.asdf/installs/python/3.8.13/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/Users/jeff.hodges/.asdf/installs/python/3.8.13/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/Users/jeff.hodges/src/github.com/color/color/local/virtualenv3/lib/python3.8/site-packages/libcst/tool.py", line 839, in main(os.environ.get("LIBCST_TOOL_COMMAND_NAME", "libcst.tool"), sys.argv[1:]) File "/Users/jeff.hodges/src/github.com/color/color/local/virtualenv3/lib/python3.8/site-packages/libcst/tool.py", line 834, in main return lookup.get(args.action or None, _invalid_command)(proc_name, command_args) File "/Users/jeff.hodges/src/github.com/color/color/local/virtualenv3/lib/python3.8/site-packages/libcst/tool.py", line 572, in _codemod_impl result = parallel_exec_transform_with_prettyprint( File "/Users/jeff.hodges/src/github.com/color/color/local/virtualenv3/lib/python3.8/site-packages/libcst/codemod/_cli.py", line 638, in parallel_exec_transform_with_prettyprint for result in p.imap_unordered( File "/Users/jeff.hodges/.asdf/installs/python/3.8.13/lib/python3.8/multiprocessing/pool.py", line 448, in return (item for chunk in result for item in chunk) File "/Users/jeff.hodges/.asdf/installs/python/3.8.13/lib/python3.8/multiprocessing/pool.py", line 868, in next raise value File "/Users/jeff.hodges/.asdf/installs/python/3.8.13/lib/python3.8/multiprocessing/pool.py", line 537, in _handle_tasks put(task) File "/Users/jeff.hodges/.asdf/installs/python/3.8.13/lib/python3.8/multiprocessing/connection.py", line 206, in send self._send_bytes(_ForkingPickler.dumps(obj)) File "/Users/jeff.hodges/.asdf/installs/python/3.8.13/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps cls(buf, protocol).dump(obj) _pickle.PicklingError: Can't pickle at 0x10205b1f0>: attribute lookup ReproCommand. on projects.codemods.pycodemods.commands.repro failed

(Tangentially, if there's a better way to do this "match if any Node in this Sequence matches this pattern", I'm down to be told what that is)

zsol commented 2 years ago

Hmm I suppose the entire visitor/codemod needs to be pickleable when running it against multiple files, but the matchers accepting callbacks definitely encourage lambdas, so I'd support a PR to add a callout in their docs. The two problematic matchers are MatchIfTrue and MatchMetadataIfTrue.

(Tangentially, if there's a better way to do this "match if any Node in this Sequence matches this pattern", I'm down to be told what that is)

I think your solution is the cleanest way to express this, but occasionally I find myself writing

m.Parameters(
  params=[m.ZeroOrMore(), m.Param(name=m.Name(value="request")), m.ZeroOrMore()],
)
jmhodges-color commented 2 years ago

Honestly, I much prefer your ZeroOrMore version. The lambdas get more ugly when you want to match more than a simple string, involving calls out to matchers.matches and such.

mx781 commented 10 months ago

The pre+post ZeroOrMore() for matching sequences in that fashion is a real nugget - would be worth highlight this recipe in the docs unless it's already there somewhere and I missed it!

kiri11 commented 3 months ago

MatchIfTrue will work if you create a regular named function and use that instead of a lambda. Something like this:

def request_in_params(params):
    return any(p.name.value == "request" for p in params)

...

m.MatchIfTrue(request_in_params)

Also, you could pass jobs=1 so Pickle isn't used for muliprocessing (but it will be slower).