facebookincubator / Bowler

Safe code refactoring for modern Python.
https://pybowler.io/
MIT License
1.57k stars 92 forks source link

Large refactors cause patch rejects, exit code 0 #19

Closed craigds closed 6 years ago

craigds commented 6 years ago
  1. Running a large refactor can cause patch to produce reject files.
  2. Bowler doesn't throw an exception when this is the case, which makes it difficult to notice.

Apologies for the size of this issue; I've reduced it as much as I can while still triggering the problem.

This file: https://gist.github.com/craigds/4bb86df3b3ead343c22be6c7c9c14a57 I've snipped it at 1750 lines, because that's just enough to reproduce the issue.

With this script:

from fissix.fixer_util import Newline
from fissix.pygram import python_symbols as syms

from bowler import Query, TOKEN
from bowler.types import Leaf, Node

def Assert(test, message=None, **kwargs):
    """
    Build an assertion statement
    """
    if not isinstance(test, list):
        test = [test]
    test[0].prefix = " "

    return Node(syms.assert_stmt, [Leaf(TOKEN.NAME, "assert")] + test, **kwargs)

def gdaltest_fail_reason_to_assert(node, capture, filename):
    """
    Converts an entire if statement into an assertion.

    if x == y:
        gdal.post_reason('foo')
        return 'fail'

    -->
        assert x != y, 'foo'
    """
    condition = capture["condition"]
    reason = capture["reason"]
    reason = reason.clone()

    assertion = Assert([condition.clone()], reason, prefix=node.prefix)

    # Trailing whitespace and any comments after the if statement are captured
    # in the prefix for the dedent node. Copy it to the following node.
    dedent = capture["dedent"]
    next_node = node.next_sibling
    node.replace([assertion, Newline()])
    next_node.prefix = dedent.prefix

(
    Query("half_of_ogr_fgdb.py")
    .select(
        """
        if_stmt<
            "if" condition=any ":"
            suite<
                any any
                simple_stmt<
                    power<
                        "gdaltest" trailer< "." "post_reason" >
                        trailer< "(" reason=any ")" >
                    >
                    any
                >
                simple_stmt<
                    return_stmt< "return" returntype=STRING >
                    any
                >
                dedent=any
            >
        >
    """
    )
    .modify(callback=gdaltest_fail_reason_to_assert)
    .execute(interactive=False, write=True)
)

output

failed to apply patch hunk
Traceback (most recent call last):
  File "/Users/cdestigter/checkout/decrapify/lib/python3.7/site-packages/bowler/tool.py", line 287, in process_hunks
    sh.patch("-u", filename, _in=patch)  # type: ignore
  File "/Users/cdestigter/checkout/decrapify/lib/python3.7/site-packages/sh.py", line 1427, in __call__
    return RunningCommand(cmd, call_args, stdin, stdout, stderr)
  File "/Users/cdestigter/checkout/decrapify/lib/python3.7/site-packages/sh.py", line 774, in __init__
    self.wait()
  File "/Users/cdestigter/checkout/decrapify/lib/python3.7/site-packages/sh.py", line 792, in wait
    self.handle_command_exit_code(exit_code)
  File "/Users/cdestigter/checkout/decrapify/lib/python3.7/site-packages/sh.py", line 815, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1:

  RAN: /usr/bin/patch -u half_of_ogr_fgdb.py

  STDOUT:
patching file half_of_ogr_fgdb.py
Hunk #1 FAILED at 1539.
1 out of 1 hunk FAILED -- saving rejects to file half_of_ogr_fgdb.py.rej

  STDERR:

causes?

Chopping much more out of the source file makes the error not occur. It doesn't seem to matter which parts you chop out.

My hunch is that if the diff gets too large, the patch can no longer match line numbers from old to new, so it rejects the patch.

A workaround might be to pause running fixers when the diff gets bigger than a certain threshold and write the new file out, and then continue running fixers? ie patch in multiple stages.

amyreese commented 6 years ago

Currently bowler applies each hunk separately with the patch command, which is probably triggering your issue as the line numbers get further and further offset with each hunk applied. We should probably make Bowler instead track all of the accepted hunks for a given file, and apply them all at once. This will prevent the line numbers from getting out of sync with the state of the file as each patch gets applied, and also have the side benefit of reducing the number of times we shell out to the patch command and touch files.

For anyone interested in working on this, the relevant code is here: https://github.com/facebookincubator/Bowler/blob/master/bowler/tool.py#L241

It should just be a matter of joining the hunks into a single patch, and then running the patch command at the end.

craigds commented 6 years ago

My current workaround for this issue is to just run my bowler script multiple times if there are .rej files present.

amyreese commented 6 years ago

This should be resolved by Lisa's PR.