berkerpeksag / astor

Python AST read/write
https://pypi.org/project/astor/
BSD 3-Clause "New" or "Revised" License
810 stars 102 forks source link

Added support for a Comment node #51

Closed inklesspen closed 7 years ago

inklesspen commented 8 years ago

Resolves #50

This new node makes it possible to generate comments consisting of text, as well as comments consisting of AST nodes (which render to commented-out Python source).

So far I have not tested this with multi-line nodes such as function definitions. I'm also not sure it will correctly interact with the add_line_information flag. I intend to add more tests in this PR for these things.

The capture_result method I've introduced might work better by creating a new SourceGenerator instance and writing into that, then returning the result of it. Please let me know your thoughts on that.

pmaupin commented 8 years ago

Looks like a great start!

I'm not sure that a new SourceGenerator instance would buy you anything when you get to multi-line comments, though. In fact, I'd probably consider ditching the context manager and just recording the current length of the result list at the start of the comment method, and then post-processing the sublist that starts after that (deleting it off the end of the result list if that makes it easier) to fix the indentation and add the comment character to all the lines in the sublist.

Unfortunately, the pretty printer in astor.source_repr will wreak havoc with your nicely commented lines if they are too long. For example, I suspect that if you make the expression in your current test case long enough, it will fail miserably.

You could consider modifying the pretty printer, but that's really ugly code, and I dare the author to correct me :-). What I would do instead is this: when you are post-processing the comment lines to fix the indentation and add comment characters, before you do any of that you can run the result sublist representing the commented code or text through astor.source_repr.split_lines with a modified maxline parameter. Then you can strip the indentation and add the comment character to the start of every line. The code would look something like this:

 # add split_lines and flatten to the import at the top
from .source_repr import pretty_source, split_lines, flatten

def visit_Comment(self, node):
    prefix = '# '
    end_result = len(self.result)
    # Allowable line length for wrapping.  Allow extra for the indentation we 
    # will strip, but reduce based on the prefix we will add.
    max_line = 79 + len(self.indent_with) * self.indentation - len(prefix)
    --- add your text or visit your nodes here ---
    comment = source_repr.split_lines(result[end_result:], max_line)
    --- loop through comment, stripping indentation and adding prefix ---
    result[end_result:] = flatten(comment)

If that is clear as mud, let me know and I'll try to unpack it a bit when it's not bedtime.

Also, the comment class looks lonely. Should we just add it into the node_utils.py file?

Thanks, Pat

pmaupin commented 8 years ago

Two more things I just thought of:

1) there may be edge cases where the pretty printer cannot properly shorten a line, but then "thinks" it can do so after you insert the '# ' in front of it. It may be possible to inform the pretty printer that you are done with the lines by joining up the lines. The code to do that would replace this line that I wrote above:

result[end_result:] = flatten(comment)

with something like this:

result[end_result:] = re.split(r'(\n)', ''.join(flatten(comment)))

2) At the beginning of the function, before recording end_result, you'll want to flush any preceding newlines into the result buffer. The cheesiest (and therefore recommended as of now :-) possible way to do this is used in visit_Str:

    # Cheesy way to force a flush
    self.write('foo')
    self.result.pop()
inklesspen commented 8 years ago

I ran into a weird problem with my new sample; it seems Python 3.5 parses is not None into something using the NameConstant node, and writing the value directly into the result was causing problems with the output, because you can't call .startswith() on None. (to_source didn't run into this problem because it stringifies each thing before passing it to pretty_source, so I just started doing that too.)

pmaupin commented 8 years ago

Yeah, pretty_source was designed to accept strings, because otherwise, how does it know how long the stuff it has to pretty print is?

Unfortunately, I don't have too much time to look at this ATM; to help me allocate my time, can you speculate how finished you think this is at this point?

Thanks, Pat

berkerpeksag commented 8 years ago

Hi @inklesspen, this looks pretty good so far. Do you have time to work on this? I'm planning to do a release at Python US sprints next month.

Thanks! :)

inklesspen commented 8 years ago

Sorry, I'm no longer working for the company that was wanting this, and I don't have any time to do it on my own time. Feel free to adopt this if you find an interested person.

pmaupin commented 7 years ago

Closing this until it has a champion.