facebookincubator / fastmod

A fast partial replacement for the codemod tool
Apache License 2.0
1.68k stars 42 forks source link

Suggestion: add (new)line #13

Open jaques-sam opened 5 years ago

jaques-sam commented 5 years ago

A missing functionality (if I'm not mistaken) is the ability to add a line. Good use case would be that I want to add a 'feature'. For most of the code, a new line needs to be added similar to an existing feature.

One way to solve this would be to allow adding (\r)\n characters like

fastmod -m '(^[\s]*)(feature4\n)' '${1}${2}${1}feature5\n'

But \n is just printed raw as \+n

swolchok commented 5 years ago

You can insert a literal newline into a string in most shell simply by pressing enter. That should do the trick, right?

On Thu, Jun 6, 2019 at 9:17 AM Sam Jaques notifications@github.com wrote:

A missing functionality (if I'm not mistaken) is the ability to add a line. Good use case would be that I want to add a 'feature'. For most of the code, a new line needs to be added similar to an existing feature.

One way to solve this would be to allow adding (\r)\n characters like fastmod -m '(^[\s]*)(feature4\n)' '${1}${2}${1}feature5\n' But \n is just printed raw as ''+'n'

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/facebookincubator/fastmod/issues/13?email_source=notifications&email_token=AAAIK6ADYDEWPCQ6SPELDLLPZE2CZA5CNFSM4HVEN3I2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4GYCDFMA, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAIK6F5OEPVQVEJQY7MXGLPZE2CZANCNFSM4HVEN3IQ .

jaques-sam commented 5 years ago

Wow, you're right. It works like a charm! Example:

$ fastmod --extensions txt -m '(^[\s]*)(feature4).([h|cpp])\r\n' '${1}${2}.${3}
> ${1}feature5.${3}
> '                                                                                 

Thanks!

exoosh commented 1 year ago

@swolchok would you accept a PR implementing this?

Sure, there are shells allowing this. But not all do. And the reliance on a shell means fastmod isn't a self-contained tool anymore for the stated purpose.

But even if you happen to use a shell that supports this, it makes it incredibly unwieldy ... either you need to stubble together '' and "" strings to dodge the bullet in regard to escaping the $ of the back references, or you use "" strings and escape them, but suddenly can't break your command line anymore to make scripts more legible (because broken \-continued lines cannot be indented anymore when I need to use literal line breaks.

So I thought to myself, heck, if I need to go through hoops, why not capture the line ending in --multiline mode and "reuse" the back reference on that. Alas, this does not work for reasons beyond me.

fastmod \
        '(#define\s+?SomeHeaderGuard_h)([\r\n]+)' "\${1}${2}${2}#include <cstdint>" \
        --accept-all \
        --print-changed-files \
        --glob '*Types.h' \
        --dir directory \
        --multiline

... it matches. It even performs a replacement, but the line breaks are not to be seen anywhere. I think that part may even qualify as a defect.

Besides, when working with regexes I fully expect to be able to use such escaped characters in the replacement string.

swolchok commented 1 year ago

Alas, this does not work for reasons beyond me.

Your escaping in the replacement string is inconsistent -- you've escaped the first $ but not the second or third. Fixing this seems to work for me:

$ echo 'test' > testfile.txt
$ fastmod '(test)([\r\n]+)' '${1}${2}${2}another line${2}' testfile.txt --accept-all --multiline
$ cat testfile.txt
test

another line

there are shells allowing this. But not all do

Can we talk specifics? In what environment are you having trouble? Perhaps I can help. Just in case it helps without needing a round-trip cycle, BSD sed has a similar behavior of not interpreting escape characters in output; here's a Stack Overflow page with some ways to put newlines in: https://stackoverflow.com/questions/46082397/insert-newline-n-using-sed

exoosh commented 1 year ago

@swolchok thanks for the correction! You are right about the missing escapes for $. Something that you even warn explicitly about in the help output. But if fastmod understood a well-defined set of non-printable characters in escaped form within the replacement string, this would not be an issue, right? (I.e. I wouldn't have to be creative by using capturing groups.)

Can we talk specifics? In what environment are you having trouble? Perhaps I can help.

I am using it in different environments and one of them is Windows. Broadly speaking we're talking Linux and Windows. And the "problem" is that not in all cases a shell is even involved. PowerShell could probably even do that with relative ease, given that its escape character is a backtick and not the backslash, but with NT scripts ("batch") I'm out of luck.

Yes, I know, I could probably write a wrapper of sorts in each case, but the issue is that this would effectively mean one wrapper per regex/replacement pair or something generic that understands the fastmod command line switches well enough to perform a replacement on the arguments passed down to fastmod. And with the latter I'm circling back to "why not have it in fastmod?".

Thanks for your time and patience (and also thanks for a very useful tool).

PS: if you are open to a PR implementing this, I'm not yet sure if I'd contribute in my professional capacity (this account) or as individual, but I'd be willing to put in the time and effort to make this happen.

swolchok commented 1 year ago

My big reluctance on this is: 1) it's a breaking change to the tool 2) people already don't quite understand the difference between their shell and their tools' expected inputs, so making the reality extremely simple helps troubleshooting since we can say with confidence that fastmod isn't doing any escaping. I've asked for a second opinion internally though; will reply back after I get some more thoughts.

exoosh commented 1 year ago

Thanks for getting a second opinion. Hope it will be seen as acceptable, even if it meant that a command line switch needs to be used to explicitly enable this behavior going forward, say.

Can understand your point on the inability of users to understand what is done by the shell and what is done by the tool. This frequently comes up whenever tools either refer to using shell magic for stuff they don't do (see above) and when cross-platform tools try to partially mimic that shell behavior in order to provide the illusion of shell globbing on Windows, say (I worked on another Rust-based tool where that was an issue).

swolchok commented 1 year ago

I'm running a bit behind on work due to a power outage yesterday. After consulting with some other folks, the idea I heard that I liked was to detect \n in the replacement string and print some help on inserting newlines. The todos are

assarbad commented 1 year ago

For PowerShell one can use ` as escape character, in a similar fashion to \ in Bash. The $ would also have to be escaped in PowerShell. So `n is equivalent to the common meaning of \n in PowerShell, if using a "" string.

I'm not sure where exactly you'd want that however, @swolchok. Did you mean to place this into the README.md or into the long about string embedded in the help output?

For NT script ("batch") you should technically be able to embed a line break using the following (paraphrasing the example by @jaques-sam):

fastmod --extensions txt -m '(^[\s]*)(feature4).([h|cpp])\r\n' '${1}${2}.${3}^
${1}feature5.${3}

However, things get more difficult when you want to deal with \n vs. \r vs. \r\n ... or with \t, depending on the shell.

With my PR #42 you'd be able to use the same characters you are used to, independently of a shell, provided you enable that handling with -S / --subst-escapes.

exoosh commented 10 months ago

My big reluctance on this is: 1) it's a breaking change to the tool 2) people already don't quite understand the difference between their shell and their tools' expected inputs, so making the reality extremely simple helps troubleshooting since we can say with confidence that fastmod isn't doing any escaping.

@swolchok Regarding your 1), I think that's a moot point, given codemod is no longer developed (since June 2021) and this tool is supposed to be a partial replacement. No?

Regarding 2) it's true that shells differ, but that's exactly my point. It's hard enough as it is to juggle between all sorts of more-or-less-POSIX-y shells (backslash), NT script (circumflex) and Powershell (backtick) regarding escape characters. But the tool does already use its own globbing totally independent of the shell used to run it. So why is it alright to use the Python-internal globbing (codemod) or Rust-internal globbing (fastmod) rather than deferring it to the shell, but then it's not okay to gloss over other portability issues such as those non-printable characters within substitutions? Together with the fact that codemod seems to be no longer developed, this leaves me flabbergasted.

I guess when you're in the comfortable position of using some POSIX-y shell (which I am not 100% of my time), then you don't care if the shell or fastmod itself expands the \n or \r or \t etc. But if you juggle between these, then it makes sense to have one uniform way to express it.

Please advise: does it make more sense to run a friendly fork of fastmod, or are you at all willing to accept outside contributions at all (and what are the constraints in that case, other than signing the CLA). Obviously there are some which are pretty much "between the lines".

swolchok commented 6 months ago

are you at all willing to accept outside contributions at all

yes, we accept outside contributions.

what are the constraints in that case

I believe I've discussed above what the contribution I would be happy to accept would look like. Philosophically, I don't think it should be the responsibility of every command-line program to re-implement features that should be provided by the user's shell.

does already use its own globbing

the tool does recursive directory traversal, which is beyond simple globbing.

swolchok commented 6 months ago

also, #42 looks OK except for the review comments I left, but hasn't been updated since.

assarbad commented 6 months ago

the tool does recursive directory traversal, which is beyond simple globbing.

Wait, I think you misunderstood that part. Yep, I get the point about the traversal, but the --glob and --iglob options already provide a functionally similar mechanism to shell globbing even on Windows for each file of the traversed directories. And to get that you have to actually prevent the (Unix) shell from expanding those globs prior to fastmod seeing them. That's what I was referring to above (and which was absent in codemod, no? ... i.e. --iglob and --glob).

also, #42 looks OK except for the review comments I left, but hasn't been updated since.

@swolchok Sorry, must have totally missed that. My apologies. But I am also still unable to see any review comments 😬. Did you actually post them or is the review still in-progress and therefore only visible to you? Here's what I see on #42:

image

Feel free to skip and ignore the points below. I just wanted to convey that I totally get your point. There are a lot of things that are odd on either platform, in my opinion. But since the tool is otherwise cross-platform, its users would benefit from the ability to tap into that functionality despite their shell.


I believe I've discussed above what the contribution I would be happy to accept would look like. Philosophically, I don't think it should be the responsibility of every command-line program to re-implement features that should be provided by the user's shell.

Got that point, and generally I even agree with your opinion. But it's kind of a moot point for a shell (cmd.exe is the successor of command.com) that has been around much longer than several shells in the Unix realm, including Bash (even cmd.exe predates Bash), and has its own conventions and places considerable emphasis on backwards compatibility. So it's a case of "can't teach an old dog new tricks", I reckon (hope this isn't totally out of place as a metaphor here; I'm not a native speaker of English).

This is one of the bigger differences between how Unix shells do things and cmd.exe and its predecessor command.com as well as PowerShell do it. I have found that even long-term Unix users are oblivious of the fact that it is the shell that does the globbing prior to some command being executed by them, whereas on Windows each individual program does the handling. It even catches up to Unix users at times when they start attempting to pass globs not via system() but execv() or similar, because then without the intermediate sh -c they find themselves at odds about their expectations regarding globs (and with sh -c you'll often ask for trouble security-wise unless whatever gets passed is thoroughly vetted before it's passed to the shell).

MSVC offers a so-called link option (not to be confused with linker command line options) which does the command-line argument wildcard expansion (again according to the ancient rules inherited from DOS with only minor adjustments; i.e. also different from shell globbing) before your main() gets to see those in argv:

setargv.obj Enables command-line argument wildcard expansion. See Expanding wildcard arguments.

What's worse, though, on Linux you can pass usually 2 MiB (ARG_MAX) worth of command line arguments and then xargs or find -exec {} + exist to overcome that limit. On Windows the overall size of a command line is vastly more limited by cmd.exe (docs). So, even if cmd.exe behaved more like a POSIX shell in terms of globbing, you'd soon run into the next limitation with that. Contrary to popular believe, however, the situation with the length of file paths is the inverse (~32k characters on Windows, usually ~4k on Linux ... getconf -a|grep PATH_MAX) and I have many times cursed at my Linux installations, especially when using ecryptfs which causes unduly long path names to be created. But I'll readily admit that Windows makes using these longer paths unnecessarily difficult 😬

So my point -- and mind you, my main systems at home are all Linux-based (only at work Windows is the main system), but I still care somewhat about my Windows tooling -- was that by saying "the shell should" you're about 44 years late if we look at command.com from which this behavior originates or 37 years late if we take cmd.exe and it would be nice to be able to get that into fastmod builds for Windows.

I guess the misunderstanding on my end was that it looked as if you weren't interested in merging the changes and I can't see those review comments you mention -- even at the time of writing.