Open jaques-sam opened 5 years ago
You can insert a literal newline into a string in most shell simply by pressing enter. That should do the trick, right?
On Thu, Jun 6, 2019 at 9:17 AM Sam Jaques notifications@github.com wrote:
A missing functionality (if I'm not mistaken) is the ability to add a line. Good use case would be that I want to add a 'feature'. For most of the code, a new line needs to be added similar to an existing feature.
One way to solve this would be to allow adding (\r)\n characters like fastmod -m '(^[\s]*)(feature4\n)' '${1}${2}${1}feature5\n' But \n is just printed raw as ''+'n'
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/facebookincubator/fastmod/issues/13?email_source=notifications&email_token=AAAIK6ADYDEWPCQ6SPELDLLPZE2CZA5CNFSM4HVEN3I2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4GYCDFMA, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAIK6F5OEPVQVEJQY7MXGLPZE2CZANCNFSM4HVEN3IQ .
Wow, you're right. It works like a charm! Example:
$ fastmod --extensions txt -m '(^[\s]*)(feature4).([h|cpp])\r\n' '${1}${2}.${3}
> ${1}feature5.${3}
> '
Thanks!
@swolchok would you accept a PR implementing this?
Sure, there are shells allowing this. But not all do. And the reliance on a shell means fastmod
isn't a self-contained tool anymore for the stated purpose.
But even if you happen to use a shell that supports this, it makes it incredibly unwieldy ... either you need to stubble together ''
and ""
strings to dodge the bullet in regard to escaping the $
of the back references, or you use ""
strings and escape them, but suddenly can't break your command line anymore to make scripts more legible (because broken \
-continued lines cannot be indented anymore when I need to use literal line breaks.
So I thought to myself, heck, if I need to go through hoops, why not capture the line ending in --multiline
mode and "reuse" the back reference on that. Alas, this does not work for reasons beyond me.
fastmod \
'(#define\s+?SomeHeaderGuard_h)([\r\n]+)' "\${1}${2}${2}#include <cstdint>" \
--accept-all \
--print-changed-files \
--glob '*Types.h' \
--dir directory \
--multiline
... it matches. It even performs a replacement, but the line breaks are not to be seen anywhere. I think that part may even qualify as a defect.
Besides, when working with regexes I fully expect to be able to use such escaped characters in the replacement string.
Alas, this does not work for reasons beyond me.
Your escaping in the replacement string is inconsistent -- you've escaped the first $
but not the second or third. Fixing this seems to work for me:
$ echo 'test' > testfile.txt
$ fastmod '(test)([\r\n]+)' '${1}${2}${2}another line${2}' testfile.txt --accept-all --multiline
$ cat testfile.txt
test
another line
there are shells allowing this. But not all do
Can we talk specifics? In what environment are you having trouble? Perhaps I can help. Just in case it helps without needing a round-trip cycle, BSD sed has a similar behavior of not interpreting escape characters in output; here's a Stack Overflow page with some ways to put newlines in: https://stackoverflow.com/questions/46082397/insert-newline-n-using-sed
@swolchok thanks for the correction! You are right about the missing escapes for $
. Something that you even warn explicitly about in the help output. But if fastmod
understood a well-defined set of non-printable characters in escaped form within the replacement string, this would not be an issue, right? (I.e. I wouldn't have to be creative by using capturing groups.)
Can we talk specifics? In what environment are you having trouble? Perhaps I can help.
I am using it in different environments and one of them is Windows. Broadly speaking we're talking Linux and Windows. And the "problem" is that not in all cases a shell is even involved. PowerShell could probably even do that with relative ease, given that its escape character is a backtick and not the backslash, but with NT scripts ("batch") I'm out of luck.
Yes, I know, I could probably write a wrapper of sorts in each case, but the issue is that this would effectively mean one wrapper per regex/replacement pair or something generic that understands the fastmod
command line switches well enough to perform a replacement on the arguments passed down to fastmod
. And with the latter I'm circling back to "why not have it in fastmod?".
Thanks for your time and patience (and also thanks for a very useful tool).
PS: if you are open to a PR implementing this, I'm not yet sure if I'd contribute in my professional capacity (this account) or as individual, but I'd be willing to put in the time and effort to make this happen.
My big reluctance on this is: 1) it's a breaking change to the tool 2) people already don't quite understand the difference between their shell and their tools' expected inputs, so making the reality extremely simple helps troubleshooting since we can say with confidence that fastmod isn't doing any escaping. I've asked for a second opinion internally though; will reply back after I get some more thoughts.
Thanks for getting a second opinion. Hope it will be seen as acceptable, even if it meant that a command line switch needs to be used to explicitly enable this behavior going forward, say.
Can understand your point on the inability of users to understand what is done by the shell and what is done by the tool. This frequently comes up whenever tools either refer to using shell magic for stuff they don't do (see above) and when cross-platform tools try to partially mimic that shell behavior in order to provide the illusion of shell globbing on Windows, say (I worked on another Rust-based tool where that was an issue).
I'm running a bit behind on work due to a power outage yesterday. After consulting with some other folks, the idea I heard that I liked was to detect \n in the replacement string and print some help on inserting newlines. The todos are
For PowerShell one can use `
as escape character, in a similar fashion to \
in Bash. The $
would also have to be escaped in PowerShell. So `n
is equivalent to the common meaning of \n
in PowerShell, if using a ""
string.
I'm not sure where exactly you'd want that however, @swolchok. Did you mean to place this into the README.md
or into the long about string embedded in the help output?
For NT script ("batch") you should technically be able to embed a line break using the following (paraphrasing the example by @jaques-sam):
fastmod --extensions txt -m '(^[\s]*)(feature4).([h|cpp])\r\n' '${1}${2}.${3}^
${1}feature5.${3}
However, things get more difficult when you want to deal with \n
vs. \r
vs. \r\n
... or with \t
, depending on the shell.
With my PR #42 you'd be able to use the same characters you are used to, independently of a shell, provided you enable that handling with -S
/ --subst-escapes
.
My big reluctance on this is: 1) it's a breaking change to the tool 2) people already don't quite understand the difference between their shell and their tools' expected inputs, so making the reality extremely simple helps troubleshooting since we can say with confidence that fastmod isn't doing any escaping.
@swolchok Regarding your 1), I think that's a moot point, given codemod is no longer developed (since June 2021) and this tool is supposed to be a partial replacement. No?
Regarding 2) it's true that shells differ, but that's exactly my point. It's hard enough as it is to juggle between all sorts of more-or-less-POSIX-y shells (backslash), NT script (circumflex) and Powershell (backtick) regarding escape characters. But the tool does already use its own globbing totally independent of the shell used to run it. So why is it alright to use the Python-internal globbing (codemod) or Rust-internal globbing (fastmod) rather than deferring it to the shell, but then it's not okay to gloss over other portability issues such as those non-printable characters within substitutions? Together with the fact that codemod seems to be no longer developed, this leaves me flabbergasted.
I guess when you're in the comfortable position of using some POSIX-y shell (which I am not 100% of my time), then you don't care if the shell or fastmod itself expands the \n
or \r
or \t
etc. But if you juggle between these, then it makes sense to have one uniform way to express it.
Please advise: does it make more sense to run a friendly fork of fastmod, or are you at all willing to accept outside contributions at all (and what are the constraints in that case, other than signing the CLA). Obviously there are some which are pretty much "between the lines".
are you at all willing to accept outside contributions at all
yes, we accept outside contributions.
what are the constraints in that case
I believe I've discussed above what the contribution I would be happy to accept would look like. Philosophically, I don't think it should be the responsibility of every command-line program to re-implement features that should be provided by the user's shell.
does already use its own globbing
the tool does recursive directory traversal, which is beyond simple globbing.
also, #42 looks OK except for the review comments I left, but hasn't been updated since.
the tool does recursive directory traversal, which is beyond simple globbing.
Wait, I think you misunderstood that part. Yep, I get the point about the traversal, but the --glob
and --iglob
options already provide a functionally similar mechanism to shell globbing even on Windows for each file of the traversed directories. And to get that you have to actually prevent the (Unix) shell from expanding those globs prior to fastmod
seeing them. That's what I was referring to above (and which was absent in codemod
, no? ... i.e. --iglob
and --glob
).
also, #42 looks OK except for the review comments I left, but hasn't been updated since.
@swolchok Sorry, must have totally missed that. My apologies. But I am also still unable to see any review comments 😬. Did you actually post them or is the review still in-progress and therefore only visible to you? Here's what I see on #42:
Feel free to skip and ignore the points below. I just wanted to convey that I totally get your point. There are a lot of things that are odd on either platform, in my opinion. But since the tool is otherwise cross-platform, its users would benefit from the ability to tap into that functionality despite their shell.
I believe I've discussed above what the contribution I would be happy to accept would look like. Philosophically, I don't think it should be the responsibility of every command-line program to re-implement features that should be provided by the user's shell.
Got that point, and generally I even agree with your opinion. But it's kind of a moot point for a shell (cmd.exe
is the successor of command.com
) that has been around much longer than several shells in the Unix realm, including Bash (even cmd.exe
predates Bash), and has its own conventions and places considerable emphasis on backwards compatibility. So it's a case of "can't teach an old dog new tricks", I reckon (hope this isn't totally out of place as a metaphor here; I'm not a native speaker of English).
This is one of the bigger differences between how Unix shells do things and cmd.exe
and its predecessor command.com
as well as PowerShell do it. I have found that even long-term Unix users are oblivious of the fact that it is the shell that does the globbing prior to some command being executed by them, whereas on Windows each individual program does the handling. It even catches up to Unix users at times when they start attempting to pass globs not via system()
but execv()
or similar, because then without the intermediate sh -c
they find themselves at odds about their expectations regarding globs (and with sh -c
you'll often ask for trouble security-wise unless whatever gets passed is thoroughly vetted before it's passed to the shell).
MSVC offers a so-called link option (not to be confused with linker command line options) which does the command-line argument wildcard expansion (again according to the ancient rules inherited from DOS with only minor adjustments; i.e. also different from shell globbing) before your main()
gets to see those in argv
:
setargv.obj
Enables command-line argument wildcard expansion. See Expanding wildcard arguments.
What's worse, though, on Linux you can pass usually 2 MiB (ARG_MAX
) worth of command line arguments and then xargs
or find -exec {} +
exist to overcome that limit. On Windows the overall size of a command line is vastly more limited by cmd.exe
(docs). So, even if cmd.exe
behaved more like a POSIX shell in terms of globbing, you'd soon run into the next limitation with that. Contrary to popular believe, however, the situation with the length of file paths is the inverse (~32k characters on Windows, usually ~4k on Linux ... getconf -a|grep PATH_MAX
) and I have many times cursed at my Linux installations, especially when using ecryptfs
which causes unduly long path names to be created. But I'll readily admit that Windows makes using these longer paths unnecessarily difficult 😬
So my point -- and mind you, my main systems at home are all Linux-based (only at work Windows is the main system), but I still care somewhat about my Windows tooling -- was that by saying "the shell should" you're about 44 years late if we look at command.com
from which this behavior originates or 37 years late if we take cmd.exe
and it would be nice to be able to get that into fastmod
builds for Windows.
I guess the misunderstanding on my end was that it looked as if you weren't interested in merging the changes and I can't see those review comments you mention -- even at the time of writing.
A missing functionality (if I'm not mistaken) is the ability to add a line. Good use case would be that I want to add a 'feature'. For most of the code, a new line needs to be added similar to an existing feature.
One way to solve this would be to allow adding (\r)\n characters like
fastmod -m '(^[\s]*)(feature4\n)' '${1}${2}${1}feature5\n'
But
\n
is just printed raw as\
+n