Closed hdwalters closed 4 days ago
We have the support for default value for function parameters in the alpha in development.
I thin that if we proceed with this we need a way to support all the sed
available around.
if we proceed with this we need a way to support all the
sed
available around.
I realise there may be sed
implementations which do not support the -E
option, but can you name any which use a different option for the same functionality? AIUI -E
is pretty universal where extended regexes are supported.
I would respectfully suggest that if anyone does have a non standard sed
, they can just pass false
in the new parameter, and use the standard regex syntax.
We have the support for default value for function parameters in the alpha in development.
Great! Can you point me at an example please?
So this is not a breaking change after all:
pub fun replace_regex(source: Text, pattern: Text, replacement: Text, extended: Bool = false): Text {
unsafe {
if extended:
return $echo "{source}" | sed -E -e "s/{pattern}/{replacement}/g"$
else:
return $echo "{source}" | sed -e "s/{pattern}/{replacement}/g"$
}
}
iirc, sed's syntax differs a bit with gnu and bsd versions. im pretty sure alpine has its own stripped down sed too. the bsd version is used on all macOS systems, and gnu is on most linux ones
yup, i was right about alpine:
➜ ~ docker run -it alpine:3.20 ash
/ # sed --help
BusyBox v1.36.1 (2024-06-10 07:11:47 UTC) multi-call binary.
Usage: sed [-i[SFX]] [-nrE] [-f FILE]... [-e CMD]... [FILE]...
or: sed [-i[SFX]] [-nrE] CMD [FILE]...
-e CMD Add CMD to sed commands to be executed
-f FILE Add FILE contents to sed commands to be executed
-i[SFX] Edit files in-place (otherwise write to stdout)
Optionally back files up, appending SFX
-n Suppress automatic printing of pattern space
-r,-E Use extended regex syntax
If no -e or -f, the first non-option argument is the sed command string.
Remaining arguments are input files (stdin if none).
/ #
i think we could check sed
version like this:
version=$(sed --version 2>&1)
if echo $version | grep 'Copyright (C) ' | grep ' Free Software Foundation, Inc.' 2>&1 > /dev/null; then
echo is gnu
fi
if echo $version | grep 'This is not GNU sed version ' 2>&1 > /dev/null; then
echo is alpine (busybox)
fi
if echo $version | grep 'sed: illegal option -- -' 2>&1 > /dev/null; then
echo is bsd
fi
My proposal is to default to -E
in the absence of more definitive knowledge, so I do not think we need to test explicitly for Alpine sed
, as it will fall under that default. This applies to BSD sed
as well, as the current version supports -E
(see https://man.freebsd.org/cgi/man.cgi?sed).
So I think all we really need to test for is "GNU or not GNU"; we might as well simplify the generated Bash output, especially as Mte90 requested that we test the flavour each time replace_regex()
is called.
I am happy to search for \bCopyright\b.+\bFree Software Foundation\b
, but I suspect that \bGNU sed\b
will be slightly faster, with little risk of false positives.
I do also wonder how standardised the actual extended regex syntax is across flavours, but quite honestly it's going to be a losing game trying to correct for any disparities, and I think we can reasonably leave that to Amber script authors (it might be handy to provide a way of testing the OS type, but that's a topic for another day).
Tested on MacOS:
xxx@xxx-MacBook-Air-2 ~ % echo 'abc123def' | sed -E -e 's/([0-9]+)/[\1]/'
abc[123]def
This applies to BSD
sed
as well, as the current version supports-E
(see https://man.freebsd.org/cgi/man.cgi?sed).
they do not support the same regexes, and their engines differ a little bit
they do not support the same regexes, and their engines differ a little bit
I suspected as much, but I really don't think there's a lot we can do about masking those differences. After all, script authors would have to tailor the regex syntax if they were calling sed
from a Bash script.
I note that standard library function
replace_regex()
transpiles to the followingsed
command:sed
basic regex syntax is horrible, as you have to unexpectedly (IMHO) escape regex tokens like capturing parentheses(...)
, and it doesn't support the "one or more instances" token+
. I therefore usesed -E
, and it would be nice to have that option.I believe the aim of Amber is to be as widely applicable as possible, and I understand that some Bash distributions do not support
sed -E
or the equivalentsed -r
, so we can't necessarily just change it in the standard library. I would therefore like to propose a breaking change:It may even be possible to provide default parameters to Amber functions, but I do not believe this is currently implemented.
Thoughts?