acmesh-official / acme.sh

A pure Unix shell script implementing ACME client protocol
https://acme.sh
GNU General Public License v3.0

What's wrong with this sed regex? #3373

Closed Eagle3386 closed 2 years ago

Eagle3386 commented 3 years ago

For my upcoming 3rd party DNS API plugin, the DNS provider requires re-submission of the full TXT records, so I need to use sed to remove the matching snippet after successful validation. However, my attempt:

txtRecords="$(printf -- "%s" "$txtRecords" | sed 's/\(\\n\)\{0,1\}\('"$domain \\\\\"$token\\\\\""'\)//')"

fails, no matter how many backslashes I add or remove or what regex variations I try: sed just won't remove the snippet. 😞 Can somebody please tell me what's wrong with the regex?

sahsanu commented 3 years ago

When asking for help with regular expressions, you should always provide the original text, that is, the values of $txtRecords, $domain and $token, and also the expected result.

Also, if you use single quotes around the sed script, the $domain and $token variables won't be expanded; you should use double quotes around the sed script.
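
Whether the variables are expanded depends on which quoting segment they sit in. In the command as posted, the single-quoted part is closed right before "$domain ..." and reopened after it, so that segment is double-quoted. A quick way to check what sed really receives is to assemble the script into a variable and print it. This is only a debugging sketch; the `script` variable and the sample values are illustrative:

```shell
# Sample values, purely illustrative.
domain='_acme-challenge'
token='xENZ9tsFhyITx349QLloidqg8yV1o7Nib6lwV6RuGzQ'

# Same quoting as in the posted command: '...' "..." '...'
script='s/\(\\n\)\{0,1\}\('"$domain \\\\\"$token\\\\\""'\)//'

# Print the script exactly as sed would receive it.
printf '%s\n' "$script"
```

In the printed script both variables appear expanded, and each `\\\\\"` has collapsed to `\\"`, which in a basic regular expression matches a literal backslash followed by a double quote.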

Neilpang commented 3 years ago

On most platforms, sed normally works line by line. That means sed cannot process \n; it cannot process multiple lines.

So, try to avoid \n in the regex.

Thanks

Eagle3386 commented 3 years ago

@sahsanu You're absolutely right - I apologize for that. See below for full details:

# NOTE: $txtRecords initially holds >= 0 _literal_ \n which are actual text, _not_ newline & any other \ is a literal one, too
# $txtRecords: _dmarc \"v=DMARC1;p=quarantine;pct=100;rua=mailto:account@example.org;ruf=mailto:account@example.org;adkim=r;aspf=r\"\n_token._dnswl \"numberslike123andlowercaseletters\"\nselector._domainkey \"v=DKIM1; k=rsa; p=somethingLikeAPublicKey/mightEvenContainSlashes/severalTimes\"\n@ \"v=spf1 a mx -all\"
# $domain:     _acme-challenge or _acme-challenge.sub.domains.if.any - but always _without_ example.org
# $token:      Let's Encrypt's actual validation challenge value, e. g.: xENZ9tsFhyITx349QLloidqg8yV1o7Nib6lwV6RuGzQ
txtRecords="$(printf -- "%s" "$txtRecords" | sed 's/\(\\n\)\{0,1\}\('"$domain \\\\\"$token\\\\\""'\)//')"

I included the optional \n at the start of the pattern because I always append the Let's Encrypt validation record; without it, removing the record would leave the TXT records cluttered with empty lines.

@Neilpang Yes, but as stated above, it's a literal \n, not a newline, & I must include it due to aforementioned reason.
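
The distinction matters for sed: a two-character `\n` sequence inside the text does not split the input into several lines, so it can be matched like any other characters. A minimal sketch with made-up sample data (the `records` value and the `tok123` token are placeholders, not real provider output):

```shell
# Single quotes keep every backslash literal; there is no real newline in here.
records='@ \"v=spf1 a mx -all\"\n_acme-challenge \"tok123\"'

# sed sees exactly one line of input.
lines="$(printf '%s\n' "$records" | wc -l | tr -d ' ')"
printf 'lines: %s\n' "$lines"

# In a basic regex, \\n matches a backslash followed by the letter n,
# and \\" matches a backslash followed by a double quote.
cleaned="$(printf '%s\n' "$records" | sed 's/\\n_acme-challenge \\"tok123\\"//')"
printf '%s\n' "$cleaned"
```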

Eagle3386 commented 3 years ago

@sahsanu Would you mind taking a look at the additional information I provided and helping me with the regex, please? It's the only piece preventing me from continuing, i. e. it's the last step prior to convincing Neil to accept my DNS API plugin via a PR.

sahsanu commented 3 years ago

Hello @Eagle3386 ,

Could you please paste a real and complete example, including the record for _acme-challenge? If there are several possibilities, please post them too: in the original text, the records only contain double quotes in the "token" part, but your regex seems to expect a double quote in the domain as well... really confusing.

Well, here is an example with my free interpretation of what the variables could contain.

txtRecords="_dmarc \"v=DMARC1;p=quarantine;pct=100;rua=mailto:account@example.org;ruf=mailto:account@example.org;adkim=r;aspf=r\"\n_acme-challenge.sub.domains \"xENZ9tsFhyITx349QLloidqg8yV1o7Nib6lwV6RuGzQ\"\n_token._dnswl \"numberslike123andlowercaseletters\"\nselector._domainkey \"v=DKIM1; k=rsa; p=somethingLikeAPublicKey/mightEvenContainSlashes/severalTimes\"\n@ \"v=spf1 a mx -all\"\n_acme-challenge \"xENZ9tsFhyITx349QLloidqg8yV1o7Nib6lwV6RuGzQ\""
domain="_acme-challenge.sub.domains"
token="xENZ9tsFhyITx349QLloidqg8yV1o7Nib6lwV6RuGzQ"

And here the sed part:

printf -- "%s" "$txtRecords" | sed -r "s/(\\\n){0,1}$domain\ \x22$token\x22//"

If you want to use (\\\n){0,1} you must use the -r parameter (at least with GNU sed), but on other systems the parameter should be -E, or... so maybe you should run sed twice to avoid the use of extended regular expressions.

printf -- "%s" "$txtRecords" | sed "s/\\\n$domain\ \x22$token\x22//"
printf -- "%s" "$txtRecords" | sed "s/$domain\ \x22$token\x22//"

But the second sed, I understand, is for the case where the record is the first one, so in that case there should be a \n at the end; or, if there is only one record, there may not be any \n at all. So maybe you should use this instead (in this order):

printf -- "%s" "$txtRecords" | sed "s/\\\n$domain\ \x22$token\x22//"
printf -- "%s" "$txtRecords" | sed "s/$domain\ \x22$token\x22\\\n//"
printf -- "%s" "$txtRecords" | sed "s/$domain\ \x22$token\x22//"

As you can see, it is really complicated to show the best approach to the problem without knowing all the possible values that txtRecords can hold.

At least I hope this could help you to continue with your sed challenge ;)

Cheers, sahsanu
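
The three cases above (record preceded by `\n`, record first and followed by `\n`, record being the only one) can also be handled in a single sed call with several `-e` expressions, using only basic regular expressions and no `\x22`. The sketch below is an illustration under assumptions: `remove_txt_record` is a hypothetical helper name, names and tokens are assumed to contain nothing special for a regex besides dots, and records are assumed to be separated by a literal `\n` (not `\r\n`).

```shell
# Hypothetical helper: remove '<name> \"<token>\"' from a list of records
# that are separated by a literal backslash-n sequence.
# $1 = records, $2 = record name, $3 = token
remove_txt_record() {
  # Escape dots so that "sub.domains" does not also match "subXdomains".
  _rtr_name="$(printf '%s' "$2" | sed 's/\./\\./g')"
  # In a basic regex, \\" matches a literal backslash followed by a quote.
  _rtr_entry="$_rtr_name "'\\"'"$3"'\\"'
  # Case 1: preceded by \n; case 2: first record; case 3: only record.
  printf '%s' "$1" | sed \
    -e 's/\\n'"$_rtr_entry"'//' \
    -e 's/^'"$_rtr_entry"'\\n//' \
    -e 's/^'"$_rtr_entry"'$//'
}

records='_dmarc \"v=DMARC1\"\n_acme-challenge.sub \"tok123\"\n@ \"v=spf1 a mx -all\"'
remove_txt_record "$records" '_acme-challenge.sub' 'tok123'
```

Only the first matching record is removed, since none of the substitutions use the `g` flag.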

Eagle3386 commented 3 years ago

@sahsanu Sorry for the late reply - RL didn't let me catch up sooner.

Since I'm still struggling to make sed meet Neil's & the DNS API dev guide's requirements for UNIX-compatible statements (e. g. avoiding GNU extensions), I'm hoping you're still willing to help me out. If so, please find below my real-world example & what I've tried thus far.

The JSON I'm getting to work with (replaced domain with example.org, IP with 1.2.3.4 & hashes with ABC / ABC123 for brevity, everything else remained as is):

{"data":{"TTL":180,"TLSA":"_25._tcp.mail.example.org. 2 1 1 ABC123\n_25._tcp.mail.example.org. 2 1 1 ABC123\n_25._tcp.mail.example.org. 2 1 1 ABC123\n_25._tcp.mail.example.org. 2 1 1 ABC123\n_25._tcp.mail.example.org. 2 1 1 ABC123\n_465._tcp.mail.example.org. 2 1 1 ABC123\n_465._tcp.mail.example.org. 2 1 1 ABC123\n_465._tcp.mail.example.org. 2 1 1 ABC123\n_465._tcp.mail.example.org. 2 1 1 ABC123\n_465._tcp.mail.example.org. 2 1 1 ABC123\n_587._tcp.mail.example.org. 2 1 1 ABC123\n_587._tcp.mail.example.org. 2 1 1 ABC123\n_587._tcp.mail.example.org. 2 1 1 ABC123\n_587._tcp.mail.example.org. 2 1 1 ABC123\n_587._tcp.mail.example.org. 2 1 1 ABC123","AAAA":"","TXT":"_dmarc \"v=DMARC1;p=quarantine;pct=100;rua=mailto:postmaster@example.org;ruf=mailto:postmaster@example.org;adkim=r;aspf=r\"\r\n_token._dnswl \"ABC123\"\r\naorg._domainkey \"v=DKIM1; k=rsa; p=ABC123/ABC/ABC123/ABC123/ABC123/ABC123\"\r\n@ \"v=spf1 a mx -all\"","A":"blog 1.2.3.4\nmail 1.2.3.4\nwww 1.2.3.4\n@ 1.2.3.4","nameserver":"auth1.example.com.\nauth2.example.com.","SRV":"_imap._tcp 0 0 0 .\r\n_imaps._tcp 0 1 993 mail.example.org.\r\n_pop3._tcp 0 0 0 .\r\n_pop3s._tcp 0 0 0 .","CName":"","MX":"10 mail.example.org.","comment":"TLSA RR -> as http://dnssec-stats.ant.isi.edu/~viktor/x3hosts.html states, LE X3-Root-CA dies soon (see https://letsencrypt.org/2020/09/17/new-root-and-intermediates.html)\r\nHence complete DNS API plugin, switch cert to P-521 & adjust TLSA RR!\r\nTest tool: https://stats.dnssec-tools.org/explore/?example.org && https://dane.sys4.de/smtp/example.org","CAA":"@ 128 iodef \"mailto:me@example.com\"\n@ 128 issue \"letsencrypt.org\""},"status":"OK"}

This is my sed command thus far:

sed -E 's/^(.*TXT":"[^"][^,]*)(.*)$/\1InsertAcmeChallengeValueHere\2/'

Since I can't do a negative lookahead, I'm trying to teach sed that it should catch/grab anything after TXT":", but only up until the ",-combo. But adding [^"] doesn't work as I get the exact same result without it:

Contents of \1:

{"data":{"TTL":180,"TLSA":"_25._tcp.mail.example.org. 2 1 1 ABC123\n_25._tcp.mail.example.org. 2 1 1 ABC123\n_25._tcp.mail.example.org. 2 1 1 ABC123\n_25._tcp.mail.example.org. 2 1 1 ABC123\n_25._tcp.mail.example.org. 2 1 1 ABC123\n_465._tcp.mail.example.org. 2 1 1 ABC123\n_465._tcp.mail.example.org. 2 1 1 ABC123\n_465._tcp.mail.example.org. 2 1 1 ABC123\n_465._tcp.mail.example.org. 2 1 1 ABC123\n_465._tcp.mail.example.org. 2 1 1 ABC123\n_587._tcp.mail.example.org. 2 1 1 ABC123\n_587._tcp.mail.example.org. 2 1 1 ABC123\n_587._tcp.mail.example.org. 2 1 1 ABC123\n_587._tcp.mail.example.org. 2 1 1 ABC123\n_587._tcp.mail.example.org. 2 1 1 ABC123","AAAA":"","TXT":"_dmarc \"v=DMARC1;p=quarantine;pct=100;rua=mailto:postmaster@example.org;ruf=mailto:postmaster@example.org;adkim=r;aspf=r\"\r\n_token._dnswl \"ABC123\"\r\naorg._domainkey \"v=DKIM1; k=rsa; p=ABC123/ABC/ABC123/ABC123/ABC123/ABC123\"\r\n@ \"v=spf1 a mx -all\""

(Notice the double " at the end - my insertion should happen right between them, with line break appended to the first.)

Contents of \2:

,"A":"blog 1.2.3.4\nmail 1.2.3.4\nwww 1.2.3.4\n@ 1.2.3.4","nameserver":"auth1.example.com.\nauth2.example.com.","SRV":"_imap._tcp 0 0 0 .\r\n_imaps._tcp 0 1 993 mail.example.org.\r\n_pop3._tcp 0 0 0 .\r\n_pop3s._tcp 0 0 0 .","CName":"","MX":"10 mail.example.org.","comment":"TLSA RR -> as http://dnssec-stats.ant.isi.edu/~viktor/x3hosts.html states, LE X3-Root-CA dies soon (see https://letsencrypt.org/2020/09/17/new-root-and-intermediates.html)\r\nHence complete DNS API plugin, switch cert to P-521 & adjust TLSA RR!\r\nTest tool: https://stats.dnssec-tools.org/explore/?example.org && https://dane.sys4.de/smtp/example.org","CAA":"@ 128 iodef \"mailto:me@example.com\"\n@ 128 issue \"letsencrypt.org\""},"status":"OK"}

(Notice the , at the start - it should be the last " from \1 instead.)

Additional notes:

  1. As you've already pointed out, there might be no TXT yet, so the regex has to deal with that, too.
  2. On the one hand, I'd like to stay with a one-liner, but on the other hand I wouldn't mind inserting ### & then replacing that together with its preceding character in order to catch all possible scenarios at once.
  3. Lastly, I should point out that the order of all JSON properties is completely random, i. e. while the example above has data first, status last & TXT before A & comment last, the next API query might return status first, data last & A before comment, TXT last.
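
One reason `[^"][^,]*` behaves this way: `[^"]` consumes just one character, and `[^,]*` then runs up to the first comma, which in this response sits right after the closing quote of the TXT value. Because every quote inside the value is escaped as `\"`, the closing quote is the first `"` that is not part of a backslash escape. That can be expressed in a basic regular expression without lookahead or alternation. The following is a sketch under assumptions: the sample JSON is shortened, `txt` is an illustrative variable name, and the response is assumed to arrive as a single line with exactly one `"TXT":"` key.

```shell
json='{"data":{"TTL":180,"TXT":"_dmarc \"v=DMARC1;p=quarantine\"\r\n@ \"v=spf1 a mx -all\"","A":"www 1.2.3.4\n@ 1.2.3.4"},"status":"OK"}'

# [^"\\]*            run of characters that are neither quote nor backslash
# \(\\.[^"\\]*\)*    any number of: one escape (\" \r \n ...) plus another run
# "                  the first unescaped quote closes the value
# -n ... p           print only if the key was found
txt="$(printf '%s\n' "$json" \
  | sed -n 's/.*"TXT":"\([^"\\]*\(\\.[^"\\]*\)*\)".*/\1/p')"

printf '%s\n' "$txt"
```

Because the pattern stops at the closing quote rather than at a comma or brace, the result should not depend on where the TXT property appears in the response; if the key is missing, `txt` stays empty.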

Neilpang commented 3 years ago

How about this one:

t='{"data":{"TTL":180,"TLSA":"_25._tcp.mail.example.org. 2 1 1 ABC123\n_25._tcp.mail.example.org. 2 1 1 ABC123\n_25._tcp.mail.example.org. 2 1 1 ABC123\n_25._tcp.mail.example.org. 2 1 1 ABC123\n_25._tcp.mail.example.org. 2 1 1 ABC123\n_465._tcp.mail.example.org. 2 1 1 ABC123\n_465._tcp.mail.example.org. 2 1 1 ABC123\n_465._tcp.mail.example.org. 2 1 1 ABC123\n_465._tcp.mail.example.org. 2 1 1 ABC123\n_465._tcp.mail.example.org. 2 1 1 ABC123\n_587._tcp.mail.example.org. 2 1 1 ABC123\n_587._tcp.mail.example.org. 2 1 1 ABC123\n_587._tcp.mail.example.org. 2 1 1 ABC123\n_587._tcp.mail.example.org. 2 1 1 ABC123\n_587._tcp.mail.example.org. 2 1 1 ABC123","AAAA":"","TXT":"_dmarc \"v=DMARC1;p=quarantine;pct=100;rua=mailto:postmaster@example.org;ruf=mailto:postmaster@example.org;adkim=r;aspf=r\"\r\n_token._dnswl \"ABC123\"\r\naorg._domainkey \"v=DKIM1; k=rsa; p=ABC123/ABC/ABC123/ABC123/ABC123/ABC123\"\r\n@ \"v=spf1 a mx -all\"","A":"blog 1.2.3.4\nmail 1.2.3.4\nwww 1.2.3.4\n@ 1.2.3.4","nameserver":"auth1.example.com.\nauth2.example.com.","SRV":"_imap._tcp 0 0 0 .\r\n_imaps._tcp 0 1 993 mail.example.org.\r\n_pop3._tcp 0 0 0 .\r\n_pop3s._tcp 0 0 0 .","CName":"","MX":"10 mail.example.org.","comment":"TLSA RR -> as http://dnssec-stats.ant.isi.edu/~viktor/x3hosts.html states, LE X3-Root-CA dies soon (see https://letsencrypt.org/2020/09/17/new-root-and-intermediates.html)\r\nHence complete DNS API plugin, switch cert to P-521 & adjust TLSA RR!\r\nTest tool: https://stats.dnssec-tools.org/explore/?example.org && https://dane.sys4.de/smtp/example.org","CAA":"@ 128 iodef \"mailto:me@example.com\"\n@ 128 issue \"letsencrypt.org\""},"status":"OK"}'

echo "$t" |  tr ',' '\n' | grep '^"TXT":'

Eagle3386 commented 3 years ago

@Neilpang That has 2 downsides:

  1. Since my provider's API response already contains \n characters, using your suggestion would make it impossible to distinguish between "original" \n and those added by my sed command. That's because my provider requires me to POST the whole TXT block back to their API, i. e. everything between "TXT":" & ", (or "}, if the random output order put the TXT block at the end).
  2. It still returns -all\"" at the end which my current solution already does - but your suggestion would imply the additional downside mentioned above.

Instead, the last ", followed by either , or } must not be caught/grabbed/selected, so that I can then insert _acme...[challenge value] & then continue with ", (or "} respectively) up until the end of the original API response's content.

Eagle3386 commented 2 years ago

Kinda "resurrection", I know, but just for the sake of completeness (maybe there are other DNS API out there, enforcing such awful string slicing, too), here's how I succeeded (API will be submitted via PR soon 😉):

# Extract TXT part, strip trailing quote sign (ACME.sh API guidelines forbid
# usage of SED's GNU extensions, hence couldn't omit it via regex), strip '\'
# from '\"' & turn '\n' into real LF characters.
# Yup, awful API to use - but that's all we got to get this working, so... ;)
_debug2 'Raw  ' "$response"
response="$(printf -- '%s' "$response" \
  | sed 's/^\(.*TXT":"\)\([^,}]*\)\(.*\)$/\2/;s/.$//;s/\\"/"/g;s/\\n/\n/g')"
_debug2 'Clean' "$response"
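
One portability note on the last substitution: writing `\n` in the replacement part of `s///` to produce a newline works in GNU sed, but it is not reliably portable, and some non-GNU implementations emit a plain `n` instead. The portable spelling is a backslash followed by an actual line break. A hedged variant of the cleanup step, using a shortened sample value (`txt` and `clean` are illustrative names; removing `\r` is an extra assumption based on the `\r\n` separators in the sample response):

```shell
txt='_dmarc \"v=DMARC1\"\r\n@ \"v=spf1 a mx -all\"'

# 1. \" -> "   2. drop literal \r   3. literal \n -> real newline
# The replacement in step 3 is a backslash followed by a line break,
# so the closing /g has to start at the beginning of the next line.
clean="$(printf '%s\n' "$txt" | sed 's/\\"/"/g;s/\\r//g;s/\\n/\
/g')"

printf '%s\n' "$clean"
```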