Closed albertito closed 11 months ago
Looks like most of the parsing is done by net/txtproto
Dotreader. I only found one relevant issue in the go tracker: golang/go#9781 It was closed with the following remark:
it looks like we're following the strict rules from RFC 5321. I'm going to close this without making any changes. The documentation seems accurate too.
If this is still the case, then DotReader
should be safe.
I think chasquid is vulnerable to the attack because neither net/textproto.Reader.DotReader nor net/mail.ReadMessage enforce the "LR and CF must only appear together" rule.
Some (not polished and not comprehensive) examples: https://go.dev/play/p/pZNkk-FMwp-
This is based on a quick initial analysis; it is possible I'm missing something. I will continue looking at this today.
Oh! Same time different outcome. I have to confess, that my observations are based purely on looking at docs and issues, not at the code itself.
Looks like most of the parsing is done by
net/txtproto
Dotreader. I only found one relevant issue in the go tracker: golang/go#9781 It was closed with the following remark:it looks like we're following the strict rules from RFC 5321. I'm going to close this without making any changes. The documentation seems accurate too.
If this is still the case, then
DotReader
should be safe.
Thanks for finding that!
I think the messages and examples from the issue are not consistent with the message closing it: one person is saying "RFC says this behaviour is invalid, and here are some examples", and then it is closed with "we are compliant to the RFC so closing" (obviously paraphrasing both).
So I think the implementations are too lax and don't enforce the "\r or \n must only appear together", and that original issue was closed incorrectly.
I also did some preliminary integration tests and they seem to confirm this too.
But I will look a bit more in depth after lunch, it is very possible I'm missing something that makes this okay.
Thanks for tackling this so quick! I'm glad to be using your software :)
Just to keep folks updated: I've written a bunch of test cases to confirm the issue, and now I'm working on a wrapper that enforces strict \r\n
in messages, as per the RFC.
That way we can continue to use the standard library for parsing dot-terminated sections and email messages, but still be strict about newlines in them.
While I don't necessarily love the approach, it is practical, and I hope it is robust enough for now.
I'll post another update once I have something more concrete.
Commit 606c392523bc22355a660f0ac6eb5e8cb1e86cda (in the next
branch) has a preliminary fix.
It still needs more testing, which I will do tomorrow.
Thanks a lot @albertito, commit 606c392523bc22355a660f0ac6eb5e8cb1e86cda looks promising. While reading through the new readUntilDot
func, I noticed the following:
readUntilDot
func is much simpler than the stdlib DotReader.Read
func. I definitely have less trouble following the code and the state flow.readUntilDot
func doesn't consume CR characters, it returns any CR-LF sequence unmodified. The stdlib DotReader.Read
func does consume CR characters. It only returns LF line endings.readUntilDot
func does consume a leading period (dot stuffing). Same with the stdlib DotReader.Read
func.If observation 2. is correct, then the internal representation of a message body changes with the fix. Will that have an impact on post-data
hook implementations? Will it have an impact on mda-lmtp
?
If observation 2. is correct, then the internal representation of a message body changes with the fix. Will that have an impact on post-data hook implementations? Will it have an impact on mda-lmtp?
I assume that the tools invoked from the post-data
hook will expect to be operating on the body of a mime message. RFC 2046 section 4.1.1 specifies that:
The canonical form of any MIME "text" subtype MUST always represent a line break as a CRLF sequence.
Passing the email body to post-data
with CRLF line endings is actually better aligned with standards.
If observation 2. is correct, then the internal representation of a message body changes with the fix. Will that have an impact on post-data hook implementations? Will it have an impact on mda-lmtp?
mda-lmtp
uses stdlib net/texproto
DotWriter. That caters for content in both forms (NL
as well as CRNL
). So that should be good as well.
Hm, I think addReceivedHeader and envelope.AddHeader might need some attention as well. It looks like this funcs are adding NL
terminated headers instead of CRNL
terminated. If that is true, then chasquid
instances will have trouble talking to each other after the fix (and to postfix
as well as soon as they roll out smtpd_forbid_bare_newline ). This is because the receiver will drop the connection as soon as it encounters a solitary NL
.
Thanks a lot @albertito, commit 606c392 looks promising. While reading through the new
readUntilDot
func, I noticed the following:
- The new
readUntilDot
func is much simpler than the stdlibDotReader.Read
func. I definitely have less trouble following the code and the state flow.
Thank you! Some parts are a bit hacky, and I think it's likely that it can be simplified/made more readable, but that can be iterated on later.
- The new
readUntilDot
func doesn't consume CR characters, it returns any CR-LF sequence unmodified. The stdlibDotReader.Read
func does consume CR characters. It only returns LF line endings. If observation 2. is correct, then the internal representation of a message body changes with the fix. Will that have an impact onpost-data
hook implementations? Will it have an impact onmda-lmtp
?
This is correct. I thought about keeping the old behaviour, but then if other tools (like MDAs) get more strict in the future, then it's likely we will need to end up doing it anyway.
So my current thinking is to just bite the bullet and update it.
The impact is smaller than it may seem, because both SMTP courier and mda-lmtp already normalize newlines to \r\n, so there shouldn't be externally observable behaviour differences.
Hooks, and other MDAs, will observe a difference in newlines. While the new behaviour is more compliant (as you noted in the other comment), it is still a significant change.
- The new
readUntilDot
func does consume a leading period (dot stuffing). Same with the stdlibDotReader.Read
func.
Yeah, that is important for preserving the message structure. While this behaviour was tested, the test was a bit on the sides, so now I've added more explicit end-to-end tests for the dot stuffing just in case.
Hm, I think addReceivedHeader and envelope.AddHeader might need some attention as well. It looks like this funcs are adding
NL
terminated headers instead ofCRNL
terminated.
This is true, and also DSNs are internally \n-terminated which now creates some inconsistencies. I am working on clearing these cases too.
If that is true, then
chasquid
instances will have trouble talking to each other after the fix (and topostfix
as well as soon as they roll out smtpd_forbid_bare_newline ). This is because the receiver will drop the connection as soon as it encounters a solitaryNL
.
That's not the case: even if we internally still have some \n-terminated lines, the SMTP courier will normalize the newlines; because of that, there are no issues communicating with other strict servers (chasquid or postfix).
This is well covered by the integration tests (which have chasquid talk to each other).
I will post an updated patch that improves testing, although there are no significant code changes from the previous one.
My next step is to clear up the known internal \n-terminated lines just to reduce potential friction/confusion. I will look at the hook and the local MDA courier to see if it's worth doing the newline normalization there too, although I am not sure it will be needed.
Thank you!
My next step is to clear up the known internal \n-terminated lines just to reduce potential friction/confusion. I will look at the hook and the local MDA courier to see if it's worth doing the newline normalization there too, although I am not sure it will be needed.
FYI, I've done a pass of "internal representation uses CRLN terminated lines" (fixing up DSN generation and those headers), and it works fine.
However, after thinking about this for a bit more, I think doing a CRLN normalization at endpoints with external systems (SMTP courier (preexisting) and MDA courier (missing)) is still the right call.
And in that case then maybe it makes sense to keep the internal representation LN terminated, because it keeps things a bit simpler and easier to work with/less accident prone.
I'm working on an alternative set of patches so we can compare the two approaches. I will post references once I have something in reasonable shape.
Sorry this is taking so long, it's unfortunate that we got zero warning about the problem, but I'd rather take a bit more time to get the fix as right as possible.
And in that case then maybe it makes sense to keep the internal representation LN terminated, because it keeps things a bit simpler and easier to work with/less accident prone.
That's fine. Also a smaller patch = less risk.
Sorry this is taking so long, it's unfortunate that we got zero warning about the problem, but I'd rather take a bit more time to get the fix as right as possible.
There is no need to be sorry and there is no need to rush! Thanks a lot for your work.
The next
branch has the latest iteration of the fixes.
It keeps an LF-terminated internal representation, so the patch is less intrusive, and there's less chances of internal problems or mixed termination styles. It enforces CRLF-terminated lines on external endpoints: SMTP courier already had them, this patch adds it to the MDA courier.
Doing the enforcement at endpoints should help prevent any future issues arising from potentially inconsistent internal newlines.
Hooks continue to get LF-terminated messages, so there are less chances of regressions in preexisting deployments. Also output from hooks (which are usually LF-terminated) continues to be well handled.
I'm going to run this version in my development server for a bit, do some external-world tests; and report back.
For comparison, I uploaded a new branch crlf-everywhere
which does the internal conversion to CRLF, and skips the enforcement in the MDA courier. I'll eventually remove it, but it can be useful if anyone wants to compare the two approaches and have an opinion about it.
Thank you!
It keeps an LF-terminated internal representation, so the patch is less intrusive, and there's less chances of internal problems or mixed termination styles. It enforces CRLF-terminated lines on external endpoints: SMTP courier already had them, this patch adds it to the MDA courier.
Comparing the next
branch with crlf-everywhere
it is quite obvious that keeping the internal representation LF
-only is less risky. If that internal representation should be changed in the future, then it likely would make sense to introduce a new data type for guaranteed-to-be-CRLF
-terminated-text and use that throughout the project. That way inconsistencies could be detected during compile time.
I went through the patch in next
again and left some comments. I'm neither a professional in email protocols nor in golang though, so judge them with a grain of salt.
Thanks @znerol and @ThinkChaos for your comments and thorough review!
I've responded to the comments and uploaded an amended patch 34d52ca85074ea323cc8cd86ae041fef2a690315.
I really appreciate the review, so if you have a chance, please take another look and we can keep iterating on this.
Thank you!
Everything looks fine to me!
The following is not critical at all and I didn't see any bugs in the code, just thought I'd write it down as "might be nice for the future".
Generally errors could be handled more idiomatically, which would help be more refactor and error proof.
Here's a couple specific points:
errors.Is(err, x)
instead of err == x
so that introducing a new abstraction that wraps errors doesn't break those checksfmt.Errorf
to wrap an error, %w
should be preferred to %v
as it preserves the actual error and not just a string. That way errors.Is/As/Unwrap
continue working.go vet
is enough).Avoid returning a special value, such as code < 0
in this patch, as it's easy to miss. Having an explicit error
return value instead forces the caller to check it. Using an error
also enables returning data.
Example for the code
case again in pseudo-go:
type response struct{ Code int; Msg string }
type responseError struct{ e error; r response } // Would need methods for Is/As/Unwrap
fn (c *Conn) Handle() {
r, err := c.DATA()
if err != nil {
var rErr *responseError
if errors.As(err, &rErr) {
c.writeResponse(rErr.r)
}
return err
}
c.writeResponse(r)
}
fn (c *Conn) DATA() (response, error) {
data, err := read()
if err != nil {
return responseError{err, response{521, "..."}}
}
return response{250, "..."}
}
Everything looks fine to me!
The following is not critical at all and I didn't see any bugs in the code, just thought I'd write it down as "might be nice for the future". Generally errors could be handled more idiomatically, which would help be more refactor and error proof. Here's a couple specific points:
Thanks for taking the time to think about this and write it down!
- Use
errors.Is(err, x)
instead oferr == x
so that introducing a new abstraction that wraps errors doesn't break those checks
Thanks, this is a good point. A lot of the chasquid code predates errors.Is and there are many cases where it can be used, and doing a review pass is a great idea.
I think sometimes in tightly coupled code, ==
can be more practical (since it indicates "this is a specific error"), but that is a very narrow situation and in general I agree with you.
In this particular patch, the error returned could include e.g. the line number and the type of error so it can be added to the trace by the DATA function.
I didn't want to do this right now as it would add more complexity to this patch that's already fairly big and sensitive, but it's definitely something to add later.
- When using
fmt.Errorf
to wrap an error,%w
should be preferred to%v
as it preserves the actual error and not just a string. That wayerrors.Is/As/Unwrap
continue working. I know linters can catch this, so maybe worth looking into (golangci is what most use, maybego vet
is enough).
This is just an artifact of the code predating %w
, and I agree with you. It's something to clean up/bring more up to date alongside the above.
- Avoid returning a special value, such as
code < 0
in this patch, as it's easy to miss. Having an expliciterror
return value instead forces the caller to check it. Using anerror
also enables returning data.
Totally! This code < 0
thing is a big hack!
Until this point, I thought returning the (code, msg)
pair was practical and clear enough, and a custom type wouldn't have added much value.
But now that we want to break the connection this way, I think it definitely merits a custom type in the style you suggest.
I don't want to add that now because it means again a fair amount of complexity on an already intrusive change. If we had time, I would do a patch introducing that type first, and then use it in this one
But this issue is fairly pressing so I rather do the little hack, and then clean it up with more time afterwards.
I have added a couple of TODOs based on your suggestions, so I don't forget :)
Thanks again!
I went through the tests once more. They are suitable to cover the tricky cases. I especially appreciate the fact that there are integration tests which exercise a combination of error states (message too large plus invalid newline). Good thinking here.
Thanks a lot @albertito for the quick and at the same time thorough update.
Thanks again for the detailed and thorough reviews!
Things look stable, and I haven't identified any issue in my (unfortunately short) tests in the wild, so I am preparing a new release.
chasquid v1.13 has been released with the fix.
Thanks again for all the help, discussions and reviews through fixing this!
There is a new SMTP smuggling attack published recently.
It involves the server incorrectly parsing the end of DATA marker
\r\n.\r\n
, which an attacker can exploit to impersonate a server when email is transmitted server-to-server.There's also more information about it in the Postfix SMTP smuggling page.
Since the attack is already public, I'm filing this issue to track doing a review of chasquid's DATA parsing code to make sure it is not exposed to the problem, and if so, fix it.
I will do a review today, and will update this issue once I know more.
Thanks to Lorenz Schori (@znerol) for notifying me about this!