Review DATA parsing code for SMTP smuggling attack vectors

albertito commented 11 months ago

There is a new SMTP smuggling attack published recently.

It involves the server incorrectly parsing the end of DATA marker \r\n.\r\n, which an attacker can exploit to impersonate a server when email is transmitted server-to-server.

There's also more information about it in the Postfix SMTP smuggling page.

Since the attack is already public, I'm filing this issue to track doing a review of chasquid's DATA parsing code to make sure it is not exposed to the problem, and if so, fix it.

I will do a review today, and will update this issue once I know more.

Thanks to Lorenz Schori (@znerol) for notifying me about this!

znerol commented 11 months ago

Looks like most of the parsing is done by net/txtproto Dotreader. I only found one relevant issue in the go tracker: golang/go#9781 It was closed with the following remark:

it looks like we're following the strict rules from RFC 5321. I'm going to close this without making any changes. The documentation seems accurate too.

If this is still the case, then DotReader should be safe.

albertito commented 11 months ago

I think chasquid is vulnerable to the attack because neither net/textproto.Reader.DotReader nor net/mail.ReadMessage enforce the "LR and CF must only appear together" rule.

Some (not polished and not comprehensive) examples: https://go.dev/play/p/pZNkk-FMwp-

This is based on a quick initial analysis; it is possible I'm missing something. I will continue looking at this today.

znerol commented 11 months ago

Oh! Same time different outcome. I have to confess, that my observations are based purely on looking at docs and issues, not at the code itself.

albertito commented 11 months ago

Looks like most of the parsing is done by net/txtproto Dotreader. I only found one relevant issue in the go tracker: golang/go#9781 It was closed with the following remark:

it looks like we're following the strict rules from RFC 5321. I'm going to close this without making any changes. The documentation seems accurate too.

If this is still the case, then DotReader should be safe.

Thanks for finding that!

I think the messages and examples from the issue are not consistent with the message closing it: one person is saying "RFC says this behaviour is invalid, and here are some examples", and then it is closed with "we are compliant to the RFC so closing" (obviously paraphrasing both).

So I think the implementations are too lax and don't enforce the "\r or \n must only appear together", and that original issue was closed incorrectly.

I also did some preliminary integration tests and they seem to confirm this too.

But I will look a bit more in depth after lunch, it is very possible I'm missing something that makes this okay.

ThinkChaos commented 11 months ago

Thanks for tackling this so quick! I'm glad to be using your software :)

albertito commented 11 months ago

Just to keep folks updated: I've written a bunch of test cases to confirm the issue, and now I'm working on a wrapper that enforces strict \r\n in messages, as per the RFC.

That way we can continue to use the standard library for parsing dot-terminated sections and email messages, but still be strict about newlines in them.

While I don't necessarily love the approach, it is practical, and I hope it is robust enough for now.

I'll post another update once I have something more concrete.

albertito commented 11 months ago

Commit 606c392523bc22355a660f0ac6eb5e8cb1e86cda (in the next branch) has a preliminary fix.

It still needs more testing, which I will do tomorrow.

znerol commented 11 months ago

Thanks a lot @albertito, commit 606c392523bc22355a660f0ac6eb5e8cb1e86cda looks promising. While reading through the new readUntilDot func, I noticed the following:

The new readUntilDot func is much simpler than the stdlib DotReader.Read func. I definitely have less trouble following the code and the state flow.
The new readUntilDot func doesn't consume CR characters, it returns any CR-LF sequence unmodified. The stdlib DotReader.Read func does consume CR characters. It only returns LF line endings.
The new readUntilDot func does consume a leading period (dot stuffing). Same with the stdlib DotReader.Read func.

If observation 2. is correct, then the internal representation of a message body changes with the fix. Will that have an impact on post-data hook implementations? Will it have an impact on mda-lmtp?

znerol commented 11 months ago

If observation 2. is correct, then the internal representation of a message body changes with the fix. Will that have an impact on post-data hook implementations? Will it have an impact on mda-lmtp?

I assume that the tools invoked from the post-data hook will expect to be operating on the body of a mime message. RFC 2046 section 4.1.1 specifies that:

The canonical form of any MIME "text" subtype MUST always represent a line break as a CRLF sequence.

Passing the email body to post-data with CRLF line endings is actually better aligned with standards.

znerol commented 11 months ago

If observation 2. is correct, then the internal representation of a message body changes with the fix. Will that have an impact on post-data hook implementations? Will it have an impact on mda-lmtp?

mda-lmtp uses stdlib net/texproto DotWriter. That caters for content in both forms (NL as well as CRNL). So that should be good as well.

znerol commented 11 months ago

Hm, I think addReceivedHeader and envelope.AddHeader might need some attention as well. It looks like this funcs are adding NL terminated headers instead of CRNL terminated. If that is true, then chasquid instances will have trouble talking to each other after the fix (and to postfix as well as soon as they roll out smtpd_forbid_bare_newline ). This is because the receiver will drop the connection as soon as it encounters a solitary NL.

albertito commented 11 months ago

Thanks a lot @albertito, commit 606c392 looks promising. While reading through the new readUntilDot func, I noticed the following:

The new readUntilDot func is much simpler than the stdlib DotReader.Read func. I definitely have less trouble following the code and the state flow.

Thank you! Some parts are a bit hacky, and I think it's likely that it can be simplified/made more readable, but that can be iterated on later.

The new readUntilDot func doesn't consume CR characters, it returns any CR-LF sequence unmodified. The stdlib DotReader.Read func does consume CR characters. It only returns LF line endings. If observation 2. is correct, then the internal representation of a message body changes with the fix. Will that have an impact on post-data hook implementations? Will it have an impact on mda-lmtp?

This is correct. I thought about keeping the old behaviour, but then if other tools (like MDAs) get more strict in the future, then it's likely we will need to end up doing it anyway.

So my current thinking is to just bite the bullet and update it.

The impact is smaller than it may seem, because both SMTP courier and mda-lmtp already normalize newlines to \r\n, so there shouldn't be externally observable behaviour differences.

Hooks, and other MDAs, will observe a difference in newlines. While the new behaviour is more compliant (as you noted in the other comment), it is still a significant change.

The new readUntilDot func does consume a leading period (dot stuffing). Same with the stdlib DotReader.Read func.

Yeah, that is important for preserving the message structure. While this behaviour was tested, the test was a bit on the sides, so now I've added more explicit end-to-end tests for the dot stuffing just in case.

Hm, I think addReceivedHeader and envelope.AddHeader might need some attention as well. It looks like this funcs are adding NL terminated headers instead of CRNL terminated.

This is true, and also DSNs are internally \n-terminated which now creates some inconsistencies. I am working on clearing these cases too.

If that is true, then chasquid instances will have trouble talking to each other after the fix (and to postfix as well as soon as they roll out smtpd_forbid_bare_newline ). This is because the receiver will drop the connection as soon as it encounters a solitary NL.

That's not the case: even if we internally still have some \n-terminated lines, the SMTP courier will normalize the newlines; because of that, there are no issues communicating with other strict servers (chasquid or postfix).

This is well covered by the integration tests (which have chasquid talk to each other).

I will post an updated patch that improves testing, although there are no significant code changes from the previous one.

My next step is to clear up the known internal \n-terminated lines just to reduce potential friction/confusion. I will look at the hook and the local MDA courier to see if it's worth doing the newline normalization there too, although I am not sure it will be needed.

Thank you!

albertito commented 11 months ago

My next step is to clear up the known internal \n-terminated lines just to reduce potential friction/confusion. I will look at the hook and the local MDA courier to see if it's worth doing the newline normalization there too, although I am not sure it will be needed.

FYI, I've done a pass of "internal representation uses CRLN terminated lines" (fixing up DSN generation and those headers), and it works fine.

However, after thinking about this for a bit more, I think doing a CRLN normalization at endpoints with external systems (SMTP courier (preexisting) and MDA courier (missing)) is still the right call.

And in that case then maybe it makes sense to keep the internal representation LN terminated, because it keeps things a bit simpler and easier to work with/less accident prone.

I'm working on an alternative set of patches so we can compare the two approaches. I will post references once I have something in reasonable shape.

Sorry this is taking so long, it's unfortunate that we got zero warning about the problem, but I'd rather take a bit more time to get the fix as right as possible.

znerol commented 11 months ago

And in that case then maybe it makes sense to keep the internal representation LN terminated, because it keeps things a bit simpler and easier to work with/less accident prone.

That's fine. Also a smaller patch = less risk.

Sorry this is taking so long, it's unfortunate that we got zero warning about the problem, but I'd rather take a bit more time to get the fix as right as possible.

There is no need to be sorry and there is no need to rush! Thanks a lot for your work.

albertito commented 11 months ago

The next branch has the latest iteration of the fixes.

It keeps an LF-terminated internal representation, so the patch is less intrusive, and there's less chances of internal problems or mixed termination styles. It enforces CRLF-terminated lines on external endpoints: SMTP courier already had them, this patch adds it to the MDA courier.

Doing the enforcement at endpoints should help prevent any future issues arising from potentially inconsistent internal newlines.

Hooks continue to get LF-terminated messages, so there are less chances of regressions in preexisting deployments. Also output from hooks (which are usually LF-terminated) continues to be well handled.

I'm going to run this version in my development server for a bit, do some external-world tests; and report back.

For comparison, I uploaded a new branch crlf-everywhere which does the internal conversion to CRLF, and skips the enforcement in the MDA courier. I'll eventually remove it, but it can be useful if anyone wants to compare the two approaches and have an opinion about it.

Thank you!

znerol commented 11 months ago

It keeps an LF-terminated internal representation, so the patch is less intrusive, and there's less chances of internal problems or mixed termination styles. It enforces CRLF-terminated lines on external endpoints: SMTP courier already had them, this patch adds it to the MDA courier.

Comparing the next branch with crlf-everywhere it is quite obvious that keeping the internal representation LF-only is less risky. If that internal representation should be changed in the future, then it likely would make sense to introduce a new data type for guaranteed-to-be-CRLF-terminated-text and use that throughout the project. That way inconsistencies could be detected during compile time.

I went through the patch in next again and left some comments. I'm neither a professional in email protocols nor in golang though, so judge them with a grain of salt.

albertito commented 11 months ago

Thanks @znerol and @ThinkChaos for your comments and thorough review!

I've responded to the comments and uploaded an amended patch 34d52ca85074ea323cc8cd86ae041fef2a690315.

I really appreciate the review, so if you have a chance, please take another look and we can keep iterating on this.

Thank you!

ThinkChaos commented 11 months ago

Everything looks fine to me!

The following is not critical at all and I didn't see any bugs in the code, just thought I'd write it down as "might be nice for the future".
Generally errors could be handled more idiomatically, which would help be more refactor and error proof.
Here's a couple specific points:

Use errors.Is(err, x) instead of err == x so that introducing a new abstraction that wraps errors doesn't break those checks
When using fmt.Errorf to wrap an error, %w should be preferred to %v as it preserves the actual error and not just a string. That way errors.Is/As/Unwrap continue working.
I know linters can catch this, so maybe worth looking into (golangci is what most use, maybe go vet is enough).

Avoid returning a special value, such as code < 0 in this patch, as it's easy to miss. Having an explicit error return value instead forces the caller to check it. Using an error also enables returning data. Example for the code case again in pseudo-go:

type response struct{ Code int; Msg string }
type responseError struct{ e error; r response } // Would need methods for Is/As/Unwrap

fn (c *Conn) Handle() {
  r, err := c.DATA()
  if err != nil {
      var rErr *responseError
      if errors.As(err, &rErr) {
          c.writeResponse(rErr.r)
      }

      return err
  }

  c.writeResponse(r)
}

fn (c *Conn) DATA() (response, error) {
  data, err := read()
  if err != nil {
      return responseError{err, response{521, "..."}}
  }

  return response{250, "..."}
}

albertito commented 11 months ago

Everything looks fine to me!

The following is not critical at all and I didn't see any bugs in the code, just thought I'd write it down as "might be nice for the future". Generally errors could be handled more idiomatically, which would help be more refactor and error proof. Here's a couple specific points:

Thanks for taking the time to think about this and write it down!

Use errors.Is(err, x) instead of err == x so that introducing a new abstraction that wraps errors doesn't break those checks

Thanks, this is a good point. A lot of the chasquid code predates errors.Is and there are many cases where it can be used, and doing a review pass is a great idea.

I think sometimes in tightly coupled code, == can be more practical (since it indicates "this is a specific error"), but that is a very narrow situation and in general I agree with you.

In this particular patch, the error returned could include e.g. the line number and the type of error so it can be added to the trace by the DATA function.

I didn't want to do this right now as it would add more complexity to this patch that's already fairly big and sensitive, but it's definitely something to add later.

When using fmt.Errorf to wrap an error, %w should be preferred to %v as it preserves the actual error and not just a string. That way errors.Is/As/Unwrap continue working. I know linters can catch this, so maybe worth looking into (golangci is what most use, maybe go vet is enough).

This is just an artifact of the code predating %w, and I agree with you. It's something to clean up/bring more up to date alongside the above.

Avoid returning a special value, such as code < 0 in this patch, as it's easy to miss. Having an explicit error return value instead forces the caller to check it. Using an error also enables returning data.

Totally! This code < 0 thing is a big hack!

Until this point, I thought returning the (code, msg) pair was practical and clear enough, and a custom type wouldn't have added much value.

But now that we want to break the connection this way, I think it definitely merits a custom type in the style you suggest.

I don't want to add that now because it means again a fair amount of complexity on an already intrusive change. If we had time, I would do a patch introducing that type first, and then use it in this one

But this issue is fairly pressing so I rather do the little hack, and then clean it up with more time afterwards.

I have added a couple of TODOs based on your suggestions, so I don't forget :)

Thanks again!

znerol commented 11 months ago

I went through the tests once more. They are suitable to cover the tricky cases. I especially appreciate the fact that there are integration tests which exercise a combination of error states (message too large plus invalid newline). Good thinking here.

Thanks a lot @albertito for the quick and at the same time thorough update.

albertito commented 11 months ago

Thanks again for the detailed and thorough reviews!

Things look stable, and I haven't identified any issue in my (unfortunately short) tests in the wild, so I am preparing a new release.

albertito commented 11 months ago

chasquid v1.13 has been released with the fix.

Thanks again for all the help, discussions and reviews through fixing this!

albertito / chasquid

Review DATA parsing code for SMTP smuggling attack vectors #47