Domain-Connect / spec

Domain Connect Specification
MIT License

Concerns regarding "-all" as a default policy on SPF #46

Open knoepfchendruecker opened 4 years ago

knoepfchendruecker commented 4 years ago

Default SPF policy of "fail"

6.10.3 advises always using a fixed modifier of "-" (fail) for any messages from sources not specified in the SPF rules. When two services with different policies are merged via SPFM, the result always has "-all" as its default policy.

6.10.3 also states "it" (merged SPF record or just the default policy?) can always be modified by the user after the merge operation is completed.

After years of experience with SPF, this advisory is quite worrying, as it puts a troublesome default into place.

Issues with forwarding (SPF, SRS, DMARC)

Mail may be forwarded by generic mail forwarding services and mailing lists that don't rewrite rfc5321.MailFrom. As the sender's SPF record doesn't list those forwarding hosts, the final receiving host will apply the default policy: to reject those forwarded messages.

The experimental SRS (Sender Rewriting Scheme) is a method to rewrite the address in rfc5321.MailFrom. An SRS-compliant forwarding host will encode the original address in the localpart and append an SRS-specific, SPF-whitelisted domain to it. From the receiving host's point of view, the sender address meets the SPF record of the (SRS-specific) domain and so the message is to be accepted.
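To make the rewriting step concrete, here is a minimal sketch loosely modeled on the SRS0 address form. The function name, the secret, the pre-computed `timestamp` string and the 4-character hash length are all assumptions made for illustration; real SRS implementations define their own hash and timestamp encodings.

```python
import hashlib
import hmac


def srs_rewrite(mail_from: str, forwarder_domain: str, secret: bytes, timestamp: str) -> str:
    """Rewrite an envelope sender in an SRS0-like form (simplified sketch)."""
    local, _, domain = mail_from.rpartition("@")
    # Short keyed hash so the forwarder can later recognise its own rewrites.
    message = f"{timestamp}{domain}{local}".lower().encode()
    tag = hmac.new(secret, message, hashlib.sha1).hexdigest()[:4]
    # The original address moves into the localpart; the domain becomes the forwarder's.
    return f"SRS0={tag}={timestamp}={domain}={local}@{forwarder_domain}"
```

The receiving host then checks SPF against `forwarder_domain` rather than against the original sender's domain.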

In real life, SRS has a very low adoption rate due to complexity and further issues.

SRS is also incompatible with DMARC's identifier alignment requirement: in DMARC's "strict" mode, the domains from rfc5321.MailFrom and rfc5322.From need to match exactly; in DMARC's "relaxed" mode, both domains must share the same organizational domain (for example, the rfc5321.MailFrom domain may be a subdomain of the rfc5322.From domain). An SRS-rewritten rfc5321.MailFrom uses the forwarder's domain and so meets neither.
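The alignment check can be sketched as follows. RFC 7489 defines relaxed alignment in terms of the organizational domain, which properly requires a public suffix list; the `org_domain` helper below simply takes the last two labels, which is an assumption that only holds for simple cases such as `example.org`.

```python
def org_domain(domain: str) -> str:
    """Naive organizational domain: the last two labels (assumption, no PSL lookup)."""
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def spf_aligned(mail_from_domain: str, header_from_domain: str, strict: bool) -> bool:
    """Check DMARC identifier alignment between rfc5321.MailFrom and rfc5322.From."""
    a = mail_from_domain.lower().rstrip(".")
    b = header_from_domain.lower().rstrip(".")
    if strict:
        return a == b                          # strict: exact match only
    return org_domain(a) == org_domain(b)      # relaxed: same organizational domain
```

With these definitions, a bounce subdomain stays aligned in relaxed mode, while an SRS-rewritten sender at the forwarder's domain is aligned in neither mode.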

Many mail servers evaluate SPF records during the SMTP dialogue and reject right after the "MAIL FROM" command, while DMARC can only be evaluated later, once the message data has been received. A "fail" default policy can therefore result in situations where a forwarded message has no chance of being accepted by the receiving mail server.

With a less restrictive default policy such as softfail (~all) or neutral (?all), the un-rewritten message could have survived the initial SPF check.
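The difference can be sketched with a hypothetical receiver policy. The function name and the "reject only on fail" rule are assumptions about a typical MTA configuration, not behavior mandated by any spec.

```python
def mail_from_stage_decision(spf_result: str) -> str:
    """Hypothetical receiver policy applied right after the MAIL FROM command."""
    if spf_result == "fail":
        # Rejected before DATA: DKIM, DMARC and content filters never run.
        return "reject"
    # softfail, neutral, none and pass all let the transaction continue.
    return "continue"
```

A forwarded message from a "-all" domain yields "fail" and is rejected here, while the same message from a "~all" domain yields "softfail" and proceeds to the later checks.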

One might also argue to prefer "neutral": RFC7208 specifies the lack of an "all" mechanism to be interpreted as "neutral":

If none of the mechanisms match and there is no "redirect" modifier, then the check_host() returns a result of "neutral", just as if "?all" were specified as the last directive.
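As a small illustration of that rule, the sketch below works out what a record falls back to when no other mechanism matches. It does no DNS lookups and no macro expansion; the function name and the "redirect" return value are illustrative choices, not part of RFC7208.

```python
QUALIFIER_RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def default_result(spf_record: str) -> str:
    """Return the result an SPF record yields when nothing before "all" matches."""
    terms = spf_record.split()[1:]  # skip the "v=spf1" version tag
    for term in terms:
        # A mechanism without an explicit qualifier defaults to "+" (pass).
        qualifier, name = (term[0], term[1:]) if term[0] in QUALIFIER_RESULTS else ("+", term)
        if name.lower() == "all":
            return QUALIFIER_RESULTS[qualifier]
    if any(t.lower().startswith("redirect=") for t in terms):
        return "redirect"  # the decision is delegated to the redirect target
    return "neutral"       # no "all", no "redirect": same as "?all"
```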

Contradicting example

6.10.2 gives an example where multiple SPF records with different default policies (~all and -all) are manually being merged. The result uses the "least restrictive all modifier" as a new default policy of the SPF record and advocates "-all" to be more appropriate when no other services are being used.

This merge strategy makes much more sense to me, as it does prefer the "more compatible" default policy rather than the "most strict" policy.
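A "least restrictive wins" choice could look like the sketch below. The ordering list and function name are assumptions for illustration; "+" is deliberately not part of the ordering, so passing it raises a `ValueError`.

```python
# Ordered from most to least restrictive; "+" is left out on purpose.
RESTRICTIVENESS = ["-", "~", "?"]


def least_restrictive(qualifiers):
    """Pick the least restrictive "all" qualifier among the merged records."""
    return max(qualifiers, key=RESTRICTIVENESS.index)
```

Merging a "~all" record with a "-all" record this way keeps "~all", matching the manual example in 6.10.2.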

Other antispam engines

From the perspective of spam filtering engines like SpamAssassin, there's not much difference between a softfail and a fail default policy, yet there is often a big difference when SPF is evaluated by an MTA. So after all, a softfail (~all) default policy seems to be a much more reasonable default for most users, as it prevents forwarded messages from being rejected at "MAIL FROM" time without risking compliance with other standards like DMARC. When SPF and DMARC are evaluated at the same time, an enforced DMARC policy (p=quarantine, p=reject) overrides any SPF default policy, so a strict SPF-encoded default policy is even less necessary.

Suggestions

I do see a few points to address this topic.

Also noteworthy: RFC7208 in appendix A.4 contains an example making use of "+all" (pass) for a restrictive policy (by negating other records, including a deprecated "ptr" mechanism). I do have serious doubts that such a record could be successfully merged with any "more common" SPF record. I haven't seen such an SPF record in real life, but I've seen quite a few "+all" or "all" records, which result in an insecure configuration.

It's probably reasonable to reject merging any SPFM record mentioning the "all" mechanism with either an explicitly or implicitly passing modifier ("+all", "all").
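Such a rejection check could be as small as the following sketch. The function name is hypothetical, and it only inspects whitespace-separated terms of the value being merged.

```python
def has_pass_all(spf_terms: str) -> bool:
    """True if the terms contain an "all" mechanism that results in pass."""
    for term in spf_terms.split():
        # A bare "all" carries the implicit "+" qualifier.
        if term.lower() in ("all", "+all"):
            return True
    return False
```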

pawel-kow commented 4 years ago

So, long story short, you argue that:

knoepfchendruecker commented 4 years ago

@pawel-kow Exactly. Sorry for the overly elaborate description.

arnoldblinn commented 4 years ago

@pawel-kow This was my conclusion too. The thoughts seemed reasonable. @knoepfchendruecker Yes, it was elaborate :-).

pawel-kow commented 4 years ago

OK, I proposed the change in https://github.com/Domain-Connect/spec/pull/47 @arnoldblinn @knoepfchendruecker please review

arnoldblinn commented 4 years ago

I was travelling last week, and didn't have a chance to dig into this.

The upshot of this discussion is to change the "all" modifier on a merged SPF record from a - to a ~. I don't have a problem with this per se. However, digging through the long conversation....

Section 6.10.2 is a description of how people might manually merge records. Here we say use the least restrictive approach. I think everyone agrees this makes sense when manually dealing with SPF records.

But section 6.10.3 is a description of how the SPFM record should be merged into a final SPF record. The SPFM does NOT contain a rule for the "all". The whole point of this was to eliminate this complexity for the services and for the user.

We originally picked -all. The thought process when originally writing this was that any provider using Domain Connect for this functionality would have an all-inclusive rule.

We can change this to ~all without much difficulty. But bear in mind this is not taking a "least restrictive on merging records" approach. This is hard coding a default modifier for all. The SPFM record does NOT specify the desired modifier for the all rule.

I looked at the templates that use the SPFM record and searched online for their recommended "manual" settings:

These guys are all over the map; and I suspect it is the conservative nature of mailjet and plesk.

Note that Google isn't using the SPFM record in its templates, and they use a -all. Searching does turn up some Google help articles that recommend a ~all, though. Again, inconsistent.

I'm happy to pick a default rule of ~all. But the rationale behind the -all was if ALL the mail services were set with Domain Connect, they could be fully inclusive. To me, the ? and ~ variants are service providers being conservative and not wanting to break things.

My 2 cents.

Domain Connect is only used to set a small set of services that manipulate SPF records. Some of them prefer a -all, some prefer a ~all. And we've gone ahead and said "when using Domain Connect and SPFM, you don't get to specify this element". We pick one.

Right now we picked -all. Microsoft recommends this (-all). Google recommends ~all. While I'm not sure of their motivations for this recommendation, I suspect it has to do with being a bit conservative and not wanting to break other stuff.

Our r

knoepfchendruecker commented 4 years ago

Mailjet doesn't really recommend "?all" - their includable SPF record just includes an "?all" statement. However, only positive attributions from an included record are being honored, so it doesn't really matter if spf.mailjet.com does end in "?all", "~all", "-all" or doesn't list any default policy at all: the result is the same. They probably just included an "?all" to calm down requests of "your SPF record is missing a default policy".
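This follows from how RFC7208 (section 5.2) maps the result of evaluating the included record onto the "include" mechanism itself. A sketch of that mapping, with a hypothetical function name:

```python
def include_outcome(included_result: str) -> str:
    """Map the check_host() result of an included record to the include mechanism's outcome."""
    if included_result == "pass":
        return "match"
    if included_result in ("fail", "softfail", "neutral"):
        # The included record's own "all" policy never leaks into the outer record.
        return "no match"
    if included_result == "temperror":
        return "temperror"
    if included_result in ("permerror", "none"):
        return "permerror"
    raise ValueError(f"unknown SPF result: {included_result}")
```

Whether the included record ends in "?all", "~all" or "-all", the outer record just moves on to its next mechanism.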

A note about Microsoft: they've been recommending "-all" on SPF and SenderID for a very long time, and always argued the way the original SPF spec was intended to be used: to start at "?all", progress to "~all" and finally end at "-all". So probably they're just repeating themselves over and over again without re-evaluating if that argument still does make some sense or is still reasonable.

For example, Microsoft chose to ignore the equivalent DMARC policy ("p=reject") for exactly the same reason many others are recommending ~all: potential issues when mails are being forwarded.

Quoting from https://docs.microsoft.com/en-us/office365/securitycompliance/use-dmarc-to-validate-email#how-office-365-handles-inbound-email-that-fails-dmarc:

If the DMARC policy of the sending server is p=reject, EOP marks the message as spam instead of rejecting it. In other words, for inbound email, Office 365 treats p=reject and p=quarantine the same way. Office 365 is configured like this because some legitimate email may fail DMARC. For example, a message might fail DMARC if it is sent to a mailing list that then relays the message to all list participants. If Office 365 rejected these messages, people could lose legitimate email and have no way to retrieve it. Instead, these messages will still fail DMARC but they will be marked as spam and not rejected. If desired, users can still get these messages in their inbox through these methods:

DMARC evaluates both SPF and DKIM and passes a message as long as at least one of them passes with an aligned domain, even when the other fails. So with DKIM in place, it's more forgiving than plain-old SPF using the "fail" policy. From that point of view, a more restrictive SPF policy doesn't make much sense.
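In code form, the DMARC decision reduces to a logical OR over the two aligned results. This is a simplified sketch with a hypothetical function name; it ignores the pct tag, subdomain policies and reporting.

```python
def dmarc_disposition(spf_aligned_pass: bool, dkim_aligned_pass: bool, policy: str) -> str:
    """Return the action DMARC requests: "none" on pass, else the published policy."""
    if spf_aligned_pass or dkim_aligned_pass:
        return "none"
    return policy
```

A forwarded message whose SPF check fails but whose DKIM signature survives still passes DMARC, provided the receiver did not already reject it at "MAIL FROM" time because of "-all".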

https://docs.microsoft.com/en-us/microsoft-365/security/office-365-security/how-office-365-uses-spf-to-prevent-spoofing#form-your-spf-txt-record-for-office-365

is a little bit more verbose on a specific recommendation for the SPF record:

-all:[…] Also, if you are only using SPF, that is, you are not using DMARC or DKIM, you should use the -all qualifier. We recommend that you use always this qualifier. […] ~all:[…]If you're not sure that you have the complete list of IP addresses, then you should use the ~all (soft fail) qualifier. Also, if you are using DMARC with p=quarantine or p=reject, then you can use ~all. Otherwise, use -all. ?all […]This is used when testing SPF. We do not recommend that you use this qualifier in your live deployment.

So only in combination with enforced DMARC does Microsoft also "allow" the usage of "~all". In any case, Microsoft does not recommend using "?all" in a live environment.

Let's have a look at hotmail.com and outlook.com: DMARC not enforced (p=none), SPF is configured to "~all". Taking a look at yahoo.com: DMARC is enforced (p=reject), SPF is configured to "?all". Yahoo even makes use of the deprecated "ptr" mechanism. And yet another look at microsoft.com: DMARC is enforced (p=reject), SPF is configured to "-all". Sigh.

Recommendations from "official" sources:

The UK government in https://www.gov.uk/guidance/set-up-government-email-services-securely and https://www.ncsc.gov.uk/guidance/email-security-and-anti-spoofing recommends SPF to be using "~all" and DKIM to be deployed and DMARC to be iterated from "p=none" to "p=reject".

The factsheet from the Dutch Cyber Security Centrum at https://www.ncsc.nl/documenten/factsheets/2019/juni/01/factsheet-bescherm-domeinnamen-tegen-phishing recommends using "~all" with SPF and using DKIM with DMARC to bypass any forwarding issues.

The US Department of Homeland Security in https://cyber.dhs.gov/bod/18-01/ requires government agencies to enforce DMARC and requires SPF. DKIM is not required, and nowhere is a specific default policy for SPF defined. As the focus is on DMARC and DMARC overrides SPF's policy, this decision has probably been neglected as being less important than the DMARC policy. And just for example, dhs.gov, fbi.gov and nasa.gov do use "-all", while nps.gov, uspto.gov and whitehouse.gov do use "~all".

arnoldblinn commented 4 years ago

Like I said, we can switch our default generated policy to ~all. But the differences between all these settings appear to me to be due to the opinions and interpretation of specific providers when they are operating in a world where they only consider other services.

An SPF record (and potentially eventually a DMARC) that is 100% controlled through domain connect settings would be more "deterministic". But this is also a theoretical world that doesn't exist.

We can and should change the default policy to ~all. It is certainly more conservative.

knoepfchendruecker commented 4 years ago

Regarding "least restrictive merging": we could also remove an explicit default modifier, but ask for "least restrictive merging".

arnoldblinn commented 4 years ago

SPFM simply specifies the rules in between the v=spf1 and the *all terms. There is no such thing with SPFM as a least restrictive merging. We picked one (in our case -all, which you argue should be ~all).

So I'm not sure what you are asking for with "least restrictive merging". This term makes sense in the context of a manual merge by a human of multiple records, but not here, given that the SPFM values in the Domain Connect templates don't specify any "all" rule.

I think you are confusing what Domain Connect and SPFM do relative to your mental model of merging.

knoepfchendruecker commented 4 years ago

Thanks for clarifying this, @arnoldblinn!

Based on the original spec, I did assume the following to be intended:

This would've permitted a trivial implementation of Microsoft's proposal:

As that's not the case - well, having "~all" as a default does solve the potential issue anyway.

arnoldblinn commented 4 years ago

Your third bullet (when a service specifies an "all" mechanism in its SPFM rules, the less restrictive...) was a false assumption.

There is no way to specify an "all" mechanism in the SPFM rules. We assumed it would always be -all.

So why are you asking that the default global policy of the resulting merged record be changed from -all to ~all?

I'm happy to if there is a good reason. But I don't want to do it based on a misunderstanding.

pawel-kow commented 4 years ago

My 2 cents: We decided that SPFM does not specify any "all" rule as it is shared between different providers, and we picked "-all" as a hard-coded default, taking into account that some providers may be too conservative and in the end spoil the effect of using SPF. Now it turns out "-all" can be painful when email gets forwarded, so "~all" would be the right balance between a working solution and email security.

So why also include language about "least restrictive"? Because we allow the "all" rule to be modified after the SPFM operation, and SPFM may face a domain with an already existing SPF record, so it's smart to define the behavior in such cases. So if the customer himself changed the "all" policy to something less restrictive, we should not "upgrade" it again. Another approach we may take is to say the customer is always right, so we do not change any existing "all" rule when merging. Opinions?

knoepfchendruecker commented 4 years ago

Thanks for catching up, @pawel-kow. It's exactly that concern regarding forwarded email and about sane defaults. Most users don't change their defaults, so their service provider needs to provide some sensible defaults.

On least restrictive

A "correct" approach is to start with a sane default configuration, educate the customer on the impact of the various options and use the customer's explicit configuration.

I'm not entirely convinced by merging with the customer's current SPF record, as the decision for that default policy may already be skewed or wrong. At least according to my personal experience with my customers' valid SPF records, most of those are simply the result of applying someone else's template (default templates provided by web host, email service provider, DNS editor,…), including their respective default policy.

SPF

Back in the early times of SPF, a "hard" failing SPF record was the holy grail of SPF records:

Today's reality is a little bit different: Today, SPF records have become one out of multiple factors for ham/spam filters to decide on, and those services do usually rely on positively listed mail servers, but less on the exact "negativeness" for the inverse case. Those ham/spam filters also do factor in many more aspects of emails, so SPF is just "one" out of many issues.

The "negative" listings are still often honored during the SMTP dialogue by the respective MTA: before any other ham/spam filter could decide NOT to reject/discard that message. Accordingly, a "too hard" policy may prevent a ham/spam filter from accepting(!) a message for delivery.

As such, "-all" is recommended for domains that don't send any legitimate mail at all (e.g. parked domains). At worst, legitimate, forwarded messages won't be accepted by the receiving mail server and the sender possibly won't even receive a bounce message for them. "~all" is recommended for most environments. At worst, a legitimate, forwarded message is scanned more thoroughly or may end up in the receiver's spam folder. Those strictly following RFC7208 should treat "?all" as if no SPF record were present, which today involves more intensive scanning for spam.

The "unimportance" of negative records

For example, Gmail "honors" the absence of a DKIM signature from a message or one's mail servers not being positively listed in one's SPF record by replacing the avatar image in webmail with a question mark stop sign and a hover text indicating the message could be sent by a spammer and not the actual user. Whatever negative default policy is used, doesn't really matter: it's the absence of positive attributions which counts.

As another example, SpamAssassin has multiple default scores for different SPF results and exact configurations: an "SPF_PASS" benefits very little, while any other results do have large penalty scores. Depending on the overall configuration, the default scores for "SPF_SOFTFAIL" and even "SPF_NEUTRAL" can be even higher than those for "SPF_FAIL". However, all three "non-positive" results are fairly close together, so it doesn't really matter which of them fires.

So then, could we simply set anything as a default policy? No. SPF is the earliest of these protocols, it has also been implemented in MTAs, and SPF's results may be evaluated right during the SMTP dialogue, before any other (positive) attributions like DKIM are available. To avoid those cases where the SMTP dialogue rejects a message that could later have been accepted, I'd like to avoid the "fail" return code by default. This is also the reason why the DMARC folks argue against using "-all" in SPF records: it might reject messages too early in the SMTP transaction that might otherwise pass DMARC's (or any other spam filter's) checks.

Hence my ask to replace "-all" by "~all".

knoepfchendruecker commented 4 years ago

Just to mention an article "worth reading":

https://hackernoon.com/myths-and-legends-of-spf-d17919a9e817 is a fairly recent summary of the current state of SPF, written by a mail.ru engineer. Besides clearing up myths and misconceptions, the post also has clearly explained recommendations and a "sidenote" from their customer support staff on the various nuances of SPF.

arnoldblinn commented 4 years ago

OK, I'm convinced.

Just to be clear though. I'm being picky because I hear commentary regarding merging with the least restrictive value. There isn't such a thing. The SPFM values contain the value of the SPF record in between the "v=spf1" and the "*all" values.

So a template might contain a value for SPFM of include:xyz.com. Another template might contain include:abc.com.

Applying these templates to a zone without an SPF record would result in v=spf1 include:abc.com include:xyz.com ~all. The ~ is the change we are doing here.

Applying these templates to a zone WITH an SPF record isn't clearly defined right now. Say the zone already had v=spf1 include:example.com -all.

Does it delete the existing SPF record? Or does it merge the templates in? And if it merges the templates in, what is the modifier?

My implementation merges in and leaves the existing modifier in place. So the result of applying these two templates would be:

v=spf1 include:abc.com include:xyz.com include:example.com -all
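The behavior described in this comment can be sketched roughly as below. This is not the actual Domain Connect implementation; the function name, the de-duplication of terms and the `default_all` parameter are assumptions, and the output uses the `v=spf1` version tag from RFC7208.

```python
def merge_spf(spfm_values, existing_record=None, default_all="~all"):
    """Merge SPFM values into one SPF record, keeping an existing "all" term if present."""
    terms = []
    for value in spfm_values:
        for term in value.split():
            if term not in terms:          # avoid listing the same term twice
                terms.append(term)
    all_term = default_all
    if existing_record:
        for term in existing_record.split()[1:]:   # drop the version tag
            if term.lstrip("+-~?").lower() == "all":
                all_term = term            # leave the zone's existing modifier in place
            elif term not in terms:
                terms.append(term)
    return " ".join(["v=spf1", *terms, all_term])
```

Calling `merge_spf(["include:abc.com", "include:xyz.com"])` produces the "~all" record for an empty zone, and passing the zone's current record as `existing_record` keeps its "-all".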

pawel-kow commented 4 years ago

@arnoldblinn

Applying these templates to a zone WITH an SPF record isn't defined right now clearly.

This is a part of the exercise, therefore the proposed change in the text taking into account the existing SPF in #47:

When a template is added or removed with an _SPFM_ record in the template, 
some code would need to take the aggregate value of all _SPFM_ records 
in all templates applied as well as existing SPF TXT record on the host 
and recalculate the resulting SPF TXT record. In case several sources specify the 
same rule with a different policy DNS Provider SHOULD apply the least restrictive 
one as a result. _soft failure_ SHOULD be preferred over _hard failure_, _neutral_
SHOULD be preferred over _soft failure_.

My implementation merges in and leaves the existing modifier in place.

This is the alternative I also proposed in the previous comment and in the end better from the user perspective following the changes he applied.

Proposed change to the previous text:

When a template is added or removed with an _SPFM_ record in the template,
some code would need to take the aggregate value of all _SPFM_ records
in all templates applied as well as existing SPF TXT record on the host 
and recalculate the resulting SPF TXT record. In case the existing SPF TXT record 
already specifies the "all" rule, its modifier SHOULD remain intact after the merge
operation.

Are we all ok with the approach and the text so #47 can be finalized? @arnoldblinn @knoepfchendruecker

knoepfchendruecker commented 4 years ago

RFC7208 defines "all" as a "mechanism" and not as a "rule".

So: "In case the existing SPF TXT record already uses the "all" mechanism, its modifier SHOULD remain intact after the merge operation."

What about cases where the existing SPF TXT record uses the "redirect" modifier?

According to RFC7208 5.1, an "all" mechanism will ask everyone to ignore the "redirect". So while the SPF record is still syntactically correct, it's certainly not what the user did expect.
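A merge implementation could detect this situation before adding an "all" mechanism to a record. The following check is only a sketch with a hypothetical function name; what to do once the conflict is detected is left open.

```python
def redirect_is_ignored(spf_record: str) -> bool:
    """True if the record has both a "redirect" modifier and an "all" mechanism."""
    terms = [t.lower() for t in spf_record.split()[1:]]   # skip the version tag
    has_redirect = any(t.startswith("redirect=") for t in terms)
    has_all = any(t.lstrip("+-~?") == "all" for t in terms)
    # With both present, receivers must ignore the redirect (RFC7208 5.1).
    return has_redirect and has_all
```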

So:

pawel-kow commented 4 years ago

IMHO "redirect" is a pro use case and a bit away from the target group of Domain Connect. From this perspective I am OK with leaving it undefined, so that each DNS provider can decide how to solve it (any of the solutions that you mentioned may be equally valid).