Concerns regarding "-all" as a default policy on SPF

knoepfchendruecker commented 5 years ago

Default SPF policy of "fail"

6.10.3 advises always to use a fixed modifier of "-" (fail) for any messages from other sources not specified in SPF rules. When two services with different policies are being merged via SPFM, they do result in having "-all" as a default policy.

6.10.3 also states "it" (merged SPF record or just the default policy?) can always be modified by the user after the merge operation is completed.

After years of experience with SPF, this advisory is quite worrying, as it puts a troublesome default into place.

Issues with forwarding (SPF, SRS, DMARC)

Mail may be forwarded by generic mail forwarding services and mailing lists that don't rewrite rfc5321.MailFrom. As the sender's SPF record doesn't list those forwarding hosts, the final receiving host will apply the default policy and reject those forwarded messages.

The experimental SRS (Sender Rewriting Scheme) is a method to rewrite the address in rfc5321.MailFrom. An SRS-compliant forwarding host will encode the original address in the localpart and append an SRS-specific, SPF-whitelisted domain to it. From the receiving host's point of view, the sender address meets the SPF record of the (SRS-specific) domain and so the message is to be accepted.
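As a rough illustration of that rewriting, here is a simplified sketch. It only loosely follows the commonly described SRS0 address layout (tag, timestamp, original domain, original localpart); the function name, the secret, the `day_stamp` argument and the exact hash and timestamp encoding are assumptions for illustration and are not taken from any particular SRS implementation.

```python
import base64
import hashlib
import hmac


def srs_forward(sender: str, forwarder_domain: str, secret: bytes, day_stamp: str) -> str:
    """Rewrite an envelope sender into a simplified SRS0-style address."""
    local, _, domain = sender.rpartition("@")
    # Short keyed hash so the forwarder can later recognise its own rewrites.
    message = f"{day_stamp}{domain}{local}".lower().encode()
    digest = hmac.new(secret, message, hashlib.sha1).digest()
    tag = base64.b64encode(digest).decode()[:4]
    # The original domain and localpart travel inside the new localpart;
    # the domain part now belongs to the forwarder.
    return f"SRS0={tag}={day_stamp}={domain}={local}@{forwarder_domain}"


rewritten = srs_forward("alice@example.org", "forwarder.example", b"hypothetical-secret", "AB")
print(rewritten)
```

The receiving host then checks SPF against `forwarder.example` rather than `example.org`, which is why the check passes.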

In real life, SRS has a very low adoption rate due to complexity and further issues.

SRS is also incompatible with DMARC's identifier alignment requirement: in DMARC's "strict" mode, the domains from rfc5321.MailFrom and rfc5322.From need to match exactly; in DMARC's "relaxed" mode, both domains must share the same organizational domain (for example, rfc5321.MailFrom may use a subdomain of the rfc5322.From domain). An SRS-rewritten rfc5321.MailFrom uses the forwarder's domain and so meets neither.
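A minimal sketch of that alignment check, to show why an SRS-rewritten sender fails it. Relaxed mode is modelled here as "same organizational domain", and the organizational domain is naively taken to be the last two labels; a real implementation would consult the Public Suffix List. The function names are made up for this example.

```python
def org_domain(domain: str) -> str:
    """Naive organizational domain: last two labels (assumption, no Public Suffix List)."""
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def is_aligned(mail_from_domain: str, header_from_domain: str, strict: bool) -> bool:
    """Compare the rfc5321.MailFrom domain with the rfc5322.From domain."""
    a = mail_from_domain.lower().rstrip(".")
    b = header_from_domain.lower().rstrip(".")
    if strict:
        return a == b  # strict: exact match only
    return org_domain(a) == org_domain(b)  # relaxed: same organizational domain


# A subdomain used as bounce address aligns in relaxed mode only.
print(is_aligned("bounce.example.org", "example.org", strict=False))
print(is_aligned("bounce.example.org", "example.org", strict=True))
# An SRS-rewritten sender carries the forwarder's domain and never aligns.
print(is_aligned("forwarder.example", "example.org", strict=False))
```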

Many mail servers evaluate SPF records during the SMTP dialogue and reject right after the "MAIL FROM" command, while DMARC can only be evaluated later in the SMTP dialogue, once the message itself has been received. A "fail" default policy can result in situations where a forwarded message won't have a chance of being accepted by the receiving mail server:

If rfc5321.MailFrom has been rewritten by SRS, it'll pass the initial SPF check during the SMTP dialogue but won't meet the DMARC requirement of identifier alignment.
If rfc5321.MailFrom has not been rewritten, it won't pass the initial SPF check during the SMTP dialogue, even though it could've met sufficient requirements from DMARC (by having correct identifier alignment and a valid DKIM signature).

Using a less restrictive default policy like softfail (~all) or neutral (?all), the un-rewritten message could've passed the initial SPF check.

One might also argue to prefer "neutral": RFC7208 specifies the lack of an "all" mechanism to be interpreted as "neutral":

If none of the mechanisms match and there is no "redirect" modifier, then the check_host() returns a result of "neutral", just as if "?all" were specified as the last directive.

Contradicting example

6.10.2 gives an example where multiple SPF records with different default policies (~all and -all) are manually being merged. The result uses the "least restrictive all modifier" as a new default policy of the SPF record and advocates "-all" to be more appropriate when no other services are being used.

This merge strategy makes much more sense to me, as it does prefer the "more compatible" default policy rather than the "most strict" policy.

Other antispam engines

From the perspective of spam filtering engines like SpamAssassin, there's not much difference between a softfail and a fail default policy, yet there is often a strong difference if SPF is being evaluated by an MTA. So after all, a softfail (~all) default policy seems to be a much more reasonable default for most users, as it prevents forwarded messages from being rejected at "MAIL FROM" time without risking compliance with other standards like DMARC. When SPF and DMARC are being evaluated at the same time, an enforced DMARC policy (p=quarantine, p=reject) overrides any SPF default policies, so having a strict SPF-encoded default policy is even less required.

Suggestions

I do see a few points to address this topic.

have a default policy of "~all" to address potential problems with forwarded mails and DMARC.
when multiple records with different default policies are being merged, the least-restrictive modifier aside from "pass" is to be preferred: neutral is to be preferred over softfail, softfail is to be preferred over fail: ?all > ~all > -all
Due to the negative consequences when mail is being forwarded, a policy of fail (-all) is only to be used when the user is aware of its effect. Accordingly, it MUST NOT be a silent default option by SPFM merges, unless the user did actively select this option.
When DMARC is in place using a policy of p=quarantine or p=reject, do at most use an SPF default policy of ~all (softfail). Together with "least-restrictive merging", a DMARC-template could simply apply an empty SPFM record or SPFM-merge "~all" into what else is left.
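The ordering suggested above can be sketched in a few lines. This is only an illustration of the proposal, not spec text: the names `RANK` and `least_restrictive` are invented, and falling back to "~" when no qualifier is given reflects the first suggestion.

```python
# Lower rank = less restrictive, following "?all > ~all > -all".
RANK = {"?": 0, "~": 1, "-": 2}


def least_restrictive(qualifiers):
    """Pick the least restrictive non-pass qualifier; default to softfail."""
    # "+" (pass) is deliberately not in RANK, so it is never selected.
    candidates = [q for q in qualifiers if q in RANK]
    if not candidates:
        return "~"
    return min(candidates, key=RANK.__getitem__)


print(least_restrictive(["-", "~"]) + "all")
print(least_restrictive(["-", "-"]) + "all")
```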

Also noteworthy: RFC7208 in appendix A.4 contains an example making use of "+all" (pass) for a restrictive policy (by negating other records, including a deprecated "ptr" method). I do have serious doubt such a record could be successfully merged with any "more common" SPF record. I haven't seen such an SPF record in real life, but I've seen quite a few "+all" or "all" records, which result in an insecure configuration.

It's probably reasonable to reject merging any SPFM record mentioning the "all" mechanism with either an explicitly or implicitly passing modifier ("+all", "all").
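Such a rejection could be a simple term check before merging. A small sketch, with a made-up function name and a plain whitespace split standing in for a real SPF parser:

```python
def has_passing_all(spfm_value: str) -> bool:
    """True if the value contains "all" with an explicit or implicit pass qualifier."""
    for term in spfm_value.lower().split():
        # A bare "all" implies the "+" qualifier.
        if term in ("all", "+all"):
            return True
    return False


print(has_passing_all("include:abc.com +all"))
print(has_passing_all("include:abc.com"))
```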

pawel-kow commented 5 years ago

So long story short, you argue that:

default policy for SPF of Soft-Fail "~all" shall be safer compared to currently specified Hard-Fail "-all" to avoid issues when the E-mail gets redirected by the Email recipient (so nothing the sender or the one configuring a domain may influence)
SPFM merging with an existing SPF record shall rather take least-restrictive approach for default policy, rather than hard-coded Hard-Fail as it is specified now. In this point possibly we need some explicit language about this case as right now it is not addressed in a clear way in the spec.

knoepfchendruecker commented 5 years ago

@pawel-kow Exactly. Sorry for providing such an overly elaborate description.

arnoldblinn commented 5 years ago

@pawel-kow This was my conclusion too. The thoughts seemed reasonable. @knoepfchendruecker Yes, it was elaborate :-).

pawel-kow commented 5 years ago

OK, I proposed the change in https://github.com/Domain-Connect/spec/pull/47 @arnoldblinn @knoepfchendruecker please review

arnoldblinn commented 5 years ago

I was travelling last week, and didn't have a chance to dig into this.

The upshot of this discussion is to change the "all" modifier on a merged SPF record from a - to a ~. I don't have a problem with this per se. However, digging through the long conversation....

Section 6.10.2 is a description of how people might manually merge records. Here we say use the least restrictive approach. I think everyone agrees this makes sense when manually dealing with SPF records.

But section 6.10.3 is a description of how the SPFM record should be merged into a final SPF record. The SPFM does NOT contain a rule for the "all". The whole point of this was to eliminate this complexity for the services and for the user.

We originally picked -all. The thought process when originally writing this was that any provider using Domain Connect for this functionality would have an all-inclusive rule.

We can change this to ~all without much difficulty. But bear in mind this is not taking a "least restrictive on merging records" approach. This is hard coding a default modifier for all. The SPFM record does NOT specify the desired modifier for the all rule.

I looked at the templates that use the SPFM record and searched online for their recommended "manual" settings:

Microsoft recommends a -all
zoho mail recommends a -all
mailjet recommends a ?all
plesk recommends a ~all

These guys are all over the map; and I suspect it is the conservative nature of mailjet and plesk.

Note that Google isn't using the SPFM record in the templates, and they use a -all, although a search turns up some help articles that recommend a ~all. Again, inconsistent.

I'm happy to pick a default rule of ~all. But the rationale behind the -all was if ALL the mail services were set with Domain Connect, they could be fully inclusive. To me, the ? and ~ variants are service providers being conservative and not wanting to break things.

My 2 cents.

Domain Connect is only used to set a small set of services that manipulate SPF records. Some of them prefer a -all, some prefer a ~all. And we've gone ahead and said "when using Domain Connect and SPFM, you don't get to specify this element". We pick one.

Right now we picked -all. Microsoft recommends this (-all). Google recommends ~all. While I'm not sure of their motivations for this recommendation, I suspect it has to do with being a bit conservative and not wanting to break other stuff.

Our r

knoepfchendruecker commented 5 years ago

Mailjet doesn't really recommend "?all" - their includable SPF record just includes an "?all" statement. However, only positive attributions from an included record are being honored, so it doesn't really matter if spf.mailjet.com does end in "?all", "~all", "-all" or doesn't list any default policy at all: the result is the same. They probably just included an "?all" to calm down requests of "your SPF record is missing a default policy".

A note about Microsoft: they've been recommending "-all" on SPF and SenderID for a very long time, and always argued the way the original SPF spec was intended to be used: to start at "?all", progress to "~all" and finally end at "-all". So probably they're just repeating themselves over and over again without re-evaluating if that argument still does make some sense or is still reasonable.

For example, Microsoft chose to ignore the equivalent DMARC policy ("p=reject") for exactly the same reason many others are recommending ~all: potential issues when mails are being forwarded.

Quoting from https://docs.microsoft.com/en-us/office365/securitycompliance/use-dmarc-to-validate-email#how-office-365-handles-inbound-email-that-fails-dmarc:

If the DMARC policy of the sending server is p=reject, EOP marks the message as spam instead of rejecting it. In other words, for inbound email, Office 365 treats p=reject and p=quarantine the same way. Office 365 is configured like this because some legitimate email may fail DMARC. For example, a message might fail DMARC if it is sent to a mailing list that then relays the message to all list participants. If Office 365 rejected these messages, people could lose legitimate email and have no way to retrieve it. Instead, these messages will still fail DMARC but they will be marked as spam and not rejected. If desired, users can still get these messages in their inbox through these methods:

DMARC evaluates both SPF and DKIM and accepts a message as long as at least one of them passes with identifier alignment, even when the other one fails. So with DKIM in place, it's more forgiving than plain-old SPF using the "fail" policy. From that point of view, a more restrictive SPF policy doesn't make much sense.
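Reduced to its core, that decision looks roughly like the sketch below. It assumes the SPF and DKIM results and their alignment have already been determined elsewhere; the function and parameter names are invented for illustration.

```python
def dmarc_passes(spf_pass: bool, spf_aligned: bool, dkim_pass: bool, dkim_aligned: bool) -> bool:
    """DMARC passes if SPF or DKIM passes AND that identifier is aligned."""
    return (spf_pass and spf_aligned) or (dkim_pass and dkim_aligned)


# Forwarded without rewriting: SPF fails, but the DKIM signature survives.
print(dmarc_passes(spf_pass=False, spf_aligned=True, dkim_pass=True, dkim_aligned=True))
# Forwarded with SRS and no DKIM: SPF passes for the forwarder, but is not aligned.
print(dmarc_passes(spf_pass=True, spf_aligned=False, dkim_pass=False, dkim_aligned=False))
```

The first case only reaches the DMARC check if the MTA didn't already reject at "MAIL FROM" time because of "-all".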

https://docs.microsoft.com/en-us/microsoft-365/security/office-365-security/how-office-365-uses-spf-to-prevent-spoofing#form-your-spf-txt-record-for-office-365

is a little bit more verbose on a specific recommendation for the SPF record:

-all:[…] Also, if you are only using SPF, that is, you are not using DMARC or DKIM, you should use the -all qualifier. We recommend that you use always this qualifier. […] ~all:[…]If you're not sure that you have the complete list of IP addresses, then you should use the ~all (soft fail) qualifier. Also, if you are using DMARC with p=quarantine or p=reject, then you can use ~all. Otherwise, use -all. ?all […]This is used when testing SPF. We do not recommend that you use this qualifier in your live deployment.

So only in combination with enforced DMARC, Microsoft also "allows" the usage of "~all". In any way, Microsoft does not recommend using "?all" in a live environment.

Let's have a look at hotmail.com and outlook.com: DMARC not enforced (p=none), SPF is configured to "~all". Taking a look at yahoo.com: DMARC is enforced (p=reject), SPF is configured to "?all". Yahoo even makes use of the deprecated "ptr" mechanism. And yet another look at microsoft.com: DMARC is enforced (p=reject), SPF is configured to "-all". Sigh.

Recommendations from "official" sources:

The UK government in https://www.gov.uk/guidance/set-up-government-email-services-securely and https://www.ncsc.gov.uk/guidance/email-security-and-anti-spoofing recommends SPF to be using "~all" and DKIM to be deployed and DMARC to be iterated from "p=none" to "p=reject".

The factsheet from the Dutch Cyber Security Centrum at https://www.ncsc.nl/documenten/factsheets/2019/juni/01/factsheet-bescherm-domeinnamen-tegen-phishing recommends to use "~all" with SPF and use DKIM with DMARC to bypass any forwarding issues.

The US Department of Homeland Security in https://cyber.dhs.gov/bod/18-01/ requires government agencies to enforce DMARC and requires SPF. DKIM is not required and nowhere is a specific default policy for SPF defined. As the focus is on DMARC and DMARC overrides SPF's policy, this decision has probably been neglected as being less important than the DMARC policy. And just for example, dhs.gov, fbi.gov and nasa.gov do use "-all", while nps.gov, uspto.gov and whitehouse.gov do use "~all".

arnoldblinn commented 5 years ago

Like I said, we can switch our default generated policy to ~all. But the differences between all these settings appear to me to be due to the opinions and interpretation of specific providers when they are operating in a world where they only consider other services.

An SPF record (and potentially eventually a DMARC) that is 100% controlled through domain connect settings would be more "deterministic". But this is also a theoretical world that doesn't exist.

We can and should change the default policy to ~all. It is certainly more conservative.

knoepfchendruecker commented 5 years ago

Regarding "least restrictive merging": we could also remove an explicit default modifier, but ask for "least restrictive merging".

If all SPFM-contributing services set "-all", then we'll end up on "-all".
Gotcha: when one service doesn't specify an "all" mechanism at all and we're treating this like an implicit "?all" (as specified in RFC7208), the "least restrictive merging" strategy will end up at "?all".
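To make the gotcha visible, here is a small sketch that extracts the "all" qualifier from each contributing record and treats a missing "all" as "?all". It ignores "redirect" entirely and uses a plain whitespace split instead of a real parser; `all_qualifier` and `merged_default` are names invented for this example.

```python
ORDER = "?~-"  # least to most restrictive


def all_qualifier(record: str) -> str:
    """Qualifier of the "all" mechanism, or "?" if the record has none."""
    for term in record.lower().split():
        if term.lstrip("+-~?") == "all":
            return term[0] if term[0] in "+-~?" else "+"
    return "?"  # no "all": behaves as if "?all" were specified


def merged_default(records) -> str:
    """Least restrictive non-pass default across all records."""
    quals = (q for q in map(all_qualifier, records) if q in ORDER)
    return min(quals, key=ORDER.index, default="?") + "all"


print(merged_default(["include:a.example -all", "include:b.example -all"]))
# One record without "all" drags the merged result down to neutral.
print(merged_default(["include:a.example -all", "include:b.example"]))
```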

arnoldblinn commented 5 years ago

SPFM simply specifies the rules in between the v=spf and the *all commands. There is no such thing with SPFM as a least restrictive merging. We picked one (in our case -all, which you argue should be ~all).

So I'm not sure what you are asking for with "least restrictive merging". This term makes sense in the context of a manual merge by a human of multiple records, but the SPFM values in the Domain Connect templates don't specify any rules here.

I think you are confusing what Domain Connect and SPFM do with your mental model of merging.

knoepfchendruecker commented 5 years ago

Thanks for clarifying this, @arnoldblinn!

Based on the original spec, I did assume the following to be intended:

default policy of all SPFM rules is implicitly "-all"
when two services specify their specific "permissive" rules in SPFM, those rules are getting merged.
when a service specifies an "all" mechanism in its SPFM rules, the less restrictive "all" mechanism of default (-all) and any specified "all" mechanisms is being used. -- As any statements beyond the first occurrence of "all" are being ignored, such a special mechanism for "all-merging" does make some sense.

This would've permitted a trivial implementation of Microsoft's proposal:

use "-all" by default - that's part of the spec.
services do introduce their own SPF rules, without the trailing "all" mechanism. Their rules are being merged.
when a service introduces an enforced DMARC record, that service does want to change the default policy for SPF records to a value no higher than "~all". If "least restrictive merging" is in place, this could be done by introducing an SPFM record with just "~all" as the only rule. This merges with any other SPFM rules, and being the "least restrictive policy", changes the resulting TXT record to include any already existing SPF rules, but also change the default policy from "-all" to "~all".

As that's not the case - well, having "~all" as a default does solve the potential issue anyway.

arnoldblinn commented 5 years ago

Your third bullet (when a service specifies an "all" mechanism in its SPFM rules, the less restrictive...) was a false assumption.

There is no way to specify a "all" mechanism in the SPFM rules. We assumed it would always be -all.

So why are you asking for the default global policy of the resulting merged record to be changed from -all to ~all?

I'm happy to if there is a good reason. But I don't want to do it based on a misunderstanding.

pawel-kow commented 5 years ago

My 2 cents: We decided that SPFM does not specify any "all" rule as it is shared between different providers and we picked "-all" as a hard-coded default taking into account that some providers may be too conservative and in the end spoil the effect of using SPF. Now it turns out "-all" can be painful when email gets forwarded, so "~all" would be the right balance between a working solution and email security.

So why also language about "least restrictive"? Because we allow to modify "all" rule after SPFM operation, or SPFM may face a domain with already existing SPF, so it's smart to define the behavior in such cases. So if the customer himself changed the "all" policy to something less restrictive, we should not "upgrade" it again. Other approach we may take is to say the customer is always right so we do not change any existing "all" rule when merging. Opinions?

knoepfchendruecker commented 5 years ago

Thanks for catching up, @pawel-kow. It's exactly that concern regarding forwarded email and about sane defaults. Most users don't change their defaults, so their service provider needs to provide some sensible defaults.

On least restrictive

A "correct" approach is to start with a sane default configuration, educate the customer on the impact of the various options and use the customer's explicit configuration.

I'm not entirely convinced by merging with the current customer's SPF record, as the decision for that default policy may already be skewed or wrong. At least according to my personal experience on my customer's valid SPF records, most of those are simply the result of applying someone else's template (default templates provided by web host, email service provider, DNS editor,…), including their respective default policy.

SPF

Back in the early times of SPF, a "hard" failing SPF record was the holy grail of SPF records:

-all was the ultimate goal, said to be "most secure"
~all was an "intermediate" step, for those who aren't yet entirely sure.
?all was "for testing purposes only"

Microsoft's documentation on SPF does still read the same way.

Today's reality is a little bit different: Today, SPF records have become one out of multiple factors for ham/spam filters to decide on, and those services do usually rely on positively listed mail servers, but less on the exact "negativeness" for the inverse case. Those ham/spam filters also do factor in many more aspects of emails, so SPF is just "one" out of many issues.

The "negative" listings are still often honored during the SMTP dialogue by the respective MTA: before any other ham/spam filter could decide NOT to reject/discard that message. Accordingly, a "too hard" policy may prevent a ham/spam filter from accepting(!) a message for delivery.

As such, "-all" is recommended for domains who don't send any legitimate mail at all (e.g. parked domains). At worst, legitimate, forwarded messages won't be accepted by the receiving mail server and the sender possibly won't even receive a bounce message on this. "~all" is recommended for most environments. At worst, a legitimate, forwarded message is being more thoroughly scanned or may end up in the receiver's spam folder. Those strictly following RFC7208 should treat "?all" like no SPF record were present - which for today involves more intensive scanning for spam.

the presence of correct "positive" attributions is more important. But: "+all" is insecure, may result in a poorer rating and so should be avoided.
when DMARC has been configured, is being enforced and used by the receiver's mail server, DMARC's policy overrides any default decision on SPF - ?all, ~all and -all end up being the same.

The "unimportance" of negative records

For example, Gmail "honors" the absence of a DKIM signature from a message or one's mail servers not being positively listed in one's SPF record by replacing the avatar image in webmail with a question mark stop sign and a hover text indicating the message could be sent by a spammer and not the actual user. Whatever negative default policy is used, doesn't really matter: it's the absence of positive attributions which counts.

As another example, SpamAssassin has multiple default scores for different SPF results and exact configurations: a "SPF_PASS" benefits very little, while any other results do have large penalty scores. Depending on the overall configuration, the default scores for "SPF_SOFTFAIL" and even "SPF_NEUTRAL" can be even higher than those for "SPF_FAIL". However, all three "non-positive" results are somehow close together, so it doesn't really matter which of them fires.

So then, we could simply set anything as a default policy? No. SPF has been the earliest of all protocols, it has also been implemented in MTAs and SPF's results may be evaluated right during the SMTP dialogue without any other (positive) attributions like DKIM being available. To avoid those cases where the SMTP dialogue might reject a message which then later could be accepted, I'd like to avoid the "fail" return code by default. This is also the reason why the DMARC folks argue not to use "-all" in SPF records: it might reject messages too early in the SMTP transaction that might otherwise pass DMARC's (or any other spam filter's) checks.

Hence my ask to replace "-all" by "~all".

knoepfchendruecker commented 5 years ago

Just to mention an article "worth reading":

https://hackernoon.com/myths-and-legends-of-spf-d17919a9e817 is a somehow recent summary on the current state of SPF, written by a mail.ru-engineer. Beside clearing myths and misconceptions, the post also has clearly explained recommendations and a "sidenote" from their customer support staff on the various nuances of SPF.

arnoldblinn commented 5 years ago

OK, I'm convinced.

Just to be clear though. I'm being picky because I hear commentary regarding merging with the least restrictive value. There isn't such a thing. The SPFM values contain the value of the SPF record in between the "v=spf" and the "*all" values.

So a template might contain a value for SPFM of include:xyz.com. Another template might contain include:abc.com.

Applying these templates to a zone without an SPF record would result in v=spf include:abc.com include:xyz.com ~all. The ~ is the change we are doing here.

Applying these templates to a zone WITH an SPF record isn't defined right now clearly. Say the zone already had v=spf include:example.com -all.

Does it delete the existing SPF record? Or does it merge the templates in? And if it merges the templates in, what is the modifier?

My implementation merges in and leaves the existing modifier in place. So the result of applying these two templates would be:

v=spf include:abc.com include:xyz.com include:example.com -all
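A minimal sketch of that behaviour, assuming the SPFM values are plain whitespace-separated terms and the existing record is a single TXT string. The function name `build_spf` and its parameters are hypothetical and not part of the spec; the sketch writes the full "v=spf1" version tag where the examples above abbreviate it to "v=spf".

```python
def build_spf(spfm_values, existing=None, default_all="~all"):
    """Merge SPFM terms into an SPF record, keeping an existing "all" qualifier."""
    all_term = default_all
    existing_terms = []
    if existing:
        for term in existing.split()[1:]:  # skip the version tag
            if term.lstrip("+-~?").lower() == "all":
                all_term = term  # leave the qualifier already in the zone in place
            else:
                existing_terms.append(term)
    merged = []
    # Template terms first, then whatever the zone already had, without duplicates.
    for term in [t for value in spfm_values for t in value.split()] + existing_terms:
        if term not in merged:
            merged.append(term)
    return " ".join(["v=spf1", *merged, all_term])


templates = ["include:abc.com", "include:xyz.com"]
print(build_spf(templates))
print(build_spf(templates, existing="v=spf1 include:example.com -all"))
```

Without an existing record the new "~all" default applies; with one, the "-all" that was already there survives the merge.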

pawel-kow commented 5 years ago

@arnoldblinn

Applying these templates to a zone WITH an SPF record isn't defined right now clearly.

This is a part of the exercise, therefore the proposed change in the text taking into account the existing SPF in #47:

When a template is added or removed with an _SPFM_ record in the template, 
some code would need to take the aggregate value of all _SPFM_ records 
in all templates applied as well as existing SPF TXT record on the host 
and recalculate the resulting SPF TXT record. In case several sources specify the 
same rule with a different policy DNS Provider SHOULD apply the least restrictive 
one as a result. _soft failure_ SHOULD be preferred over _hard failure_, _neutral_
SHOULD be preferred over _soft failure_.

My implementation merges in and leaves the existing modifier in place.

This is the alternative I also proposed in the previous comment and in the end better from the user perspective following the changes he applied.

Proposed change to the previous text:

When a template is added or removed with an _SPFM_ record in the template,
some code would need to take the aggregate value of all _SPFM_ records
in all templates applied as well as existing SPF TXT record on the host 
and recalculate the resulting SPF TXT record. In case the existing SPF TXT record 
already specifies the "all" rule, its modifier SHOULD remain intact after the merge
operation.

Are we all ok with the approach and the text so #47 can be finalized? @arnoldblinn @knoepfchendruecker

knoepfchendruecker commented 5 years ago

RFC7208 defines "all" as a "mechanism" and not as a "rule".

So: "In case the existing SPF TXT record already uses the "all" mechanism, its modifier SHOULD remain intact after the merge operation."

What about cases where the existing SPF TXT record uses the "redirect" modifier?

According to RFC7208 5.1, an "all" mechanism will ask everyone to ignore the "redirect". So while the SPF record is still syntactically correct, it's certainly not what the user did expect.

"redirect" can be combined with other terms, but is evaluated after any mechanisms (a, ip4, ip6, mx, include, all,…). That's why "all" will functionally disable a "redirect".
at first, one may be tempted to simply "include:" the redirect - yet there are a few differences and caveats, and the results can differ dramatically.
Using "redirect" multiple times may also have an unexpected result (the first occurrence will terminate parsing), so probably we shouldn't support it as a term in an SPFM record.
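Detecting the problematic combination is straightforward. A sketch, with an invented function name and a plain whitespace split in place of a real SPF parser:

```python
def redirect_is_shadowed(record: str) -> bool:
    """True if the record has both a "redirect" modifier and an "all" mechanism."""
    terms = record.lower().split()[1:]  # skip the version tag
    has_redirect = any(t.startswith("redirect=") for t in terms)
    has_all = any(t.lstrip("+-~?") == "all" for t in terms)
    # "all" always matches, so the redirect would never be evaluated.
    return has_redirect and has_all


print(redirect_is_shadowed("v=spf1 redirect=_spf.example.com"))
print(redirect_is_shadowed("v=spf1 include:abc.com redirect=_spf.example.com ~all"))
```

A DNS provider could run a check like this on the merged result and then pick one of the options below.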

So:

Handle this as a conflict in UX?
Replace the SPFM-terminating "all" mechanism by the existing redirect term?
Replace "redirect=" by "include:", when it doesn't make use of "a", "mx" or "ptr"?
Possibly: state a list of "supported" SPF terms for SPFM records.

pawel-kow commented 5 years ago

IMHO "redirect" is a Pro use-case and a bit away from the target group of Domain Connect. From this perspective I am OK to leave it undefined, so that each DNS provider can decide how to solve it in its own way (any of the solutions that you mentioned may be equally valid).

Domain-Connect / spec

Concerns regarding "-all" as a default policy on SPF #46