[Bug]: Weird behavior for SIWE requests on localhost

bschorchit commented 1 year ago

Describe the bug

This was originally reported to us here on Slack and here https://github.com/MetaMask/metamask-extension/issues/17707#issuecomment-1470728017.

This is not an issue if you don't append http:// to localhost in the URI in the message (see recording below).

Steps to reproduce

Here is a example of message to sign that was reported to cause this while on localhost : https://github.com/MetaMask/metamask-extension/issues/17707#issuecomment-1470786260

Error messages or log output

No response

Version

10.26.1

Build type

None

Browser

Other (please elaborate in the "Additional Context" section)

Operating system

Other (please elaborate in the "Additional Context" section)

Hardware wallet

No response

Additional context

Not sure on browser and OS info.

Recording: https://user-images.githubusercontent.com/34968356/225613863-783e9066-d1a4-4626-b5be-c358e31818ab.mp4

bschorchit commented 1 year ago

Update from @seaona : after trying several times, one time the popup didn't open - no error neither. I could also see SIWE screen too in other trials and another thing I see is that on the URI it's set https://localhost:9011/ but I am using http.

image (8)

yonigoldberg commented 1 year ago

@bschorchit is there an ETA on this issue?

yonigoldberg commented 1 year ago

@bschorchit - I opened a pr for a fix

bschorchit commented 1 year ago

Thanks for that, @yonigoldberg ! We rolled out a hotfix (v10.26.2) yesterday. Is the current behavior on 10.26.2 on localhost still an issue for you?

yonigoldberg commented 1 year ago

It is a different issue now. Now it requires to approve localhost as a trusted domain

yonigoldberg commented 1 year ago

@digiwand @jpuri @danjm - This is related to your fix in: https://github.com/MetaMask/metamask-extension/pull/18200 Do you mind reviewing https://github.com/MetaMask/metamask-extension/pull/18296?

yonigoldberg commented 1 year ago

Thanks @digiwand, I added the test that you mentioned here (I can't comment on the PR due to lack of permissions).

digiwand commented 1 year ago

Thanks @yonigoldberg! LGTM! I'm adding this to our internal list of PR reviews. will try to get this in the next release

I didn't realize you can't comment in the PR. To confirm, you could comment on the main thread, but not an individual comment?

yonigoldberg commented 1 year ago

Thanks @digiwand - yheaa I can't comment there.

yonigoldberg commented 1 year ago

[Sadly I can not comment on my PR] https://github.com/MetaMask/metamask-extension/pull/18296#discussion_r1150684088 @naugtur - To note that per the spec of SIWE, port is optional. How would you approach this? https://eips.ethereum.org/EIPS/eip-4361#abnf-message-format

domain = authority
    ; From RFC 3986:
    ;     authority     = [ userinfo "@" ] host [ ":" port ]
    ; See RFC 3986 for the fully contextualized
    ; definition of "authority".

naugtur commented 1 year ago

@yonigoldberg I don't have full context so I'm making assumptions from my webapp security background. This might be wrong. That said, I think port being optional might mean it defaults to 80?

I'll take a closer look. Or maybe someone more familiar with the Origin guarantees in SIWE can chime in.
My participation is just the kneejerk reflex when I see protocol/port being omited on something called origin.

yonigoldberg commented 1 year ago

Fair point.

digiwand commented 1 year ago

@yonigoldberg the PR is unblocked now and is available to comment now. I'm up for merging in @naugtur's suggestion if that sounds good with you too

digiwand commented 1 year ago

edit: see below comments ~we can loosen restrictions later if we see a use-case for ports on remote origins. For now, unblocking ports on localhost seems useful and matches the issue case. Should remote origins with mismatched ports present themselves, they will not be blocked and only be presented with a warning.~

curious to learn more about the port use cases here and why this may or may not be a concern outside of localhost.

legobeat commented 1 year ago

Summarizing relevant parts of EIP-4361:

domain is the RFC 3986 authority that is requesting the signing. uri is an RFC 3986 URI referring to the resource that is the subject of the signing (as in the subject of a claim).

; ABNF
domain = authority
    ; From RFC 3986:
    ;     authority     = [ userinfo "@" ] host [ ":" port ]
    ; See RFC 3986 for the fully contextualized
    ; definition of "authority".

uri = URI
    ; See RFC 3986 for the definition of "URI".

(as noted above, port is optional)

Creating sessions

Sessions MUST be bound to the address and not to further resolved resources that can change.
Verifying domain binding

Wallet implementers MUST prevent phishing attacks by matching on the domain term when processing a signing request. For example, when processing the message beginning with "service.invalid wants you to sign in...", the wallet checks that the request actually originated from service.invalid.

The domain SHOULD be read from a trusted data source such as the browser window or over WalletConnect (ERC-1328) sessions for comparison against the signing message contents.
Out of Scope

The following concerns are out of scope for this version of the specification to define: [...]

Authorization to server resources.

Interpretation of the URIs in the resources term as claims or other resources.

The specific mechanisms to ensure domain-binding.

Security Considerations: Wallet and relying party combined security

Both the wallet and relying party have to implement this specification for improved security to the end user. Specifically, the wallet MUST confirm that the message is for the correct domain or provide the user means to do so manually (such as instructions to visually confirming the correct domain in a TLS-protected website prior to connecting via QR code or deeplink), otherwise the user is subject to phishing attacks.

Verification of domain binding

Wallets MUST check that the domain matches the actual signing request source.

This value SHOULD be checked against a trusted data source such as the browser window or over another protocol.

Comment

My interpretation of the above:

There is no requirement on relationship between uri, domain, and resources. So only domain is relevant for this discussion.
If port and userinfo are omitted from domain, it seems reasonable that these are "wildcarded" and omitted from the domain binding check. So if domain: localhost, then any port on localhost would match that.
- (:warning: This does seem like a potentially problematic vector, as per @naugtur observation on shared hosting above and this being subtle/non-obvious - could consider proposing extending/amending the EIP here but ultimately out of scope here)
If either port or userinfo is specified on domain in the message, then they must be enforced, regardless of the domain in question
- If there are scenarios where it's desirable to sidestep this behavior, it must involve a fully-informed-and-active approval on behalf of the user.

So for #18296, my takeaway would be:

there should be no special behavior for localhost
always perform port and userinfo validation iff respective part is present in domain field of SIWE message. If not, only check the domain parts supplied in the domain field.

This allows the authority to decide "how wide their identity" is. Local dev/testing dapps can specify localhost and accept everything from the localhost origin, while a trusted self-hosted service accessed via localhost would require this protection to be kept, and not restricting a specified port here would be unexpected behavior.

legobeat commented 1 year ago

Sessions MUST be bound to the address and not to further resolved resources that can change. tells us that port changes should not invalidate the session.

But other than that, being stricter and having checks for domain validation that are not part of the spec should also be non-violating. So in general it should be OK for MetaMask to add its own restrictions around "domain binding" (which may make special cases for localhost uri/domain/resourced), as long as those are documented clearly.

naugtur commented 1 year ago

Facts

ERC-4361, by referencing authority from RFC-3986 is implicitly dragging the concept of URI scheme along with it as authority is defined within the scope of a URI scheme.

For the purpose of RFC-3986 authority used in domain of ERC-4361, how is the URI scheme defined?
It is the URI scheme that defines the default port value, so it's necessary for the interpretation of the domain that doesn't contain the :port.
eg. If we assume http as the URI scheme, lack of :port can be interpreted to mean :80 or empty. the RFC-3986 itself is not clear about it and we'd have to reach for the http spec, where (I assume from prior experience) the interpretation ends up being :80 not empty.

ERC-4361 would have to define domain as RFC-3986 authority with a URI scheme that doesn't specify a default port to allow ignoring the port under all circumstances.

I think the lack of a field defining the URI scheme in ERC-4361 spec may be considered as it being implicitly declared void. The domain is coming from a http or https scheme though.

also, ERC-4361 "Out of scope" section says:

The specific mechanisms to ensure domain-binding.

In the current shape of ERC-4361 the burden of defining what URI scheme to consider for checking that the domain matches the actual signing request source falls on said specific mechanisms.

Unless ERC-4361 is updated to explicitly define URI scheme for authority used as domain, it's on us to decide how to interpret the missing port.

Our implementation has the following obligations as per ERC-4361:

Obligation 1:

the wallet MUST confirm that the message is for the correct domain or provide the user means to do so manually (such as instructions to visually confirming the correct domain in a TLS-protected website prior to connecting via QR code or deeplink), otherwise the user is subject to phishing attacks.

Obligation 2:

Wallets MUST check that the domain matches the actual signing request source. This value SHOULD be checked against a trusted data source such as the browser window or over another protocol.

Interpretation

Examples

What should we do in these situations?

Domain matching

Situation 1A: A page hosted at https://evil.com requests SIWE signature with domain set to example.com.

Situation 1B: A page hosted at https://3eda758fc11c.cloudfront.net requests SIWE signature with domain set to dd5693b54fe3.cloudfront.net.

Situation 1C: A page hosted at https://example.com requests SIWE signature with domain set to example.com or example.com:443.

Localhost use-cases

Situation 2A: A local privileged service is operating on http://localhost:80 which requires root privileges to bind to. The user runs npm install in a project and one of the dependencies uses a malicious install script to start a server on http://localhost:1337 and later confuses/phishes the user to go to that address. It requests SIWE signature with domain set to localhost as usual.

Situation 2B: A builder wants to test their application and the test setup runs it on different ports on localhost each time and it's inconvenient to add code to detect that before creating the domain for the SIWE request, so domain is set to localhost

Situation 2C: A local privileged service is operating on http://localhost:80 which requires root privileges to bind to. It requests SIWE signature with domain set to localhost.

Shared hosting

Context: A hosting provider offers public IP4 and free domains, but each user on the shared hosting gets just 5 random ports.

Situation 3A: A phishing page posing as usercontent.hosting.com:1973 hosted at http://usercontent.hosting.com:1337 requests SIWE signature with domain set to usercontent.hosting.com.

Situation 3B: A phishing page posing as usercontent.hosting.com:1973 hosted at http://usercontent.hosting.com:1337 requests SIWE signature with domain set to usercontent.hosting.com:1973.

Situation 3A: A phishing page posing as usercontent.hosting.com:1973 hosted at http://usercontent.hosting.com:1337 requests SIWE signature with domain set to usercontent.hosting.com:1337. The user doesn't remember the original port number anyway.

Discussion

Obligation 2 says we must check the domain against a trusted data source - in our case that's the Origin of the browser Realm (often the current tab). It doesn't say how strict that check needs to be before we defer to the user as per Obligation 1.

Obligation 1 suggests there's a point at which we sould no longer try to block the request and let the user make the decision while providing enough context for them to understand their choice.

Note that we cannot use the presence of the port in domain as the only indicator whether the port needs to be verified, because domain is defined by the malicious actor in an attack. (2A, 3A)

To address situations 2A and 3A without relying on users we would need to implement strict checking of the port and if domain omits the port number, assume default for the protocol that we know from the Origin.

Depending on how much we rely on the user's decision, these are possible behaviors to implement:

❌ - mismatch error, API returns an error, the user never get exposed to that UI. ⚠️ - show a prominent warning message + the second step to confirm. 👍 - match, ask user - regular SIWE screen. (updated according to Antonela)

Port checking behaviors in situations above:

	1. strict check	2. optional check	3. strict unless localhost	4. warn only
1A	❌	❌	❌	❌
1B	❌	❌	❌	⚠️😱
1C	👍	👍	👍	👍
2A	❌	👍	👍	⚠️
2B	❌	👍	👍	⚠️
2C	👍	👍	👍	👍
3A	❌	👍	❌	⚠️
3B	❌	❌	❌	⚠️
3C	👍	👍	👍	👍

Conclusions

This section includes author's opinions.

We need to decide between implementing strict checking with all of the friction it introduces or a less strict behavior that's more desirable while requiring user to decide and UI to help them make the right decision.

Rejecting 1A and 1B is a feature and should not need further discussion.
Making port check optional goes against same-origin policy and some builders/companies may assume they are safe against attacks from other ports on the same domain. They would need to know they must specify the port in their SIWE requests and teach their users to verify it's there and didn't change.
Unless the port number is the default for the protocol of Origin, the port should not be omited when displaying the Origin in the UI of MetaMask.
In situations where we don't match port strictly, the port of Origin should be emphasized in the UI when a port is not specified in domain
Nothing we do can help if the user doesn't remember the original port of the app they intended to use in situations like 3C, so strict port checking introduces friction for builders and false sense of security for users who don't remember port numbers.

Ensuring that UI is helping the user understand port mismatch to make up for the security tradeoff and making port checking optional with behavior 2 (optional check) seems the most promising outcome for this issue.

Without changes to the UI behavior 2 may be risky and we can only go as far as behavior 3 or 4.

Thanks to @legobeat for convincing me to do a deep dive into ERC,RFC,etc.

cc @digiwand and @holantonela for thoughts on UI more competent than mine.

holantonela commented 1 year ago

Thanks for this analysis @naugtur! I think it makes sense to use the patters that we already have in place but reflecting the heuristics presented in your table:

❌ - mismatch error, prevent signing ---> API returns an error, the user never get exposed to that UI.
⚠️ - show a prominent scary warning before proceeding to signature ---> We use the warning message + the second step to confirm. 
👍 - match, ask user for signature --> regular SIWE screen.

legobeat commented 1 year ago

Related issues are gathered in #18471. The key insight in @naugtur's facts-summary which was missing from my preceding comment is the part where default ports are implied depending on scheme in RFC-3986.

Looking at opened issues, a popular interpretation seems to be that no explicit port in (as per ABNF scheme-less) domain part of the message should be interpreted as "wildcard" and be matchable for origins with any port as long as the hostname is matching. But there is a case to be made for a more conservative reading where the missing scheme is assumed from the actual origin and taking the stance that "no port = enforce default for whatever transport is actually used on the origin".

As noted there are some UX subtleties arising around how this is communicated regardless of which choice is made here.

https://github.com/MetaMask/core/pull/1163 has been updated according to the above.

naugtur commented 1 year ago

After sleeping on it I now think the most secure option for the end user would be to:

Enforce strict port matching by default
Require the request origin to be https, therefore the only option for no port is default to 443
Introduce a setting in the wallet to enable developer mode of sorts that disables those strict protections and replaces them with a warning (behavior 4)
- Why have a warning if it's a dev mode? Because it's not that hard to convince people to flip a switch in settings before proceeding with the phish.

One argument against enforcing https by default is that https in the browser introduces reliance on centralized authority and not all SIWE use cases might want that.

yonigoldberg commented 1 year ago

@legobeat and @naugtur - great analysis of the specs. Thank you 🙏

I am also convinced that the optional port should only be allowed when it is not provided. Otherwise, it should be enforced. With that, I am going to close the PR. We have updated Dynamic.xyz to always provide and validate the port in our SIWE.

@naugtur - one exception about enforcing https is localhost. From a developer standpoint, it should be allowed to use http on localhost.

digiwand commented 1 year ago

many thanks @legobeat and @naugtur!

These examples are useful and I'd like to propose a couple more examples. I agree with most of them except I have a concern with 1A. (sorry, I know you said "should not need further discussion." but I'm going for it :D) @naugtur

1A

I see ❌ now under the "warn only" session. Was this meant to be ⚠️😱? While we do not support a developer-mode setting, I don't think we should block this case entirely.

One scenario for the mismatched domain was explained in the related ticket #17707.

from @HeyFloM 's usecase https://github.com/MetaMask/metamask-extension/issues/17707#issuecomment-1470877080:

On our side, this is problematic for all integrations of Metamask with Unity WebGL games/metaverse. Why? Because all WebGL builds needs to be hosted on specific hosting services like itch.io so we can easily test them without having to upload each build on our Amazon S3 Buckets. So each development build have this error right now image

they did figure out a solution for production, but it continues to cause friction for development:

After identifying the issue, we aligned the URL of the domain hosting the production Unity build with the server URL that manages the Unity SDK and Web3 authentication (Moralis Unity SDK self-hosted server). As a result, there are no longer any problems with the production part.

However, the impact is still present on the development side because we use different hosting services for speed and agility to export and test quickly. These services make it very easy to see what are the results of your latest dev changes in the build, but they are on another domain. We could create a test environment on our servers, but this would require our developers to:

export the WebGl build from Unity

Manually import it on our Amazon S3 buckets

Convert it to react

Align the different domain URLs manually

And this at least 4 times per day, which would be a significant timeloss

Additional Examples

[edited] Situation 1D: A page hosted at http://evil.com requests SIWE signature with domain set to evil.com. Proposal: 1D | ❌ | ❌ | ❌ | 👍 Purpose: Evaluate non-TLS-protected website situation. Edited based on the comments below.

[maybe] Situation 1E: A page hosted at https://www.example.com requests SIWE signature with domain set to https://example.com. Proposal: 1E | ❌ | ❌ | ❌ | ⚠️😱 Purpose: This would be to track that we have considered this case of the www subdomain though I believe we should apply the same behavior as we do with other subdomains (Situation 1B); as www is another subdomain itself. Like other subdomains, it's possible for the "www" subdomain to host different resources than the root domain. (Relates to https://github.com/MetaMask/metamask-extension/issues/18332)

Implementing strict mode alongside a developer mode setting sounds good to me.

If we enforce strict mode for any of these situations, we'll need to inform developers of this change and give them time to make updates if needed.

legobeat commented 1 year ago

@digiwand My answer to the 1A scenario where the developer wants an application served from an origin of cdn.example.com to be able to claim a domain of dapp.example.com is to either distribute appropriate TLS certificates for dapp.example.com to the utilized CDN or introduce terminating TLS proxies, in either case to the effect of aligning domains. This not being circumventable is intentional. Same applies for 1E.

On the other hand, I'd say for your 1D: :+1: - Unless and until a new version EIP-4361 adds new semantics around scheme and transport - currently no TLS restrictions. This is with the differing take from here that "no scheme or port in domain" -> "implied any scheme; no default port assumed; no restrictions on port" . IMO default behavior of restricting to TLS origins only could have been good but is not an option for the immediate-term since it constitutes a breaking change (with existing integrations made) and EIP violation.

I think there's a way forward which maintains safe and compliant defaults without causing hindrance for local dev or requiring an "unsafe dev mode" (completely disable domain-binding validation? maybe a feature that could be enabled only in flask and beta), it can still be considered as a plan b.

digiwand commented 1 year ago

I think that makes sense. Distributing TLS certificates to the CDN will allow the appropriate domains to host those servers. Thanks for the advice @legobeat!

okay, I agree 1A no longer needs a change. I do still believe blocking the request entirely should be communicated before implementing this situation.

naugtur commented 1 year ago

For 1A anything other than rejection would be against the spec. It's the only thing ERC is clearly requiting us to do. Note that it's not a subdomain issue but a totally different domain.

I'm ok with allowing http for localhost regardless of how strict we decide to be.

digiwand commented 1 year ago

okay, that makes sense. maybe my communication was confusing. I think it's a good idea to block 1A once we have a developer setting. Even more so with the setup @legobeat mentioned above.

I'm ok with allowing http for localhost regardless of how strict we decide to be.

@naugtur when you mentioned allowing these cases, were you thinking with or without the warning modal?

Just read the updated comment now @legobeat and responding to 1D comment

re: 1D. okay, I see I missed this earlier. I agree and updated the comment with 👍. Snippet from EIP-4361 spec:

The following concerns are out of scope for this version of the specification to define: ... Protocols for use without TLS connections.

Good idea to include a specific setting to disable domain-binding in Flash/beta as a plan B. I've linked this comment in the PR to implement the developer mode setting https://github.com/MetaMask/metamask-extension/issues/18191#issuecomment-1503418941

p.s

Created https://github.com/MetaMask/metamask-extension/issues/18529
@legobeat - reviewing the related PRs today

legobeat commented 1 year ago

@bschorchit I think this issue should be fixed in #18518 , which is merged to develop ~(but did not make the cut for release for some reason so not sure when it's schedule do go out)~

There are some hypothetical further improvements/changes mentioned in the thread - anything we want to distill to new issue(s) or we close this off once #18529 makes it to release?

bschorchit commented 1 year ago

I've just checked and https://github.com/MetaMask/metamask-extension/pull/18518 is part of 10.31 release - which should be rolling out soon (next week I think) to users.

I don't think we need to wait to ship https://github.com/MetaMask/metamask-extension/issues/18529 before closing this off. We can track and discuss anything related to planning/communicating and blocking mismatched domains directly on that issue (and other related issues) and come back to this one for reference even if closed.

legobeat commented 1 year ago

Oh, I somehow missed that it was indeed part of the release - that's great and sounds good to me - thanks @bschorchit.

MetaMask / metamask-extension