Solana Pay message signing

anza-xyz / solana-pay

A new standard for decentralized payments.

https://solanapay.com

Apache License 2.0


Solana Pay message signing #152

Open samheutmaker opened 2 years ago

samheutmaker commented 2 years ago

Edit: The information below is no longer up-to-date. A current version of the proposed changes can be viewed at #169.

All feedback is appreciated. I can be reached on Telegram @samhogan.

Original Issue

Solana Pay should support message signing via HTTP request. The following is an attempt to summarize how a dapp can request that a wallet sign a message via Solana Pay.

The flow is as follows:

The user scans a QR code or taps NFC.
The wallet parses the link from the QR code and makes a GET and POST request in accordance with the Solana Pay transaction request specification.
The server responds with the following fields in the response body:
- The <data> field is the data that will be signed. It must be a base64-encoded value adhering to the off-chain message signing proposal (solana-labs/solana#26915) once that specification is finalized.
- The <state> field is a MAC that the wallet will pass back to the server so that the server can verify that the contents of the <data> field were not modified before signing.
- The <message> field is an optional UTF-8 string to display to the user.
```
{
  "data": "aGVyZS1pcy1hLW1lc3NhZ2UtdG8tc2lnbg==",
  "state": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzaWduYXjF9.43WZK82a_sGU-ImVvCUnMJmNprs6Fe30pm0",
  "message": "Login with Solana Pay"
}
```
The wallet displays the message and asks the user if they would like to sign the data.
If the user signs the data, the wallet makes a second POST request to the same URL with the following fields in the request body:
- The <account> field is the base-58 encoded value of the user's public key.
- The <state> field is the same as the <state> field from the first POST request response body. It should be returned to the server unmodified.
- The <signature> field is the base-58 encoded signature from signing the <data> field with the user's private key.
```
{
  "account": "C9uYZinjZmmqxaF7FENdmzMVMuNVqRkKHXpbmd933W98",
  "state": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzaWduYXjF9.43WZK82a_sGU-ImVvCUnMJmNprs6Fe30pm0",
  "signature": "CUghXMN18pTQPoxm9zmrQY1PQXctp3xrGHpXxSJd1UYkRcBQBaUetZwKuc57VgjwjH7cC1Zbm6t1Zz1WJkVBMnW"
}
```
The dapp server will decode the <state> field to ensure that it has not changed and then attempt to verify the <signature> against the content from the <state> field and <account> field. If the signature is verified, the server knows that the wallet controls the private key for the <account>.
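
As a rough illustration of the final verification step, here is a minimal sketch using only Node's built-in `crypto` module. It assumes the wallet signs the raw bytes obtained by base64-decoding <data> and that the key is Ed25519 (as Solana accounts are). The function names `base58Decode`, `verifyEd25519`, and `verifySignedData` are hypothetical and not part of Solana Pay. The server is assumed to have already recovered the trusted <data> value from <state>.

```typescript
import { createPublicKey, generateKeyPairSync, sign, verify } from "node:crypto";

const BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
// DER header that wraps a raw 32-byte Ed25519 key as a SubjectPublicKeyInfo structure.
const ED25519_SPKI_PREFIX = Buffer.from("302a300506032b6570032100", "hex");

// Decode a base-58 string (Bitcoin alphabet, as used for Solana addresses).
function base58Decode(input: string): Buffer {
  const bytes: number[] = []; // little-endian while accumulating
  for (let k = 0; k < input.length; k++) {
    let carry = BASE58_ALPHABET.indexOf(input.charAt(k));
    if (carry < 0) throw new Error("invalid base-58 character");
    for (let i = 0; i < bytes.length; i++) {
      carry += bytes[i] * 58;
      bytes[i] = carry & 0xff;
      carry >>= 8;
    }
    while (carry > 0) {
      bytes.push(carry & 0xff);
      carry >>= 8;
    }
  }
  // Each leading "1" stands for one leading zero byte.
  for (let k = 0; k < input.length && input.charAt(k) === "1"; k++) bytes.push(0);
  return Buffer.from(bytes.reverse());
}

// Verify an Ed25519 signature over `message` with a raw 32-byte public key.
function verifyEd25519(rawPublicKey: Buffer, message: Buffer, signature: Buffer): boolean {
  if (rawPublicKey.length !== 32 || signature.length !== 64) return false;
  const publicKey = createPublicKey({
    key: Buffer.concat([ED25519_SPKI_PREFIX, rawPublicKey]),
    format: "der",
    type: "spki",
  });
  return verify(null, message, publicKey, signature);
}

// Assumption: <account> and <signature> are base-58, <data> is base64.
function verifySignedData(account: string, dataBase64: string, signatureBase58: string): boolean {
  return verifyEd25519(
    base58Decode(account),
    Buffer.from(dataBase64, "base64"),
    base58Decode(signatureBase58),
  );
}

// Usage: sign with a throwaway key pair and verify the result.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const rawPublicKey = publicKey.export({ format: "der", type: "spki" }).subarray(12);
const message = Buffer.from("here-is-a-message-to-sign");
const signature = sign(null, message, privateKey);
console.log(verifyEd25519(rawPublicKey, message, signature)); // true
```

A real server would also confirm that <state> is authentic and unexpired before trusting the <data> it carries.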

jordaaash commented 2 years ago

Some thoughts --

adhering to this specification.

Bad link, I assume referring to https://github.com/solana-labs/solana/pull/26915?
Since message is used in transaction responses too, data makes sense to differentiate. But signedMessage when the data is what gets signed is a bit confusing. signedData could work though. I'll use it to refer to this concept.
On that subject, however, what are the use cases envisioned for message to be provided alongside data? The argument for including it with transaction is more obvious.

The <signedMessage> field is the same as the <data> field from the first POST request response body

signedData may not be the same if it was prefixed (which is why we include it here) but arguably we could do away with this entirely if we specify in the spec that the data must be prefixed according to https://github.com/solana-labs/solana/pull/26915. This requires that spec to become finalized, though this is a major blocker for Mobile Wallet Adapter 1.0, so I would expect this to happen soon.
signature should be specified as the base64 encoded signature bytes (many wallets use base58 for message signatures as well as tx signatures).
We probably need to think about how practical a stateless design for this is. If a malicious client modifies the data, then posts the modified signedData along with its signature, the server has no way of knowing without persisting something. It could persist it statefully by the account, or persist it statelessly by adding some message authentication or signature of the data to the data itself, or perhaps in another field that must be returned.

/* req 1 */ { "account": "base58" }
/* res 1 */ { "account": "base58", "data": "base64", "mac": "base64" }

/* req 2 */ { "account": "base58", "data": "base64", "mac": "base64", "signature": "base64" }
/* res 2 */ 200 OK
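
The stateless exchange above could be sketched roughly as follows, using an HMAC from Node's built-in `crypto` module. This is an illustration only: `SECRET`, `computeMac`, `buildResponse1`, and `isUnmodified` are hypothetical names, and the choice of HMAC-SHA256 over the `[account, data]` pair is an assumption rather than anything the proposal specifies.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical server secret; a real server would load it from secure configuration.
const SECRET = "replace-with-a-long-random-secret";

// MAC over the account and data together, so neither can be swapped independently.
function computeMac(account: string, data: string): string {
  return createHmac("sha256", SECRET)
    .update(JSON.stringify([account, data]))
    .digest("base64");
}

// res 1: the server picks the data to sign and attaches a MAC to it.
function buildResponse1(account: string, message: string) {
  const data = Buffer.from(message, "utf8").toString("base64");
  return { account, data, mac: computeMac(account, data) };
}

// req 2: the server recomputes the MAC and compares it in constant time.
function isUnmodified(req2: { account: string; data: string; mac: string }): boolean {
  const expected = Buffer.from(computeMac(req2.account, req2.data), "base64");
  const received = Buffer.from(req2.mac, "base64");
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Usage
const res1 = buildResponse1("C9uYZinjZmmqxaF7FENdmzMVMuNVqRkKHXpbmd933W98", "Login with Solana Pay");
console.log(isUnmodified(res1)); // true
```

This sketch does not address replay: a timestamp or nonce would need to be covered by the MAC as well, and the signature still has to be verified separately.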

An overall downside of all this is increasing complexity of the protocol, mostly from the API server's end. Servers need to be pretty smart about what they put in the message (because it can't be modified/corrected), how they authenticate it (because they need to generate MACs and verify MACs as well as the signatures), how they store private keys (to generate MACs), and how to prevent replay attacks.
This might be okay for the protocol to give it more flexibility, but it makes me wonder if we will want to just implement a full authentication solution OOB, because it's the most obvious use case, every server is going to have to create their own, and many will end up half-baked and insecure.
Another possibility is to take a very different direction, something like Glow's signIn() functionality. Instead of having the server generate a message to be signed, we have the wallet generate a stock message with a URL, account, timestamp, and nonce, sign it, and provide that to the server. This has numerous pros and cons. One pro is that it can be done in one hop, because the server doesn't need to know the account first. One con is that this likely requires a new URL spec (can't just blindly sign in to tx request URLs).
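
To make the wallet-generated alternative concrete, here is a minimal sketch of what a stock message and the server-side acceptance checks might look like. The JSON layout, the five-minute window, and the names `createSignInMessage` and `acceptSignInMessage` are all assumptions for illustration; this is not Glow's actual message format. Signature verification is left out.

```typescript
import { randomBytes } from "node:crypto";

interface SignInMessage {
  url: string;
  account: string;
  timestamp: number; // milliseconds since the Unix epoch
  nonce: string;
}

// Wallet side: build the stock message that the user will sign.
function createSignInMessage(url: string, account: string, now: number = Date.now()): string {
  const message: SignInMessage = {
    url,
    account,
    timestamp: now,
    nonce: randomBytes(16).toString("hex"),
  };
  return JSON.stringify(message);
}

// Server side: in-memory nonce store (a real server would persist and expire these).
const seenNonces = new Set<string>();
const MAX_AGE_MS = 5 * 60 * 1000;

function acceptSignInMessage(raw: string, expectedUrl: string, now: number = Date.now()): boolean {
  const message = JSON.parse(raw) as SignInMessage;
  if (message.url !== expectedUrl) return false; // signed for a different site
  if (Math.abs(now - message.timestamp) > MAX_AGE_MS) return false; // stale or future-dated
  if (seenNonces.has(message.nonce)) return false; // replayed
  seenNonces.add(message.nonce);
  return true;
}
```

Because the wallet supplies the account inside the signed message, the server can check everything in a single request, which is the "one hop" advantage mentioned above.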

jordaaash commented 2 years ago

Tagging @jnwng for feedback, he's been interested in this as well.

samheutmaker commented 2 years ago

I changed <signedMessage> to <signedData>. This makes much more sense.

On that subject, however, what are the use cases envisioned for message to be provided alongside data? The argument for including it with transaction is more obvious.
Including <message> allows dapps to add context to the message signing request. For example, depending on the action a user is taking, a dapp may send {"message": "Login with Solana Pay"} or {"message": "Confirm address ownership to continue"}, or whatever makes sense in that context. It also keeps the spec more in line with transaction requests, which Solana Pay devs are already familiar with.

signedData may not be the same if it was prefixed (which is why we include it here) but arguably we could do away with this entirely if we specify in the spec that the data must be prefixed according to https://github.com/solana-labs/solana/pull/26915. This requires that spec to become finalized, though this is a major blocker for Mobile Wallet Adapter 1.0, so I would expect this to happen soon.
I like the idea of adhering to this spec as it offers consistency within the ecosystem. This means that <signedData> would contain only the <message> (bytes 0x14..n) from the <data> payload, I think.

signature should be specified as the base64 encoded signature bytes (many wallets use base58 for message signatures as well as tx signatures).
Updated the initial issue to reflect this.

An overall downside of all this is increasing complexity of the protocol, mostly from the API server's end. Servers need to be pretty smart about what they put in the message (because it can't be modified/corrected), how they authenticate it (because they need to generate MACs and verify MACs as well as the signatures), how they store private keys (to generate MACs), and how to prevent replay attacks.

This might be okay for the protocol to give it more flexibility, but it makes me wonder if we will want to just implement a full authentication solution OOB, because it's the most obvious use case, every server is going to have to create their own, and many will end up half-baked and insecure.

It may make sense to keep the Solana Pay spec relatively simple and create a separate package, or another package within Solana Pay, to encapsulate the server-side complexity involved in creating data values that properly hold a signature of themselves, a timestamp to prevent replay attacks, etc. This package could also contain an OOB authentication solution. While most dapps would likely opt to use this package, it would be possible for a developer to write their own message signing solution that tracks data values in another manner.

Another possibility is to take a very different direction, something like Glow's signIn() functionality. Instead of having the server generate a message to be signed, we have the wallet generate a stock message with a URL, account, timestamp, and nonce, sign it, and provide that to the server. This has numerous pros and cons. One pro is that it can be done in one hop, because the server doesn't need to know the account first. One con is that this likely requires a new URL spec (can't just blindly sign in to tx request URLs).

This is an interesting idea that deserves more exploration. My initial thought is that it's a fairly significant departure from how transaction requests work. I'm not sure saving a single hop is worth changing the URL spec. I'll noodle on this item a bit more over the next couple days. A few questions:
- Is there any downside to disallowing servers from creating their own data values?
- Assuming we want to include a message field for context, how would that get encoded in the URL?

jordaaash commented 2 years ago

This means that <signedData> would contain only the <message> (bytes 0x14..n) from the <data> payload, I think.

Ah, what I meant is that signedData will always be equal to data if we do this, so we would just have (ignoring MAC for this example):

/* req 1 */ { "account": "base58" }
/* res 1 */ { "data": "base64" }

/* req 2 */ { "account": "base58", "data": "base64", "signature": "base64" }
/* res 2 */ 200 OK

It may make sense to keep the Solana Pay spec relatively simple and create a separate package, or another package within Solana Pay, to encapsulate the server-side complexity involved in creating data values that properly hold a signature of themselves, a timestamp to prevent replay attacks, etc.

The server needs to be able to know that data hasn't changed between res 1 and req 2. data may also need to be human-readable, and length is an important constraint. We don't need the user to sign the MAC, so we shouldn't put it in the data, but we do need the MAC, because we can't trust the data without it.

This suggests to me another field that's just passed back and forth as-is. It's intentionally opaque to the user and wallet. They don't need to see this value because they aren't signing it, and we don't care how long it is (within reason).

For example, mac = encrypt({ account, data, timestamp, nonce }, secret), which is passed to the wallet, then returned as-is and decrypted. We shouldn't specify what goes in the value at the level of the spec, but we should provide a field in the spec (maybe not called mac) that lets you pass arbitrary state back and forth.
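
One way to picture such an opaque field is the sketch below, which seals `{ account, data, timestamp, nonce }` with AES-256-GCM from Node's built-in `crypto` module. The cipher choice, the `iv | tag | ciphertext` layout, and the names `sealState` and `openState` are assumptions for illustration; the spec would leave the contents up to the server.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface SigningState {
  account: string;
  data: string;
  timestamp: number;
  nonce: string;
}

// Hypothetical key; a real server would load a persistent 32-byte key from secure storage.
const KEY = randomBytes(32);

// Encrypt and authenticate the state; the output is opaque to the wallet and user.
function sealState(state: SigningState): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", KEY, iv);
  const ciphertext = Buffer.concat([cipher.update(JSON.stringify(state), "utf8"), cipher.final()]);
  // Layout: 12-byte IV | 16-byte auth tag | ciphertext
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

// Decrypt the returned value; returns null if it was modified or is malformed.
function openState(sealed: string): SigningState | null {
  const raw = Buffer.from(sealed, "base64");
  if (raw.length < 28) return null;
  try {
    const decipher = createDecipheriv("aes-256-gcm", KEY, raw.subarray(0, 12));
    decipher.setAuthTag(raw.subarray(12, 28));
    const plaintext = Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]);
    return JSON.parse(plaintext.toString("utf8")) as SigningState;
  } catch {
    return null; // authentication failed: tampered value or wrong key
  }
}
```

After `openState` succeeds, the server can compare the recovered account and data with what the wallet sent, and reject stale timestamps or reused nonces.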

My initial thought is that it's a fairly significant departure from how transaction requests work. I'm not sure saving a single hop is worth changing the URL spec.

I'm not sure either, but it's an interesting constraint. We should try to understand what use cases there are for message signing besides "sign in".

jnwng commented 2 years ago

before i saw this issue, i was working on a proof-of-concept of a server-side verification flow that accommodates various efforts to standardize the semantics of "Sign-in with Solana". i was primarily focused on creating a reference example of verifying the encoding of a signed transaction without submitting it to the network (workaround for Ledger), but wanted to bake in some of the in-flight thoughts here just to try them on for size.

The server needs to be able to know that data hasn't changed between res 1 and req 2.

definitely agree. the strategy outlined to handle the machine encryption seems suitable, although in my reference i skip it. i can give it a try and see how it feels!

one pattern that i thought felt rather clean... instead of having a third hop to the server-side API, i baked in the message to sign in the GET request as a third parameter. since my reference client-side isn't using Solana Pay, i'm not sure how this affects the user workflow (haven't thought about it yet), but subsequently the POST to the URL could have the signature and go through the verification flow.

this means that:

- the server generates the message to verify too (which follows the CAIP-122 standard)
- the server response doesn't need to hold a transaction; we're already done at this point (and some server-side behavior like linking to a user account has completed), allowing us to omit the transaction return field or define some other behavior

open to critique here! again, haven't really thought through the GET flow, although it made this client-side implementation fairly straightforward

jnwng commented 2 years ago

OTOH, this is also a lot of work to subvert the gas fee. in the short-term submitting a transaction with Memo using a nonce and validating with Solana Pay is a reasonably inexpensive process to execute, provided it is backed up on the server with a cookie / JWT that the client can use afterwards

samheutmaker commented 2 years ago

The server needs to be able to know that data hasn't changed between res 1 and req 2. data may also need to be human-readable, and length is an important constraint. We don't need the user to sign the MAC, so we shouldn't put it in the data, but we do need the MAC, because we can't trust the data without it.

Got it. OAuth supports passing a plaintext state parameter that allows you to validate responses. Instead of mac, we may want to call this field state or something similar. Devs can technically put whatever they want in this value, though JWTs seem like a good fit here.

I'm not sure either, but it's an interesting constraint. We should try to understand what use cases there are for message signing besides "sign in".

I did 10 minutes of digging and authentication seems like the only active use-case so far. I'll put out some feelers and see if anyone is doing something more creative.

From @jnwng's proof-of-concept:

Primarily, this strategy can be used along with OAuth-related account management to "associate" multiple Solana wallets to an OAuth account.

This seems like a really good end goal for an OOB Solana auth solution, whether a user is using Solana Pay or a traditional wallet connection.

OTOH, this is also a lot of work to subvert the gas fee. in the short-term submitting a transaction with Memo using a nonce and validating with Solana Pay is a reasonably inexpensive process to execute, provided it is backed up on the server with a cookie / JWT that the client can use afterwards.

The Solana Pay dapps I've seen will usually pay these gas fees, as requiring your users to pay to sign in is not great UX. This opens the dapp up to attacks where a malicious actor repeatedly requests memos and drains the dapp's fee-paying account. Maybe not a huge concern in the short term. Offering real message signing based on this proposal also provides a more consistent dev experience.

Here's a reference implementation of how we do Solana Pay authentication in Bedrock using memo transactions. This has only been used for toy apps so far but currently we don't pay fees for sign-in to avoid this exact issue. Live version here.

jordaaash commented 2 years ago

OTOH, this is also a lot of work to subvert the gas fee.

While I agree, I view it as less about saving on fees and more about the UX benefits of enabling a lower trust threshold. Users must be more careful of signing a transaction than signing a message, so wallets have to handle the presentation differently.

It's also private, being off-chain. You may not want to put something on chain to symbolize that your users were physically somewhere at some time doing something.

OAuth supports passing a plaintext state parameter that allows you to validate responses.

Yeah, this sounds about right. We should definitely learn as much from OAuth as we can here.

authentication seems like the only active use-case so far

It may be worth getting feedback from some POAP and event/ticketing protocols (Cupcake, SOAP, and Cardinal come to mind).

samheutmaker commented 2 years ago

It's also private, being off-chain. You may not want to put something on chain to symbolize that your users were physically somewhere at some time doing something.

This is huge for live events. High profile consumers don't want to broadcast their location when scanning into an event.

It may be worth getting feedback from some POAP and event/ticketing protocols (Cupcake, SOAP, and Cardinal come to mind).

I reached out to Cardinal, SOAP, Disco, Decaf, and Fantastix. Feedback incoming.

what-name commented 2 years ago

Glad to see this being solved! There are a few ideas that this would make easier.

Event ticketing can be extended to "hold any of these nfts for free entry", where the user can enter the event/building by having a certain pre-existing NFT (not explicitly ticket). With signing, the host can authenticate if the person holds a certain SOAP or membership NFT. I could imagine an iPad with a dynamic QR code that the user scans (through regular camera or within wallet), and signs a request. The host then can check on chain if the person indeed owns a certain NFT or not.

Without signatures, the only possible way rn is a memo tx (which is very suboptimal) or the host scanning a QR code on the guest's phone (a QR image of the ticket NFT), which brings almost no added benefit to NFT ticketing.

IRL raffles can be made possible by users scanning a QR, authenticating and the host storing their entry. (This would need to have some form of "allowlist" to not let a single person enter with any number of wallets.)

How this gets implemented on a technical level is out of my scope of expertise, but this allows for a lot of yet undiscovered use cases.

cutemonstersnft commented 2 years ago

glad to see this being discussed! I'm from Monstrè @longeye_monstre, we built Solana Pay powered NFC Gift Cards for the hackathon. This is definitely not my area of expertise but keen to share some thoughts.

Another use-case outside of events (but still falls under authentication) is wallet to wallet messaging. Dialect protocol would benefit greatly from this, imo. Will be dming you @samheutmaker on tg!

mcintyre94 commented 1 year ago

Really cool to see this being worked on! One concern I have is do we want to specify whether/how apps can 'reject' a sign request?

For example if I have an NFT-gated part of my app and somebody sends the POST request in this spec, I get their public key at that point. I could look up whether they have any of my NFTs, see they don't and send back a 403 instead of letting them authenticate with that key and then having my own in-app logic to check their access afterward.

Is that a valid way for an app to behave? Or should a compliant app accept all signature requests with this scheme and only check privileges later? I think it's more obvious with browser wallets because AFAIK an app can't get in the middle of the connect flow and throw in their own checks, but this specification explicitly gives them the public key and ability to fail at that first (or second) hop.

This probably applies to transaction requests too ATM, but there are probably more use cases for "you don't have access" than "you can't perform a transaction".

Potentially related to https://github.com/solana-labs/solana-pay/issues/150 for both existing transaction requests and this?

jordaaash commented 1 year ago

For example if I have an NFT-gated part of my app and somebody sends the POST request in this spec, I get their public key at that point. I could look up whether they have any of my NFTs, see they don't and send back a 403 instead of letting them authenticate with that key and then having my own in-app logic to check their access afterward.

Is that a valid way for an app to behave?

Yes, this is valid and expected. The spec doesn't specify the structure of error responses (yet) but it does specify that the wallet must handle 200 and 3xx-5xx responses: https://github.com/solana-labs/solana-pay/blob/master/SPEC.md#post-response

samheutmaker commented 1 year ago

For example if I have an NFT-gated part of my app and somebody sends the POST request in this spec, I get their public key at that point. I could look up whether they have any of my NFTs, see they don't and send back a 403 instead of letting them authenticate with that key and then having my own in-app logic to check their access afterward.

Keep in mind that a user is not authenticated until they sign the message regardless of whether or not the address you received passes a token gate you may have set up. The user must sign the message to prove they hold the private key to that address.

Given this, it might be helpful to think of authentication and authorization as two separate steps. It may be better to let them sign the message (authentication) and then block access if they fail to pass the token gate (authorization).

The actual specification will leave it up to you to decide what is right for your dapp.
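
A small sketch of that separation, purely as an illustration: the signature check (authentication) runs first, and the token gate (authorization) runs only for accounts that passed it. The `handleSignIn` function, the `Outcome` type, and the use of 401 for a failed signature are assumptions; the 403 for a failed gate follows the example discussed above.

```typescript
type Outcome = { status: 200 | 401 | 403; body: string };

// `signatureIsValid` is the result of verifying the signed message for `account`.
// `passesTokenGate` is whatever check the dapp uses, e.g. an NFT ownership lookup.
function handleSignIn(
  account: string,
  signatureIsValid: boolean,
  passesTokenGate: (account: string) => boolean,
): Outcome {
  // Authentication: without a valid signature, the account is only a claim.
  if (!signatureIsValid) {
    return { status: 401, body: "signature could not be verified" };
  }
  // Authorization: the caller controls the key, but may still lack access.
  if (!passesTokenGate(account)) {
    return { status: 403, body: "account does not pass the token gate" };
  }
  return { status: 200, body: "signed in" };
}

// Usage with a stand-in allowlist in place of an on-chain lookup.
const holders = new Set<string>(["C9uYZinjZmmqxaF7FENdmzMVMuNVqRkKHXpbmd933W98"]);
const gate = (account: string): boolean => holders.has(account);
console.log(handleSignIn("C9uYZinjZmmqxaF7FENdmzMVMuNVqRkKHXpbmd933W98", true, gate).status); // 200
```

Ordering the checks this way means a 403 is only ever returned to someone who has proven control of the account.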

jordaaash commented 1 year ago

This has been specified here: https://github.com/solana-labs/solana-pay/blob/master/message-signing-spec.md

This is still alpha and unimplemented. If you're following this issue or have a use case for this, please review this spec!