decentralized-identity / didcomm-messaging

https://identity.foundation/didcomm-messaging/spec/
Apache License 2.0

Proof of work scheme to protect messaging endpoints in DIDComm #66

Open tplooker opened 4 years ago

tplooker commented 4 years ago

Consider the simple example of Alice and Bob, who both have DIDComm messaging infrastructure. Alice wants to send a message to Bob's DID, which she has discovered by some means. Resolving Bob's DID document gives Alice all of the information she needs to prepare an encrypted message, along with where to send it (e.g. a service endpoint in Bob's DID document). However, Alice and Bob have never communicated before, and Bob is offline at the time the message is sent. So how can Bob protect the service endpoint listed in his DID document from malicious actors who do not want to engage in meaningful communication with him and instead want to spam him (e.g. a DDoS attack or filling Bob's inbox with spam messages)?

An important distinction in this situation is that the request Alice sends to Bob's endpoint results in a storage cost to Bob's service provider (e.g. storing the message in the request), albeit most probably a temporary one. At the HTTP level this is like the difference between an HTTP GET (usually about retrieving an existing resource) and an HTTP POST (usually about creating a resource): the immediate cost for a server to carry out the request, and the long-term implications of an attack, are quite different for a GET-based endpoint than for a POST-based one.

There are already a variety of techniques used by HTTP-based server deployments to protect public endpoints. However, many require pre-authorization (e.g. having some form of capability like an access_token or API key) or rely on mechanisms at other levels, like IP address filtering.

A potentially simple solution to this problem is to define a proof of work mechanism that can be attached to an HTTP request, which the server can quickly validate and use to decide whether to proceed with the request.

An example proof of work scheme could use Hashcash, like Bitcoin does. Say Bob advertises the following in his messaging endpoint:

{
    "type": "DidCommMessagingService",
    "id": "did:example:123456789abcdefghi#didcomm",
    "serviceEndpoint": "https://messaging.example.com/83hfh37dj",
    "proof": {
      "scheme": "Hashcash",
      "difficulty": "1"
    }
}

Here "difficulty" is the number of leading 0s that must be present in the resulting hash.

This endpoint communicates to Alice's client that in order for her to send Bob a message she must include a proof of work (e.g. a SHA-256 hash of her encrypted message that has one leading 0).
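A minimal sketch of how such a proof might be minted and checked. Everything here is an assumption for illustration and is not defined by the proposal: the difficulty is counted in leading hex digits of the SHA-256 digest, and an 8-byte nonce is appended to the ciphertext so the sender has something to vary (a hash of the message alone cannot be tuned). The function names are hypothetical.

```python
import hashlib
from itertools import count


def leading_zero_hex_digits(digest_hex: str) -> int:
    """Count the '0' characters at the start of a hex digest."""
    return len(digest_hex) - len(digest_hex.lstrip("0"))


def pow_hash(ciphertext: bytes, nonce: int) -> str:
    # Assumption: the proof input is ciphertext || 8-byte big-endian nonce.
    return hashlib.sha256(ciphertext + nonce.to_bytes(8, "big")).hexdigest()


def mint(ciphertext: bytes, difficulty: int) -> int:
    """Sender side: search for a nonce that meets the advertised difficulty."""
    for nonce in count():
        if leading_zero_hex_digits(pow_hash(ciphertext, nonce)) >= difficulty:
            return nonce


def verify(ciphertext: bytes, nonce: int, difficulty: int) -> bool:
    """Receiver side: a single hash, so checking is cheap compared to minting."""
    if not 0 <= nonce < 2**64:
        return False
    return leading_zero_hex_digits(pow_hash(ciphertext, nonce)) >= difficulty


if __name__ == "__main__":
    message = b"example ciphertext"
    nonce = mint(message, 1)
    print(nonce, verify(message, nonce, 1))
```

The asymmetry is the point of the scheme: the sender performs many hashes on average, while the endpoint performs exactly one before deciding whether to store the message.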

This simple anti-spam mechanism works provided the difficulty of the proof is set so that the marginal cost to a spammer of mounting an attack outweighs the value of the attack.

Unresolved questions

  1. The proof could be a wrapper in the Body of the HTTP request sent to the server
  2. The HTTP request could be canonicalized like with HTTP signatures protecting the entire request

It is also important to note that this mechanism need only apply to the first message between Alice and Bob. Bob could then respond with a capability (e.g. an access_token) that Alice includes in her next message to Bob, so that Alice does not do needless work when sending future messages.
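The first-message-only idea could be sketched roughly as follows. This is an illustration under stated assumptions rather than part of the proposal: the capability is modelled as an HMAC over the sender's DID under a key held by Bob's endpoint, the endpoint is assumed to know the sender's DID, and the `Endpoint` class and its methods are hypothetical names. A real token would likely also need an expiry and a revocation story.

```python
import hashlib
import hmac
import secrets
from typing import Optional


class Endpoint:
    """Hypothetical endpoint that asks for proof of work only on first contact."""

    def __init__(self) -> None:
        # Secret key known only to Bob's endpoint.
        self._key = secrets.token_bytes(32)

    def issue_capability(self, sender_did: str) -> str:
        """Token Bob returns to a sender after a valid first message."""
        return hmac.new(self._key, sender_did.encode(), hashlib.sha256).hexdigest()

    def has_valid_capability(self, sender_did: str, token: str) -> bool:
        expected = self.issue_capability(sender_did)
        # Constant-time comparison on bytes.
        return hmac.compare_digest(expected.encode(), token.encode())

    def admit(self, sender_did: str, pow_ok: bool, token: Optional[str] = None) -> bool:
        """Accept a message if it carries a valid capability, else require PoW."""
        if token is not None:
            return self.has_valid_capability(sender_did, token)
        return pow_ok


if __name__ == "__main__":
    bob = Endpoint()
    alice = "did:example:alice"
    print(bob.admit(alice, pow_ok=True))            # first message, PoW checked elsewhere
    token = bob.issue_capability(alice)
    print(bob.admit(alice, pow_ok=False, token=token))  # later message, no PoW needed
```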

OR13 commented 4 years ago

I feel like the term "proof" is not correct here... it's really proofOfWork. I wonder if it's better to define a Linked Data Proof for Work, like Hashcash2020, and then define how it is applied to either messages or HTTP requests, and then define how it is registered as a protection mechanism for service endpoints.

We need something similar to this for Secure Data Stores.

wyc commented 4 years ago

Here are two ideas that could help with the same problem. They can coexist with the PoW approach described here to form an arsenal of spam-prevention mechanisms. I'm mentioning them because they should get funneled into the same part of the schema.

  1. Alice includes a small amount of cryptocurrency in her request by including a private key to an account holding that exact amount on a mutually agreed upon network. She sends 100 Dogecoin or whatever to make it worth Bob's while to keep the message, given the resource costs, and Bob's service endpoint will accept messages from Alice for up to 100 Dogecoin (or whatever) worth of expenses. Alice can top up the endpoint at any time. Perhaps after some time, Bob trusts Alice and adds her to his list of authorized resource freeloaders.

  2. In his DID Doc, Bob lists a service proxy of his choice, ProxyCo, that is interested in getting Alice's message to him and filtering spam on Bob's behalf. Bob is okay with adding an intermediary to the system in return for a free DID cubby. He trusts ProxyCo's filtering capabilities due to their scale, and based on ProxyCo's reputation he further trusts it not to behave maliciously or to delay messages unnecessarily, at least for this service. The messages are sent to and held at ProxyCo as an encrypted blob with metadata, but forwarded to Bob at his discretion, in part or in whole.

swcurran commented 4 years ago

I think this is an excellent idea. I'm guessing everyone here knows this already, but this use case is the reason that Proof of Work was first invented - to prevent spam email. To apply this to DIDComm is entirely appropriate.

OR13 commented 4 years ago

Learnings from TOR / I2P-Monero, etc...

The worst case scenario is to assume a network of nodes which are untrusted / malicious.

This means that every message received must be paid for by the sender in some provable way, and discarded otherwise... no sender can be trusted not to be a spammer, and no receiver can be trusted with authorization capabilities.

See also:

Essentially... this is a really hard problem...

I'd suggest a tiered approach where we solve for the following goals in order:

replace proof of work with proof of payment, etc...

wyc commented 4 years ago

@OR13 I'm assuming "hop" means the message changing hands before getting to its destination like going through another router in TCP/IP land. Can you please elaborate on what you mean by "transports"?

tplooker commented 4 years ago

Some thoughts from the call: to be complete as a spam protection mechanism, a server must be able to effectively de-duplicate identical requests (e.g. have some form of replay attack protection). Some of the options to consider are:

  1. Assume that each message presents uniquely (e.g. the ciphertext of a message is reliably unique). The server can use this uniqueness to track previously seen messages and discard duplicates.
  2. Require that the current epoch timestamp be included in the PoW. The server can then set a tolerance window within which it will accept a message.

Option 1 assumes the server has to track some form of state, which can be a burden. Option 2 does not, but requires a finely tuned tolerance window that allows for clock skew yet is narrow enough to prevent spam. Perhaps the solution is to use the two in conjunction, so that a server only needs to track the messages sent within the window?
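A rough sketch of combining the two options, so that the server only remembers messages whose timestamps are still inside the window. The `ReplayGuard` class, its 300-second default, and the in-memory dict are all assumptions for illustration. The sketch also assumes the timestamp has already been checked as part of the PoW input, since otherwise an attacker could simply change it.

```python
import hashlib
import time
from typing import Dict, Optional


class ReplayGuard:
    """Hypothetical de-duplication cache bounded by a timestamp window."""

    def __init__(self, window_seconds: float = 300.0) -> None:
        self.window = window_seconds
        # Maps SHA-256 digest of the ciphertext to the timestamp it claimed.
        self.seen: Dict[bytes, float] = {}

    def accept(self, ciphertext: bytes, timestamp: float,
               now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now

        # Forget entries that have aged out; a replay of one of them would be
        # rejected by the window check below anyway.
        cutoff = now - self.window
        self.seen = {d: t for d, t in self.seen.items() if t >= cutoff}

        # Option 2: reject timestamps outside the tolerance window
        # (symmetric, to allow for clock skew in either direction).
        if abs(now - timestamp) > self.window:
            return False

        # Option 1: reject ciphertexts already seen inside the window.
        digest = hashlib.sha256(ciphertext).digest()
        if digest in self.seen:
            return False

        self.seen[digest] = timestamp
        return True
```

With this shape the state the server holds is bounded by the number of messages accepted per window, rather than growing without limit.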

OR13 commented 4 years ago

> @OR13 I'm assuming "hop" means the message changing hands before getting to its destination like going through another router in TCP/IP land. Can you please elaborate on what you mean by "transports"?

A transport is a channel between 2 parties. A hop is when a single message traverses a channel.

@tplooker One thought that occurred was that if DIDComm messages were always meant to be short lived / self destructing / etc., that would protect against a number of scenarios.

kdenhartog commented 4 years ago

In some cases (such as communications where both parties use did:peer), limited discovery of a service endpoint may be enough to limit spam and solve this problem in a different way.

Put another way, because the endpoint is not easily found, it's not easily spammed. However, in the case of something like a public DID, this will be a useful mechanism. In any case, I do believe we should make some mention of spam prevention and the various techniques that can be used to filter the legitimate messages from the spam.

swcurran commented 4 years ago

I have a question about this. If a spammer has a sufficiently lucrative business, could they use a hardware Hashcash processor to reduce the cost of the PoW for mass "mailings"? To allow legitimate users, we have to assume a non-hardware challenge. The process works for Bitcoin because the challenge difficulty is raised as needed, but that wouldn't be an option here. Or is a hardware accelerator not an option?

tplooker commented 4 years ago

I'm not sure I follow your question? Are you asking how to set the difficulty high enough to make it cost prohibitive to bad actors but low enough to not punish good actors?

swcurran commented 4 years ago

Yes, but in the scenario where bad actors decide it's worth the expense of getting specialized hardware to shorten the PoW time to deliver lots of spam. We can't tell a bad actor from a good actor, so if a bad actor can do that, this deterrent has no value.

That's essentially what happened with Bitcoin (CPU -> GPU -> ASIC), but it was an arms race by all parties. We can't have an arms race because the good actors here won't play.

Or am I not understanding the hashing options for executing PoW?
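To make the concern above concrete, here is some back-of-the-envelope arithmetic. It assumes difficulty is counted in leading hex digits of the hash, so each attempt succeeds with probability 16^-d and the expected number of attempts is 16^d. The two hash rates are round placeholder numbers chosen only to show the gap, not measurements of any real device.

```python
def expected_hashes(difficulty_hex_digits: int) -> int:
    """Expected number of attempts to find a hash with d leading hex zeros."""
    return 16 ** difficulty_hex_digits


def seconds_per_message(difficulty_hex_digits: int, hashes_per_second: float) -> float:
    """Average time to mint one proof at a given hash rate."""
    return expected_hashes(difficulty_hex_digits) / hashes_per_second


# Placeholder rates (assumptions, not benchmarks).
COMMODITY_RATE = 1e6       # hashes per second for an ordinary client
SPECIALIZED_RATE = 1e12    # hashes per second for dedicated hardware

if __name__ == "__main__":
    for d in (4, 6, 8):
        slow = seconds_per_message(d, COMMODITY_RATE)
        fast = seconds_per_message(d, SPECIALIZED_RATE)
        print(f"difficulty {d}: ordinary client {slow:.3g} s, "
              f"specialized hardware {fast:.3g} s")
```

Whatever the real rates are, the ratio between them is fixed by the hardware, so any difficulty that is tolerable for an ordinary client is cheaper by that same factor for a sender with specialized hardware.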

wyc commented 4 years ago

@swcurran if we're looking at security engineering as making attacks more expensive and difficult, then this would still help in that regard right? To the point about difficulty matching to an aggregate hash rate, I think we could have a flag for difficulty set by the recipient as seen in the initial example. Would this be reasonable? Including an actual payment as I described above could be based on a non-PoW blockchain too.

Also, as soon as we want to consider highly resourced attackers, there are other aspects of the system which could be the weakest link, such as the actual servers handling the response getting DoS'd for a comparable or smaller sum of money.

mwilcoxnz commented 4 years ago

21e8

OR13 commented 4 years ago

There are algorithm design considerations for being ASIC resistant, and then there are protocol options (changing the algorithm rapidly between different ASIC hardened algorithms)...

Like most things, the attacker will just migrate their attack to the weak point... it's the defender's job to make the weakest point somewhere common in the stack, like Cloudflare or whatever... where the cost for the defender is low and the cost for the attacker is high....

kdenhartog commented 3 years ago

Marking this as defer for now because I don't think this is a requirement for the V2 work. It appears to be an extension that's a good idea but is best left as an implementation detail.