WICG / compression-dictionary-transport


Consider Websocket use case #30

Closed pmeenan closed 11 months ago

pmeenan commented 1 year ago

Websockets themselves would fail a same-origin check for a dictionary delivered over HTTPS.

Would it be valuable (and safe) to let the match URL in the dictionary response specify a wss:// scheme along with a match path, and to explicitly restrict dictionaries to https rather than just same-origin? The dictionary-setting part of the spec could then require that the match path be same-origin (and https), or the equivalent origin if wss was used as the scheme in the match path.

Something like:

AFAIK, the actual compression should work fine for data delivered over a websocket as long as the encoding supports streamed compression (which is usually a requirement before adopting a new compression algorithm anyway).

pmeenan commented 1 year ago

The match param is now a full URL which opens up the possibility for websocket support by allowing https dictionaries to apply to wss URLs if the origin is otherwise the same (host and port). Maybe allow it independent of port?
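A minimal sketch of the relaxed check described above, assuming a hypothetical helper `dictionary_may_apply` and a hypothetical scheme mapping in which an https dictionary may match https or wss targets. The `ignore_port` flag models the open "independent of port?" question; none of these names come from the spec.

```python
from urllib.parse import SplitResult, urlsplit

# Assumption: only https dictionaries are allowed, and they may apply to
# https or wss targets. This mapping is illustrative, not from the spec.
EQUIVALENT_SCHEMES = {"https": {"https", "wss"}}
DEFAULT_PORTS = {"https": 443, "wss": 443}


def _effective_port(parts: SplitResult) -> int:
    """Return the explicit port, or the default port for the scheme."""
    return parts.port or DEFAULT_PORTS[parts.scheme]


def dictionary_may_apply(
    dictionary_url: str, target_url: str, ignore_port: bool = False
) -> bool:
    """Check whether a dictionary URL and a target URL share an origin,
    treating wss as equivalent to https for the scheme comparison."""
    dictionary = urlsplit(dictionary_url)
    target = urlsplit(target_url)

    allowed_targets = EQUIVALENT_SCHEMES.get(dictionary.scheme, set())
    if target.scheme not in allowed_targets:
        return False
    if dictionary.hostname != target.hostname:
        return False
    if ignore_port:
        return True
    return _effective_port(dictionary) == _effective_port(target)
```

With this sketch, `https://example.com/dict` would apply to `wss://example.com/socket` but not to `ws://example.com/socket` or to a different host.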

Would need a serious security review by people more familiar with attack vectors for websockets to know if there is a concern.

For security, presumably, the same hash-based safety applies where the worst-case scenario is that the websocket server doesn't have the dictionary advertised.

For privacy, the dictionary hash allows passing information from the https:// response into a wss:// request which may be a concern.
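The hash-based safety mentioned above can be sketched roughly as follows: the server only uses a dictionary when the advertised hash matches one it already holds, and otherwise falls back to no dictionary. The function names and the hex encoding of the SHA-256 digest are assumptions for illustration, not the wire format.

```python
import hashlib
from typing import Dict, Optional


def dictionary_hash(dictionary: bytes) -> str:
    """SHA-256 of the dictionary bytes (hex encoding is an assumption)."""
    return hashlib.sha256(dictionary).hexdigest()


def pick_dictionary(
    advertised_hash: Optional[str], known: Dict[str, bytes]
) -> Optional[bytes]:
    """Return the dictionary matching the advertised hash, or None.

    None is the worst case from the thread: the server does not have the
    advertised dictionary, so it responds without dictionary compression.
    """
    if advertised_hash is None:
        return None
    return known.get(advertised_hash)
```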

yoavweiss commented 1 year ago

> The match param is now a full URL which opens up the possibility for websocket support by allowing https dictionaries to apply to wss URLs if the origin is otherwise the same (host and port)

Wouldn't that enable cross origin communication? (as the scheme is part of the origin definition)

pmeenan commented 1 year ago

> Wouldn't that enable cross origin communication? (as the scheme is part of the origin definition)

Yeah, that was my main concern with:

> For privacy, the dictionary hash allows passing information from the https:// response into a wss:// request which may be a concern.

I think websockets will be able to re-use the sec-available-dictionary: and accept-encoding:/content-encoding: part of the flow, but setting a dictionary to use is probably going to require JS API changes. I'll add a note by the compression API section that puts the websocket JS API changes out of scope for this.

pmeenan commented 1 year ago

I put up a PR marking websockets as out-of-scope.

That said, if we allow the <link rel=dictionary> to provide a client-specified path (maybe limit it to wss) we can probably handle it without API-surface changes. e.g. <link rel=dictionary href="https://xxx.com/dictionary" match="wss://xxx.com">

The main risk with that is setting dictionaries for an origin that you don't control, but it gets around the cross-origin data leak problem: the document can already pass arbitrary data through the websocket URL (or the socket itself), and the dictionary already needs to be CORS-readable.

It still doesn't feel like a "clean" solution though and can always be layered on top of what we have defined here.

yoavweiss commented 1 year ago

> That said, if we allow the <link rel=dictionary> to provide a client-specified path (maybe limit it to wss) we can probably handle it without API-surface changes. e.g. <link rel=dictionary href="https://xxx.com/dictionary" match="wss://xxx.com">

This still feels like a cross-origin leak to me (dictionary in one origin would be applied to resources in another). Is there a strong benefit to applying dictionaries to web sockets that I'm missing?

pmeenan commented 1 year ago

> This still feels like a cross-origin leak to me (dictionary in one origin would be applied to resources in another). Is there a strong benefit to applying dictionaries to web sockets that I'm missing?

Is it still considered a cross-origin "leak" if the application is the one explicitly plumbing the data through from a CORS-readable response to a request on another origin? It doesn't seem any different from adding query params to the URL.

We don't need to solve it here but I assume websockets will see benefits depending on the type of the content. If the websocket is carrying json-encoded messages then the savings could be significant like we see on some API fetch requests we have been looking at (50-60%). Creating an external dictionary with some of the standard template responses, the json structure, etc could provide binary-packed message sizes for json-style content.
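To illustrate why an external dictionary helps with small JSON messages, here is a minimal sketch using zlib's preset-dictionary support (`zdict`) as a stand-in for the dictionary-capable encodings discussed in this repo. The dictionary contents, message fields, and function names are made up for the example, and the savings will vary with the content.

```python
import json
import zlib
from typing import Optional

# Hypothetical dictionary: the JSON structure that recurs in every message.
DICTIONARY = json.dumps(
    {"type": "price_update", "symbol": "", "currency": "USD",
     "bid": 0.0, "ask": 0.0, "timestamp": ""},
    separators=(",", ":"),
).encode("utf-8")


def compress_message(message: dict, dictionary: Optional[bytes] = None) -> bytes:
    """Compress one JSON message, optionally primed with a preset dictionary."""
    payload = json.dumps(message, separators=(",", ":")).encode("utf-8")
    if dictionary is None:
        compressor = zlib.compressobj(level=9)
    else:
        compressor = zlib.compressobj(level=9, zdict=dictionary)
    return compressor.compress(payload) + compressor.flush()


def decompress_message(data: bytes, dictionary: Optional[bytes] = None) -> dict:
    """Reverse compress_message; the same dictionary must be supplied."""
    if dictionary is None:
        decompressor = zlib.decompressobj()
    else:
        decompressor = zlib.decompressobj(zdict=dictionary)
    payload = decompressor.decompress(data) + decompressor.flush()
    return json.loads(payload.decode("utf-8"))


message = {"type": "price_update", "symbol": "ACME", "currency": "USD",
           "bid": 101.25, "ask": 101.5, "timestamp": "2024-01-01T00:00:00Z"}
plain = compress_message(message)
primed = compress_message(message, DICTIONARY)
print(len(plain), len(primed))  # the dictionary-primed output is the smaller one
```

Because a single short message has little internal repetition, ordinary compression gains almost nothing; the dictionary supplies the repeated structure up front.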

I don't know that there's enough websocket usage for compressible content to make it meaningful at scale but I don't think there's anything in the current plan that makes it difficult to build websocket support on top of at some point in the future.

LPardue commented 1 year ago

I see you just closed this, but I was surprised this ticket didn't mention Compression Extensions for WebSocket (RFC 7692): https://datatracker.ietf.org/doc/rfc7692/

The spec defines a compression framework and one concrete example using DEFLATE. It would seem that any work in this area would do well to fit within that framework.

pmeenan commented 1 year ago

@LPardue I closed it mostly because the text marking websockets as out-of-scope is in a PR. It can wait until the PR is merged to actually close though.

Thanks for the pointer to the compression extensions. I don't see anything there that wouldn't work with dictionary-based streams (though it will require a fair bit of spec work to define the extension and APIs). Either way, nothing in this work precludes that.
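As a rough illustration of how a dictionary-based stream could sit inside the RFC 7692 framing, the sketch below follows the permessage-deflate pattern (raw DEFLATE, a sync flush per message, and the trailing `00 00 ff ff` removed on send and restored on receive) and adds a preset dictionary on both ends. The preset dictionary is the hypothetical part: no such WebSocket extension is defined, and the class names are invented here. Non-empty messages are assumed.

```python
import zlib

# RFC 7692 section 7.2.1: these four bytes end every sync-flushed block and
# are removed from each compressed message before it is sent.
DEFLATE_TAIL = b"\x00\x00\xff\xff"


class DictionaryDeflateSender:
    """Compresses messages on one stream, sharing context across messages."""

    def __init__(self, dictionary: bytes) -> None:
        # Negative wbits selects raw DEFLATE, as permessage-deflate uses.
        # Passing zdict is the hypothetical dictionary extension.
        self._compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS, zdict=dictionary)

    def encode(self, message: bytes) -> bytes:
        data = self._compressor.compress(message)
        data += self._compressor.flush(zlib.Z_SYNC_FLUSH)
        return data[: -len(DEFLATE_TAIL)]  # strip the sync-flush tail


class DictionaryDeflateReceiver:
    """Decompresses messages produced by DictionaryDeflateSender."""

    def __init__(self, dictionary: bytes) -> None:
        self._decompressor = zlib.decompressobj(wbits=-zlib.MAX_WBITS, zdict=dictionary)

    def decode(self, payload: bytes) -> bytes:
        # Restore the tail so the decompressor sees a complete flushed block.
        return self._decompressor.decompress(payload + DEFLATE_TAIL)
```

Keeping one compressor per connection mirrors context takeover in RFC 7692: later messages can reference earlier ones as well as the dictionary.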

LPardue commented 1 year ago

Putting it out of scope seems fine to me. No need to keep the issue open if we are punting the problem for now.