Unable to view https://wooorm.com/server-components-mdx-demo/ due to fingerprinting

ryanbr commented 3 years ago

Description

Loading page https://wooorm.com/server-components-mdx-demo/ is scrambled

Steps to Reproduce

With default shields, open https://wooorm.com/server-components-mdx-demo/
Page is scrambled due to fingerprinting

Actual result:

[image: woorm]

Expected result:

[image: woorm2]

Reproduces how often:

Easily.

Brave version (brave://version info)

Version 1.20.110 Chromium: 88.0.4324.192 (Official Build) (64-bit)

Version/Channel Information:

Can you reproduce this issue with the current release? Yes
Can you reproduce this issue with the beta channel? Yes
Can you reproduce this issue with the nightly channel? Yes

Other Additional Information:

Does the issue resolve itself when disabling Brave Shields? Yes
Does the issue resolve itself when disabling Brave Rewards?
Is the issue reproducible on the latest version of Chrome?

Miscellaneous Information:

Reported via Twitter: https://twitter.com/Fdecampredon/status/1366438532173225992

Due to fingerprinting protection, the following error appears in the console. cc: @pes10k

[image: error-woo]

bridiver commented 3 years ago

The content of the page is encoded in the image, so even a tiny change made by image farbling will break the decoding.
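To illustrate why even a small perturbation is fatal here, the following is a minimal Python sketch. It assumes a hypothetical scheme in which each payload byte is stored directly as one pixel channel value, and it models farbling as flipping the lowest bit of a few channels. Neither assumption is taken from the demo's code or from Brave's implementation; the function names are made up for illustration.

```python
import random

def encode_as_pixels(text: str) -> list[int]:
    """Hypothetical encoding: one UTF-8 byte per pixel channel value (0-255)."""
    return list(text.encode("utf-8"))

def farble(pixels: list[int], seed: int, count: int = 3) -> list[int]:
    """Toy stand-in for readback noise: flip the lowest bit of `count` channels."""
    rng = random.Random(seed)
    noisy = pixels.copy()
    for index in rng.sample(range(len(noisy)), count):
        noisy[index] ^= 1
    return noisy

def decode_pixels(pixels: list[int]) -> str:
    """Turn channel values back into text, replacing undecodable bytes."""
    return bytes(pixels).decode("utf-8", errors="replace")

source = 'console.log("hello from the image")'
clean = encode_as_pixels(source)
noisy = farble(clean, seed=1)

print(decode_pixels(clean) == source)  # untouched readback round-trips
print(decode_pixels(noisy) == source)  # three flipped bits are enough to break it
```

A change that is invisible to the eye still alters three characters of the payload, which is enough to turn valid JavaScript or compressed data into garbage.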

wooorm commented 3 years ago

Is there a reason data URIs are changed?

wooorm commented 3 years ago

Also see: https://github.com/wooorm/server-components-mdx-demo/issues/2. I’d love it if someone has a better idea 😅

pes10k commented 3 years ago

hi hi! So, was wondering, could you do the following:

encode your JS (or arbitrary data) as a base64 data url
include that in a script tag
mark the tag async

Would that cover the use case here?
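The three steps above could be sketched roughly as follows, assuming the tag is generated server-side in Python. The function name `async_script_tag` and the `text/javascript` media type are assumptions made for illustration, not something specified in this thread.

```python
import base64
import html

def async_script_tag(js_source: str) -> str:
    """Return an async <script> tag whose src is a base64 data URL of js_source."""
    payload = base64.b64encode(js_source.encode("utf-8")).decode("ascii")
    src = f"data:text/javascript;charset=utf-8;base64,{payload}"
    # Base64 output never contains quotes or angle brackets; escaping is a safeguard.
    return f'<script async src="{html.escape(src, quote=True)}"></script>'

tag = async_script_tag('document.title = "</script> is safe here";')
print(tag)
```

Because base64 output consists only of letters, digits, `+`, `/` and `=`, the payload cannot close the tag early, whatever the original source contains.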

(and as background for folks who find this later: Brave doesn't modify data URIs. We modify the results of canvas readback, which in this case are being encoded as a data URI :) )

wooorm commented 3 years ago

cc @lovasoa too, that might work 🤔

And also good to know that the canvas is the problem, not the url!

pes10k commented 3 years ago

(I'm closing the issue now, but let's keep discussing here if it'd be helpful)

lovasoa commented 3 years ago

Hey! I don't think this should be closed. This is definitely a bug in the browser. There is no privacy issue with reading a static png from a canvas.

pes10k commented 3 years ago

Hello @lovasoa. You're right that there is no privacy issue from using canvas readback to read a static PNG's pixels.

The issue is that there isn't a way to know whether the canvas's contents came from a static source or a dynamic source. I saw your suggestion for a taint tracking approach, but in practice taint-tracking in a dynamic system gets extremely expensive extremely quickly (especially if we need to account for persisting taint through DOM sinks, network, etc), and even then there are coverage issues in taint systems (implicit taint / flow control, etc), so it's not something we could deploy and maintain against upstream blink (plus there are perf concerns, etc).

Can I ask, why not use the approach in https://github.com/brave/brave-browser/issues/14421#issuecomment-789081440? It seems like it would work, possibly even be faster, and be more "working with the grain" in the browser.

Also, as a bonus, it'd also work in Tor, Brave, and with privacy tools like Canvas Defender.

WDYT?

bridiver commented 3 years ago

@wooorm, @BrendanEich suggested that you were not escaping correctly when you tried the script tag method: <\/script>

wooorm commented 3 years ago

@bridiver because it's arbitrary data (not really json; I don't have control over it but I trust it), and scripts are complex: https://github.com/wooorm/server-components-mdx-demo/issues/2#issuecomment-788765692

lovasoa commented 3 years ago

The issue is that there isn't a way to know whether the canvas's contents came from a static source or a dynamic source.

This is not true. The browser knows exactly what was drawn on the canvas.

taint tracking approach, but in practice taint-tracking in a dynamic system gets extremely expensive extremely quickly

There is ALREADY taint tracking for canvases. This is a single bit that has to be set when the user either writes text or gets a 3D context on the canvas. I don't think there would be any performance impact.
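The single-bit scheme described here can be modelled in a few lines. This is a toy Python model, not Blink or Brave code: `ToyCanvas`, its methods, and the bit-flipping noise are all hypothetical, and `"webgl"` is used as the context name because that is the identifier browsers accept for a 3D context.

```python
class ToyCanvas:
    """Toy model of a canvas carrying a one-bit 'may fingerprint' flag."""

    def __init__(self) -> None:
        self.pixels: list[int] = []
        self.tainted = False

    def draw_image(self, image_pixels: list[int]) -> None:
        # Drawing a static image does not set the flag in this model.
        self.pixels.extend(image_pixels)

    def fill_text(self, text: str) -> None:
        # Text rendering depends on fonts and the rasterizer, so set the flag.
        self.pixels.extend(text.encode("utf-8"))
        self.tainted = True

    def get_context(self, kind: str) -> None:
        # A 3D context exposes GPU- and driver-specific behaviour, so set the flag.
        if kind == "webgl":
            self.tainted = True

    def read_back(self) -> list[int]:
        # Only flagged canvases get (toy) noise: flip the lowest bit of each value.
        if self.tainted:
            return [value ^ 1 for value in self.pixels]
        return list(self.pixels)

static_canvas = ToyCanvas()
static_canvas.draw_image([10, 20, 30])

text_canvas = ToyCanvas()
text_canvas.fill_text("hi")

print(static_canvas.read_back())
print(text_canvas.read_back() == list(b"hi"))
```

The model only looks at the immediately preceding operation on one canvas. It says nothing about an image whose pixels were themselves produced by an earlier flagged canvas and then drawn back in through `draw_image`.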

lovasoa commented 3 years ago

encode your JS (or arbitrary data) as a base64 data url

include that in a script tag

mark the tag async

The PNG is not only decoded asynchronously, it is also a compressed data representation that is decompressed asynchronously.

lovasoa commented 3 years ago

it's not something we could deploy and maintain against upstream blink

This may be true, but this means that this is a bug you don't have the resources to fix, not that this is not a bug 😉

pes10k commented 3 years ago

This is not true. The browser knows exactly what was drawn on the canvas.

The browser knows what's on the canvas of course, but the tricky part is where the input came from. The existing "taint tracking" in the canvas is much simpler because the input taints only need to cover the immediately previous step. The existing system doesn't attempt to comprehensively persist taint labels through DOM sinks and intermediate JS values.

This prevents the taint / label explosion, but means it's not useful or comparable to what would be needed here.

Put differently, the difficult part is that fingerprinters are antagonistic, so labels would need to be comprehensive across JS and the DOM, and that is an enormously difficult problem. It would also for sure not be on the right side of the cost / benefit curve for a way to asynchronously load JavaScript ;)

The PNG is not only decoded asynchronously, it is also a compressed data representation that is decompressed asynchronously.

I'm very surprised by this. If you have numbers that show PNG is an efficient compression algorithm for arbitrary text, please share them (sincerely). I'm not saying it's not the case, but I would be extremely surprised if it were. Almost certainly you'll be better off just using a text compression algorithm and base64'ing the results.

Either way, happy to agree to disagree on any of the above. I think it would be nice if you mentioned in your docs that your library will break sites for visitors using Tor Browser Bundle, or Bromite, or canvas fingerprinting-protecting extensions, or similar tools (in addition to Brave), especially since consumers of your lib likely wouldn't expect to be incompatible with such privacy tools.

Either way though, thank you for taking the time to discuss. I'm going to keep the issue closed and consider the issue resolved.

lovasoa commented 3 years ago

I think I am missing something. What does "persist taint labels through DOM sinks" mean? Why wouldn't it work to just taint the canvas on any call to either CanvasRenderingContext2D::fillText or HTMLCanvasElement.getContext("3d")?

I'm very surprised by this. If you have numbers that show PNG is an efficient compression algorithm for arbitrary text, please share them (sincerely).

You can easily do it yourself: using GIMP, open a file that compresses well as raw image data. Then save it as PNG, and compare the file size. This works because PNG uses a simple compression algorithm (deflate) directly on the pixels (after a pre-compression stage).
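The same idea can be sketched without an image editor. The Python snippet below is a simplified illustration, not a PNG encoder: it treats the input as 8-bit grayscale rows of an assumed width of 256, uses only filter type 0 ("none") as the pre-compression stage, and omits the PNG signature, chunk framing, and checksums. The function names are hypothetical.

```python
import zlib

def png_style_compress(data: bytes, width: int = 256) -> bytes:
    """Deflate `data` as PNG stores image data, using filter type 0 only."""
    rows = [data[i:i + width] for i in range(0, len(data), width)]
    # Each scanline is prefixed with a filter-type byte; 0 means "no filter".
    filtered = b"".join(b"\x00" + row for row in rows)
    return zlib.compress(filtered, 9)

def png_style_decompress(blob: bytes, width: int = 256) -> bytes:
    """Inflate and strip the per-scanline filter bytes again."""
    filtered = zlib.decompress(blob)
    stride = width + 1
    return b"".join(
        filtered[i + 1:i + stride] for i in range(0, len(filtered), stride)
    )

sample = b"The quick brown fox jumps over the lazy dog. " * 200
packed = png_style_compress(sample)
print(len(sample), len(packed))
```

How much is saved depends entirely on how compressible the input is; this sample is highly repetitive, so it shrinks far more than typical text would.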

lovasoa commented 3 years ago

To show an example with the famous enwik8:

$ npx bin2png /tmp/enwik8 /tmp/enwik8.png 
npx: installed 2 in 2.379s
Converting /tmp/enwik8 to /tmp/enwik8.png
Success. File size difference: -11%

And comparing the base64 png to the base64 raw text, the ratio is even slightly better. Of course the ratio highly depends on how compressible the source data is.

bridiver commented 3 years ago

@lovasoa I think what @pes10k is really getting at here is whether doing all of this actually results in a non-trivial difference in perceived load time. Other types of content can be compressed with Content-Encoding using a variety of algorithms, so wire size isn't going to be significantly different, and nothing extra is required to decompress them. At least for the original example, blocking doesn't seem to be relevant because there is no content at all until the image is decompressed, but with a script tag you could use async/defer to avoid blocking. If this becomes a common method for loading content, then we will probably have to rethink our stance on it, but we can't really justify the kind of work it would take to fix this right now. PRs are always welcome, but if it requires patching, I think it would be unlikely to be approved at the current time. In the meantime, disabling fingerprinting protection allows the page to load normally.
