Closed ryanbr closed 3 years ago
The content of the page is encoded in the image so even a tiny change made by image farbling will mess up the decoding
Is there a reason data URIs are changed?
Also see: https://github.com/wooorm/server-components-mdx-demo/issues/2. I'd love it if someone has a better idea.
hi hi! So, was wondering, could you do the following:
Would that cover the use case here?
(and as background for folks who find this later, Brave doesn't modify data uri's, we modify the results of canvas readback, which in this case is being encoded as a data uri :) )
cc @lovasoa too, that might work
And also good to know that the canvas is the problem, not the url!
(I'm closing the issue now, but let's keep discussing here if it'd be helpful)
Hey! I don't think this should be closed. This is definitely a bug in the browser. There is no privacy issue with reading a static png from a canvas.
Hello @lovasoa. You're right that there is no privacy issue from using canvas readback to read a static PNG's pixels.
The issue is that there isn't a way to know whether the canvas's contents came from a static source or a dynamic source. I saw your suggestion for a taint tracking approach, but in practice taint-tracking in a dynamic system gets extremely expensive extremely quickly (especially if we need to account for persisting taint through DOM syncs, network, etc), and even then there are coverage issues in taint systems (implicit taint / flow control, etc), so it's not something we could deploy and maintain against upstream blink (plus there are perf concerns, etc).
Can I ask, why not use the approach in https://github.com/brave/brave-browser/issues/14421#issuecomment-789081440? It seems like it would work, possibly even be faster, and be more "working with the grain" in the browser.
Also, as a bonus, it'd also work in Tor, Brave and with privacy-tools like Canvas Defender.
WDYT?
@wooorm @BrendanEich suggested that you were not correctly escaping when you tried the script tag method <\/script>
@bridiver because it's arbitrary data (not really json; I don't have control over it but I trust it), and scripts are complex: https://github.com/wooorm/server-components-mdx-demo/issues/2#issuecomment-788765692
The issue is that there isn't a way to know whether the canvas's contents came from a static source or a dynamic source.
This is not true. The browser knows exactly what was drawn on the canvas.
taint tracking approach, but in practice taint-tracking in a dynamic system gets extremely expensive extremely quickly
There is ALREADY taint tracking for canvases. This is a single bit that has to be set when the user either writes text or gets a 3D context on the canvas. I don't think there would be any performance impact.
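To make the single-bit proposal in this comment concrete, here is a toy model. It is not Blink's or Brave's code: `ToyCanvas`, its methods, and the bit-flipping stand-in for farbling are all hypothetical names invented for illustration.

```python
import random
from typing import List


class ToyCanvas:
    """Hypothetical model: a pixel buffer plus one 'tainted' bit."""

    def __init__(self, size: int) -> None:
        self.pixels: List[int] = [0] * size
        self.tainted = False  # the single bit described in the comment above

    def draw_image(self, image_pixels: List[int]) -> None:
        # In this model, drawing a static image never sets the bit.
        n = min(len(image_pixels), len(self.pixels))
        self.pixels[:n] = image_pixels[:n]

    def fill_text(self, text: str) -> None:
        # Text rendering is treated as device-dependent, so it sets the bit.
        data = list(text.encode("utf-8"))[: len(self.pixels)]
        self.pixels[: len(data)] = data
        self.tainted = True

    def read_pixels(self, rng: random.Random) -> List[int]:
        if not self.tainted:
            return list(self.pixels)  # exact readback
        # Stand-in for farbling: flip the low bit of roughly 10% of pixels.
        return [p ^ 1 if rng.random() < 0.1 else p for p in self.pixels]


rng = random.Random(1234)  # hypothetical per-session seed

static = ToyCanvas(8)
static.draw_image([10, 20, 30, 40])
print(static.read_pixels(rng))  # untouched, because the bit was never set

dynamic = ToyCanvas(8)
dynamic.fill_text("hi")
print(dynamic.tainted)  # True, so readback may be perturbed
```

This models only a bit set by the immediate drawing operation. It says nothing about whether taint would also have to follow data through other DOM or JS paths, which is the point disputed in this thread.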
- encode your JS (or arbitrary data) as a base64 data url
- include that in a script tag
- mark the tag async
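The three steps above can be sketched as a small build-time helper. This is a minimal sketch, assuming the page is generated by a Python script; `async_script_tag` is a hypothetical name, and the `text/javascript` media type is an assumption about the payload.

```python
import base64
import html


def async_script_tag(js_source: str) -> str:
    """Return a <script async> tag whose src is a base64 data URL."""
    # Step 1: encode the JS (or arbitrary data wrapped in JS) as base64.
    encoded = base64.b64encode(js_source.encode("utf-8")).decode("ascii")
    src = "data:text/javascript;base64," + encoded
    # Steps 2 and 3: put it in a script tag and mark the tag async.
    return '<script async src="' + html.escape(src, quote=True) + '"></script>'


print(async_script_tag('console.log("</script>")'))
```

Because the payload is base64, a literal `</script>` inside it never appears in the HTML, so no manual escaping of the script body is needed.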
The PNG is not only decoded asynchronously, it is also a compressed data representation that is decompressed asynchronously.
its not something we could deploy and maintain against upstream blink
This may be true, but this means that this is a bug you don't have the resources to fix, not that this is not a bug.
This is not true. The browser knows exactly what was drawn on the canvas.
The browser knows what's on the canvas of course, but the tricky part is where the input came from. The existing "taint tracking" in the canvas is much simpler because the input taints only need to cover the immediately preceding step. The existing system doesn't attempt to comprehensively persist taint labels through DOM sinks and intermediate JS values.
This prevents the taint / label explosion, but means it's not useful or comparable to what would be needed here.
Put differently, the difficult part is that fingerprinters are antagonistic, so labels would need to be comprehensive across JS and the DOM, and that is an enormously difficult problem. It would also for sure not be on the right side of the cost / benefit curve for a way to async-load JavaScript ;)
The PNG is not only decoded asynchronously, it is also a compressed data representation that is decompressed asynchronously.
I'm very surprised by this. If you have numbers that show PNG is an efficient compression algorithm for arbitrary text, please share them (sincerely). I'm not saying it's not the case, but I would be extremely surprised if it were. Almost certainly you'll be better off just using a text compression algorithm and base64'ing the results.
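The "text compression plus base64" alternative might look like the following. This is a sketch under assumptions: deflate (via `zlib`) is picked as the text compressor only because it ships with Python, and `pack_text` / `unpack_text` are hypothetical helper names.

```python
import base64
import zlib


def pack_text(text: str) -> str:
    """Deflate the UTF-8 bytes, then base64-encode them for embedding."""
    compressed = zlib.compress(text.encode("utf-8"), 9)
    return base64.b64encode(compressed).decode("ascii")


def unpack_text(packed: str) -> str:
    """Reverse pack_text: base64-decode, then inflate."""
    return zlib.decompress(base64.b64decode(packed)).decode("utf-8")


sample = "<p>Hello, world</p>\n" * 500  # synthetic, highly repetitive input
packed = pack_text(sample)
print(len(sample.encode("utf-8")), "bytes raw,", len(packed), "base64 characters")
```

The printed sizes depend entirely on how compressible the input is; the repetitive sample here is a best case, not a benchmark.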
Either way, happy to agree to disagree on any of the above. I think it would be nice if you mentioned in your docs that your library will break sites for visitors using Tor Browser Bundle, or Bromite, or canvas fingerprinting-protecting extensions, or similar tools (in addition to Brave), especially since consumers of your lib likely wouldn't expect to be incompatible w/ such privacy tools.
Either way though, thank you for taking the time to discuss. I'm going to keep the issue closed and consider the issue resolved
I think I am missing something. What does "persist taint labels through DOM sinks" mean? Why wouldn't it work to just taint the canvas whenever either CanvasRenderingContext2D::fillText or HTMLCanvasElement.getContext("3d") is called?
I'm very surprised by this. If you have numbers that show PNG is an efficient compression algorithm for arbitrary text, please share them (sincerely).
You can easily do it yourself: using GIMP, open a file that compresses well as raw image data. Then save it as PNG, and compare the file size. This works because PNG uses a simple compression algorithm (deflate) directly on the pixels (after a pre-compression stage).
To show an example with the famous enwik8:
$ npx bin2png /tmp/enwik8 /tmp/enwik8.png
npx: installed 2 in 2.379s
Converting /tmp/enwik8 to /tmp/enwik8.png
Success. File size difference: -11%
And comparing the base64 png to the base64 raw text, the ratio is even slightly better. Of course the ratio highly depends on how compressible the source data is.
@lovasoa I think what @pes10k is really getting at here is whether doing all of this actually results in a non-trivial difference in perceived load time. Other types of content can be compressed with Content-Encoding using a variety of algorithms so wire size isn't going to be significantly different and nothing extra is required to decompress them. At least for the original example blocking doesn't seem to be relevant because there is no content at all until the image is decompressed, but with a script tag you could use async/defer to avoid blocking. If this becomes a common method for loading content then we will probably have to rethink our stance on it, but we can't really justify the kind of work it would take to fix this right now. PRs are always welcome, but if it requires patching I think it would be unlikely to be approved at the current time. In the meantime disabling fingerprinting protection allows the page to load normally.
Description
Loading page https://wooorm.com/server-components-mdx-demo/ is scrambled
Steps to Reproduce
Actual result:
Expected result:
Reproduces how often:
Easily.
Brave version (brave://version info)
Version 1.20.110 Chromium: 88.0.4324.192 (Official Build) (64-bit)
Version/Channel Information:
Other Additional Information:
Miscellaneous Information:
Was reported via twitter; https://twitter.com/Fdecampredon/status/1366438532173225992
Due to fingerprinting, the following error in the console cc: @pes10k