arkenfox / TZP

TorZillaPrint: Firefox & Tor Browser fingerprint testing
https://arkenfox.github.io/TZP
MIT License

font stuff #150

Open: Thorin-Oakenpants opened this issue 2 years ago

Thorin-Oakenpants commented 2 years ago

Is it possible to make the font sizes in the Unicode glyphs test zoom-resistant?

This has been on my mind. There is a method I use to harden Unicode metrics (emojis and glyphs). Instead of using the text pixel sizes as the metric, I look for the first occurrence of each unique dimension (width x height), and the final set of those first occurrences is the metric. I also reduce the dimensions in the set to a sum. Then, if the screen's device pixel ratio or the font size changes, this unique set (but not the sum) should remain the same.

const unicodeSet = new Set()

// loop through the results and add each new metric to the set
// (a Set ignores duplicates and keeps first-insertion order)
const results = ["200 x 300", "180 x 300", "200 x 300"] // example measurements
for (const metricAsString of results) {
  unicodeSet.add(metricAsString)
}

const uniqueList = [...unicodeSet] // convert the set to an array
// add up every width and height in the unique list
const sum = uniqueList.reduce(
  (total, metric) => total + metric.split(' x ').reduce((acc, n) => acc + Number(n), 0),
  0
)

Originally posted by @abrahamjuliot in https://github.com/arkenfox/TZP/issues/49#issuecomment-1024618900
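A minimal sketch of the first-occurrence idea, assuming each glyph has already been measured into a hypothetical `{ codePoint, width, height }` record (the record shape, the sample data and the name `firstOccurrences` are illustrative, not taken from TZP). It keeps the code point that first produced each unique dimension. Under a uniform scale factor the same code points come out even though the dimensions change, as long as scaling doesn't cause rounding to merge or split sizes.

```javascript
// Return the code points that first produced each unique "width x height".
// Sketch only: assumes measurements arrive in a stable code point order.
function firstOccurrences(measurements) {
  const seen = new Map() // "w x h" -> first code point with that size
  for (const { codePoint, width, height } of measurements) {
    const key = `${width} x ${height}`
    if (!seen.has(key)) {
      seen.set(key, codePoint)
    }
  }
  return {
    dimensions: [...seen.keys()],
    codePoints: [...seen.values()],
  }
}

// Hypothetical measurements at 100% zoom, and the same glyphs at 125% zoom
const atDefault = [
  { codePoint: 'U+0041', width: 8, height: 16 },
  { codePoint: 'U+0042', width: 8, height: 16 },
  { codePoint: 'U+4E00', width: 16, height: 16 },
]
const zoomed = atDefault.map(m => ({ ...m, width: m.width * 1.25, height: m.height * 1.25 }))

console.log(firstOccurrences(atDefault).codePoints) // [ 'U+0041', 'U+4E00' ]
console.log(firstOccurrences(zoomed).codePoints)    // [ 'U+0041', 'U+4E00' ]
```

Only `codePoints` is stable across the two zoom levels; `dimensions` (and any sum of them) moves with the scale factor, which lines up with the "unique set (but not the sum)" remark.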

Thorin-Oakenpants commented 2 years ago

Quoting: "I also reduce the dimensions in the set to a sum"

I'm not quite following what is being summed.

I don't like the sound of this; it seems like collisions would happen. For example, 100 x 200, 200 x 100 and 150 x 150 all sum to 300. That's probably not likely given a small sample size, though. Tell me more.
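To make the collision concern concrete, here is a sketch that reduces a list of `"width x height"` strings to one number the same way the quoted `reduce` does (adding every width and height). `sumDimensions` and the sample lists are illustrative.

```javascript
// Sum every width and height in a list of "width x height" strings,
// mirroring the nested reduce in the quoted snippet.
function sumDimensions(list) {
  return list.reduce(
    (total, metric) => total + metric.split(' x ').reduce((acc, n) => acc + Number(n), 0),
    0
  )
}

// Three different single-entry sets that all reduce to 300
const a = sumDimensions(['100 x 200'])
const b = sumDimensions(['200 x 100'])
const c = sumDimensions(['150 x 150'])
// Multi-entry sets can collide too: 10 + 20 + 30 + 40 = 100 = 50 + 50
const d = sumDimensions(['10 x 20', '30 x 40'])
const e = sumDimensions(['50 x 50'])

console.log(a, b, c) // 300 300 300
console.log(d, e)    // 100 100
```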

Unique set: I'm not 100% sure exactly how system scaling, DPI and devicePixelRatio (subpixels) may affect this, but I think it would lead to less entropy. Users with the same fonts (think same platform, or limited font visibility such as RFP or Brave) are going to return the same relative measurements, and their sets of unique measurements won't change (i.e. if A, B and C are the same size, that's not going to change across matching platforms and fonts).

Maybe I'm missing something, but to me, two users whose glyphs differ in absolute size yet group the same way would both return the glyph set A, B, and we would lose entropy.
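A sketch of that entropy concern, using two hypothetical users whose glyph sizes differ but whose glyphs group the same way (A and C share a size, B differs). The raw sizes tell them apart; the first-occurrence code point set does not. The data and the helper names `uniqueCodePoints` and `rawSizes` are illustrative.

```javascript
// Code points that first produced each unique "width x height" (the quoted method)
function uniqueCodePoints(measurements) {
  const seen = new Map()
  for (const [codePoint, width, height] of measurements) {
    const key = `${width} x ${height}`
    if (!seen.has(key)) seen.set(key, codePoint)
  }
  return [...seen.values()].join(',')
}

// Raw fingerprint: every measured size, in order
const rawSizes = measurements => measurements.map(([, w, h]) => `${w} x ${h}`).join(',')

// Hypothetical users: A and C share a size, B is different, for both users
const user1 = [['A', 8, 16], ['B', 12, 16], ['C', 8, 16]]
const user2 = [['A', 9, 18], ['B', 14, 18], ['C', 9, 18]]

console.log(rawSizes(user1) === rawSizes(user2))                 // false: raw sizes differ
console.log(uniqueCodePoints(user1) === uniqueCodePoints(user2)) // true: both return "A,B"
```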

abrahamjuliot commented 2 years ago

summed/collision

I collect each unique width x height pair, then sum them all together. Yes, collisions can happen; a hash of the collection would be best.
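A sketch of hashing the collection instead of summing it. 32-bit FNV-1a is used here only because it is short and synchronous; it is a swapped-in choice, not necessarily what TZP or abrahamjuliot use, and `fnv1a32` / `hashDimensions` are illustrative names. Joining the entries before hashing keeps each width and height attached to its pair, so `100 x 200` and `200 x 100` become different hash inputs.

```javascript
// 32-bit FNV-1a over the UTF-16 code units of a string, as 8 hex characters
// (for ASCII input this matches byte-wise FNV-1a)
function fnv1a32(str) {
  let hash = 0x811c9dc5 // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) // FNV prime, multiplied mod 2^32
  }
  return (hash >>> 0).toString(16).padStart(8, '0')
}

// Hash the whole collection of unique "width x height" strings, in order
function hashDimensions(list) {
  return fnv1a32(list.join('|'))
}

console.log(fnv1a32('a')) // e40c292c (standard FNV-1a test vector)
const h1 = hashDimensions(['100 x 200'])
const h2 = hashDimensions(['200 x 100'])
console.log(h1, h2)
```

Because the list is joined in order, the same entries in a different order hash differently; sort the list first if order should not matter.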

system scaling/entropy

I'm re-evaluating this, and I don't think it works beyond monospaced fonts. That's true: the unique set of code points has less entropy than the unique dimensions. We could use both, but it may not be worth it.

I recently stopped using the monospace font, and the formerly hardened set now varies widely. Instead, I'm using a short font-family list that defaults to a platform or OS-release font if one is available. The entropy is much higher, with far fewer collisions between OS releases. The list is not ideal for testing all code points, though.
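A sketch of building a short CSS `font-family` value that lists platform fonts first and ends with a generic family; the browser uses the first listed family that is available and has the glyph, so a platform font wins where it is installed. The font names are illustrative examples, not abrahamjuliot's actual list, and `buildFontFamily` is a hypothetical helper.

```javascript
// Generic families are keywords and must stay unquoted in font-family
const GENERIC_FAMILIES = new Set(['serif', 'sans-serif', 'monospace', 'cursive', 'fantasy', 'system-ui'])

// Build a CSS font-family value: quoted named fonts first, generic fallback last
function buildFontFamily(fonts, generic = 'sans-serif') {
  if (!GENERIC_FAMILIES.has(generic)) {
    throw new Error(`not a generic family: ${generic}`)
  }
  // Quote each name (escaping embedded quotes) so spaces and digits parse safely
  const quoted = fonts.map(name => `"${name.replace(/"/g, '\\"')}"`)
  return [...quoted, generic].join(', ')
}

// Illustrative short list: one platform font each for Windows, macOS and Ubuntu
const fontFamily = buildFontFamily(['Segoe UI', 'Helvetica Neue', 'Ubuntu'])
console.log(fontFamily) // "Segoe UI", "Helvetica Neue", "Ubuntu", sans-serif

// In a browser, the measuring element would then use it, e.g.:
// element.style.fontFamily = fontFamily
```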

Thorin-Oakenpants commented 2 years ago

I'm going to add code points per size, both per script and globally, to my fontscript page (which doubles as a code point generator and a fingerprint data collector for analysis).
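A sketch of grouping code points into size buckets both per script and across all scripts, assuming hypothetical `{ script, codePoint, width, height }` records. The record shape, the sample data and the names `addToBucket` / `bucketBySize` are illustrative, not the fontscript page's actual code.

```javascript
// Add a code point to the "w x h" bucket of a Map, creating the bucket if needed
function addToBucket(buckets, key, codePoint) {
  if (!buckets.has(key)) buckets.set(key, [])
  buckets.get(key).push(codePoint)
}

// Group code points by measured size, per script and across all scripts
function bucketBySize(measurements) {
  const perScript = new Map()  // script -> Map("w x h" -> [codePoints])
  const allScripts = new Map() // "w x h" -> [codePoints]
  for (const { script, codePoint, width, height } of measurements) {
    const key = `${width} x ${height}`
    if (!perScript.has(script)) perScript.set(script, new Map())
    addToBucket(perScript.get(script), key, codePoint)
    addToBucket(allScripts, key, codePoint)
  }
  return { perScript, allScripts }
}

const { perScript, allScripts } = bucketBySize([
  { script: 'Latin', codePoint: 'U+0041', width: 8, height: 16 },
  { script: 'Latin', codePoint: 'U+0042', width: 8, height: 16 },
  { script: 'Han', codePoint: 'U+4E00', width: 16, height: 16 },
  { script: 'Han', codePoint: 'U+4E8C', width: 16, height: 16 },
])
console.log(allScripts.get('16 x 16'))            // [ 'U+4E00', 'U+4E8C' ]
console.log(perScript.get('Latin').get('8 x 16')) // [ 'U+0041', 'U+0042' ]
```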

It's a bit of a hack to sort the width x height buckets... sigh.
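One reason it feels hacky: sorting `"width x height"` keys as plain strings puts `"100 x 20"` before `"20 x 20"`. A sketch of a numeric comparator (width first, then height), assuming keys use the `" x "` separator seen in this thread; `parseDimensions` and `compareDimensions` are illustrative names.

```javascript
// Parse "width x height" into [width, height] numbers
const parseDimensions = key => key.split(' x ').map(Number)

// Compare two "width x height" keys numerically: width first, then height
function compareDimensions(a, b) {
  const [aw, ah] = parseDimensions(a)
  const [bw, bh] = parseDimensions(b)
  return aw - bw || ah - bh
}

const keys = ['100 x 20', '20 x 20', '9.5 x 16', '20 x 18']
console.log([...keys].sort())                  // [ '100 x 20', '20 x 18', '20 x 20', '9.5 x 16' ]
console.log([...keys].sort(compareDimensions)) // [ '9.5 x 16', '20 x 18', '20 x 20', '100 x 20' ]
```

For a `Map` of buckets, the same comparator can reorder the entries: `new Map([...buckets].sort(([a], [b]) => compareDimensions(a, b)))`.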