Open vis2021t opened 2 years ago
do u have something in your mind for brave, I think it's something we might be focusing on
Something unexpected I'm seeing: my default Chrome trust score has decreased.
On Linux x86_64 (Parrot OS), the voices section says it's unsupported even though it was supported a few days ago. I haven't even touched Chrome (I was working with Firefox), and Fonts is showing red.
I even installed a fresh Chromium and got the same result. No software update has been done in 2 months.
In the panel below, everything looks normal for Fonts, but things are off with speech.
I looked in window, and SpeechSynthesis is there; the check comes back as boolean true.
Images:-
Fonts
I refactored fonts. So, some devices will have low font score till it establishes a good trust score. It should get bumped with a higher score after a few days of traffic reporting the same fingerprint.
Fonts now use document.fonts.check in addition to loading local fonts. This has the effect of capturing more fonts on Linux and Chrome OS, and is a bypass to Brave's new font randomization, which recently began covering local font queries.
On the fonts test page, near the top, the check & load methods typically return a distinct set of fonts. I'm not sure why that is.
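A minimal sketch of the check method described above, assuming a small hypothetical candidate list. `detectFontsByCheck`, `CANDIDATE_FONTS`, and the stub used for testing are illustrative names, not the project's actual code. `FontFaceSet.check` is the only real API used here, and what it returns for uninstalled fonts can differ between engines.

```javascript
// Hypothetical candidate list; a real test would use a much larger one.
const CANDIDATE_FONTS = ['Arial', 'Ubuntu', 'Noto Color Emoji', 'Segoe UI']

// Returns the candidates that the given FontFaceSet reports as available.
// fontSet defaults to document.fonts in a browser; pass a stub elsewhere.
function detectFontsByCheck(candidates = CANDIDATE_FONTS, fontSet = document.fonts) {
  return candidates.filter((name) => {
    try {
      // The font shorthand needs a size; the family is quoted to allow spaces.
      return fontSet.check(`12px "${name}"`)
    } catch (err) {
      return false // treat a throwing check as "not detected"
    }
  })
}
```

Comparing this list with the list produced by the load method would show where the two methods disagree.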
Speech
Voices can be a hit or miss on some devices, but these should load on refresh. The issue might be due to the voice load event not firing within the 3000ms cut off I have set.
If that is not the issue, then it might be due to the absence of local voices on the platform. On Blink, we return blocked if no local voice engines are detected. I believe this is common on Linux devices. Need to change that to unsupported.
I see it affects the crowd score, since it computes voices as being blocked. I'm planning to remove voices from impacting the score, since the timeout situation and the absence of local voices is not necessarily blocking.
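Here is a rough sketch of how the timeout and the missing local voices could be kept apart from real blocking. The names `classifyVoices` and `loadVoices` and the status strings are assumptions made for illustration. Only `speechSynthesis.getVoices`, the `voiceschanged` event, and the `localService` property are real APIs.

```javascript
// Classify a voice lookup. "timeout" and "unsupported" are kept apart from
// "blocked" so neither is scored as active blocking.
function classifyVoices(voices, timedOut) {
  if (timedOut) return 'timeout'
  if (!voices || voices.length === 0) return 'unsupported'
  const hasLocal = voices.some((voice) => voice.localService)
  return hasLocal ? 'ok' : 'unsupported'
}

// Wait for voices, giving up after timeoutMs (3000 ms is the cut off in the thread).
// synth defaults to the global speechSynthesis in a browser.
function loadVoices(timeoutMs = 3000, synth = globalThis.speechSynthesis) {
  return new Promise((resolve) => {
    if (!synth) return resolve({ voices: [], status: 'unsupported' })
    const finish = (timedOut) => {
      const voices = synth.getVoices()
      const stillEmpty = timedOut && voices.length === 0
      resolve({ voices, status: classifyVoices(voices, stillEmpty) })
    }
    if (synth.getVoices().length) return finish(false)
    const timer = setTimeout(() => finish(true), timeoutMs)
    synth.addEventListener('voiceschanged', () => {
      clearTimeout(timer)
      finish(false)
    }, { once: true })
  })
}
```

With a split like this, only an explicit tampering signal would need to map to blocked, and the crowd score could ignore the other two statuses.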
Any plans over improving Brave trust score?
But I suppose I was having 100% trust score few days ago
with also respect to voice hmm that's weird
it's probably not firing is there no better approach for speech
Hi, I found something I would like u test it on a chrome headless and tell me is there anything interesting on it
navigator.userActivation I want to see what other headless browser says
++ I also could not find anything on MDN
++ I found something interesting about worker scopes
So apparently navigator.mimeTypes is undefined in any worker scope, as we expected, which tells us the navigator is different there. But when I try connection, it is present, which means MDN's documentation of the worker scope Navigator is incomplete
Hope it sounds great to u
The following Web APIs are available to workers: Barcode Detection API, Broadcast Channel API, Cache API, Channel Messaging API, Console API, Web Crypto API (Crypto), CustomEvent, Encoding API (TextEncoder, TextDecoder, etc.), Fetch API, FileReader, FileReaderSync (only works in workers!), FormData, ImageData, IndexedDB, Network Information API, Notifications API, Performance API (including: Performance, PerformanceEntry, PerformanceMeasure, PerformanceMark, PerformanceObserver, PerformanceResourceTiming), Promise, Server-sent events, ServiceWorkerRegistration, URL API (e.g. URL), WebGL with OffscreenCanvas (enabled behind a feature preference setting gfx.offscreencanvas.enabled), WebSocket, XMLHttpRequest.
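A small sketch for comparing what the window navigator and the worker navigator expose, assuming the report is collected in each scope and compared afterwards. `probeNavigator`, `diffScopes`, and the `PROBES` list are hypothetical names; `mimeTypes` and `connection` are the two properties mentioned in the thread.

```javascript
// Property names to probe; mimeTypes and connection come from the thread,
// the other two are added for illustration.
const PROBES = ['mimeTypes', 'connection', 'userAgent', 'hardwareConcurrency']

// Report which probes exist on a navigator-like object.
function probeNavigator(nav, probes = PROBES) {
  const report = {}
  for (const key of probes) report[key] = key in nav
  return report
}

// List the probes whose presence differs between the two scopes.
function diffScopes(windowReport, workerReport) {
  return Object.keys(windowReport).filter((key) => windowReport[key] !== workerReport[key])
}
```

Inside a worker, something like `postMessage(probeNavigator(self.navigator))` would send the report back to the page for the comparison.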
navigator.userActivation
I get true for both isActive and hasBeenActive in headless
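For anyone who wants to repeat the test, a minimal sketch of reading the two flags safely. `readUserActivation` is a hypothetical helper name; `navigator.userActivation` with `isActive` and `hasBeenActive` is the real API being read. Whether the values are a dependable headless signal is not established here.

```javascript
// Read the user activation flags from a navigator-like object.
// nav defaults to the global navigator; pass a stub where none exists.
function readUserActivation(nav = globalThis.navigator) {
  const activation = nav && nav.userActivation
  if (!activation) return { supported: false, isActive: null, hasBeenActive: null }
  return {
    supported: true,
    isActive: activation.isActive,
    hasBeenActive: activation.hasBeenActive,
  }
}
```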
workers
There's more too, here
trust score
No plans to change the scoring. Recently, I began factoring in the crowd-blending score to trust score. Getting and maintaining an A trust score should be slightly more difficult.
Regarding Brave, I'm looking for ways to directly detect randomization or restore the values.
see I told u the docs are incomplete on mdn
looks like we are working on a state even documentation feels incomplete haha 😂
well anyway I don't have laptop rn at hospital , my health isn't good but it's boring here ngl
so sorry if I disturb u regarding testing things for me
yea hmm I will check in a custom way whether speech is working on my laptop
trust score
yea dude it's good I think that's better
brave
hmm I will explore myself too don't worry
if a webdriver bypassed their workerscope useragent to be headless
what else can we do to get sus on it
any thoughts
It's all good. I enjoy testing and research. Hope you get well soon there.
We might be able to estimate the GPU brand on Brave based on the unprotected WebGL parameters. I would just need to begin tracking GPUs with the reduced parameters in the prediction section, and maybe call it gpu params 2. We can do this for Tor Browser standard mode, too.
Another thing I'm looking into is a human confidence score. For example, I would imagine automated browsers are not likely to have popular writing tool extensions like Grammarly or LanguageTool. So, we can fingerprint these extensions and more with little effort using CSS selectors. The existence of tools like this can increase the human confidence score. But, I can see this getting exploited or mocked. I wonder if bots will actually start installing these extensions. It would be funny to catch them in the act and then use it as a fingerprint 😁.
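A rough sketch of the CSS selector idea. The selectors below are assumptions about markers these extensions inject into the page, and they may be outdated or incomplete. `detectExtensions`, `humanConfidence`, and the scoring numbers are hypothetical and only illustrate the shape of the check.

```javascript
// ASSUMPTION: illustrative marker selectors; verify against current extension builds.
const EXTENSION_SELECTORS = {
  grammarly: 'grammarly-desktop-integration, [data-gr-ext-installed]',
  languageTool: '[data-lt-installed]',
}

// Return the names of extensions whose marker selector matches in root.
// root defaults to document in a browser; pass a stub elsewhere.
function detectExtensions(root = document, selectors = EXTENSION_SELECTORS) {
  return Object.keys(selectors).filter((name) => root.querySelector(selectors[name]) !== null)
}

// Hypothetical scoring: each detected tool adds a fixed bonus, capped at 1.
function humanConfidence(detected, base = 0.5, bonus = 0.2) {
  return Math.min(1, base + bonus * detected.length)
}
```

Since these markers are easy to mock, the result probably works better as one weak signal among many than as proof of a human.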
This is not displayed yet, but I added gpuBrand tracking now on a handful of the prediction fingerprints
// available in the console
JSON.parse(sessionStorage.decryptionData).canvasPaintSystem.gpuBrand // INTEL for me
Hmm gpu detection like that would actually be more reliable as they would be quite hard to fake
I wonder what are the logics u use at the backend
I mean I have looked myself at the discussion about backend and even been a small part of it but I wish to know more
Headless detection has a problem combining with the bot score
I think when headless is detected, it should push the bot score toward more likely a bot. That is not happening: I tested with the Google mobile-friendly bot, and yea, workerscope is detecting it
but I think we can explore the worker section more, as there are things there which might be interesting
logics on the back end
The server gives the canvas fingerprint a data profile that contains 3 GPU related arrays. This logic is used for systems and devices too.
gpuBrands: [
"INTEL"
],
gpus: [
"INTEL:ANGLE (Intel(R) UHD Graphics Direct3D11 vs_5_0 ps_5_0)",
"INTEL:ANGLE (Intel, Intel(R) UHD Graphics 620 Direct3D11 vs_5_0 ps_5_0, D3D11)"
],
gpuWatch: [
"INTEL:460191600000:8/2/1984:703722......:18"
],
gpus holds the last 10 unique GPU strings, and each string is reformatted to hold its brand at the beginning.
gpuBrands holds all the brands seen with the canvas fingerprint.
Restrictions are in place before a GPU is accepted in the above arrays. It must have no damage or signs of JavaScript tampering, it must have a moderate or high confidence score (with known parts), and if WebGL in worker is available, the GPU strings must match.
Deceptions can still occur through engine-level tampering, so we have this final gpuWatch array that tracks each reported brand and makes the brand self-destruct from all 3 arrays if it fails to maintain trust.
To establish trust, it just needs to satisfy client-side checks and show up on the server as a valid GPU string. To maintain trust, the brand just needs to not self-destruct under these conditions:
If the brand only has
Self-destruction on that brand is triggered by any counter brand whenever one shows up on the server.
// gpuWatch (the watcher)
[BRAND]:[timeLastSeen]:[dateStringOfLastSeen]:[hashOfDistinctTimezone][...][...][...]:[brandSeenTotal]
This design aims to auto distrust reporting if it is not supported by current web traffic.
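To make the watcher format concrete, here is a sketch that parses one gpuWatch entry and removes a brand from all 3 arrays. It assumes the fields are separated by colons and that the hash fields sit between the date string and the total. `parseWatch` and `selfDestruct` are hypothetical names, and the conditions that trigger self-destruction are deliberately left out.

```javascript
// Parse "[BRAND]:[timeLastSeen]:[dateString]:[hash fields...]:[brandSeenTotal]".
// ASSUMPTION: every field is colon separated and the date string has no colon.
function parseWatch(entry) {
  const parts = entry.split(':')
  return {
    brand: parts[0],
    timeLastSeen: Number(parts[1]),
    dateLastSeen: parts[2],
    timezoneHashFields: parts.slice(3, -1),
    seenTotal: Number(parts[parts.length - 1]),
  }
}

// Remove a brand from gpuBrands, gpus, and gpuWatch, returning a new profile.
function selfDestruct(profile, brand) {
  const prefix = `${brand}:`
  return {
    gpuBrands: profile.gpuBrands.filter((b) => b !== brand),
    gpus: profile.gpus.filter((gpu) => !gpu.startsWith(prefix)),
    gpuWatch: profile.gpuWatch.filter((watch) => !watch.startsWith(prefix)),
  }
}
```

Because each entry in gpus and gpuWatch starts with the brand, one prefix test is enough to clear a brand everywhere.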
headless is detected should increase the bot score
I like this idea. I plan to change the bot pattern to include more headless signals. Right now, everyone gets the stranger bot level on the first visit, and then on the second visit bot patterns are computed, but we can start differentiating stranger from headless on the first visit and boost the score.
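One possible shape for the first-visit split, sketched under assumptions. `stranger` is the level named above, while `headless`, the signal names, and `firstVisitLevel` itself are hypothetical and not the current bot pattern logic.

```javascript
// Hypothetical first-visit classification: count simple headless signals and
// return "headless" when any are present, otherwise the default "stranger".
function firstVisitLevel(signals) {
  const headlessHits = [
    /headless/i.test(signals.workerUserAgent || ''),
    /headless/i.test(signals.windowUserAgent || ''),
    signals.webdriver === true,
  ].filter(Boolean).length
  return headlessHits > 0
    ? { level: 'headless', hits: headlessHits }
    : { level: 'stranger', hits: 0 }
}
```

The hit count could then feed a score boost, so a visitor showing several signals ranks above one showing a single signal.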
gpu detailed explanation
hi buddy thanks for sharing details over what happens, I will be back from hospital today
bot patterns needs a lot of change I think.
Hi Buddy, Something I want u to look at for a tiny bit:-
I feel a bit confused
Hmmm... how about this demo? Do any voices load?
https://mdn.github.io/dom-examples/web-speech-api/speak-easy-synthesis/
Missing voices on Linux might be related to this Chromium issue.
https://bugs.chromium.org/p/chromium/issues/detail?id=586819
Sure I will see and let u know,
Hmm I was wondering how are those samples data created and updated?
Ofcourse they hold an important role
++ I also am curious about gpu sign near domrect etc
at crowd blending section
I saw there are Many places where there is a gpu sign (I guess thats what they meant ), and I am curious because I wasn't aware they can be used for it
Domrect , Device of timezone grabbed my attention
Hope u can explain me something on that
I found a resource about guessing random numbers with high accuracy. Does it grab your interest?
I mean, if someone is generating an entire random number to bypass detection in some way
https://v8.dev/blog/math-random It can be very useful as I also found a trusted yt info on it
https://www.youtube.com/watch?v=-h_rj2-HP2E
Please take a look on his self research
I am unaware of how privacy plugins work, or how bot detection bypasses work for companies posing as google_bot, but so far everyone I have seen uses random somewhere for output, which makes me even more sure that this is something big.
I think we should stick to reading the web engine code directly. It will give us more info than any docs can
I was doing that on V8 and came across the random number implementation, so I assume my approach is quite simple yet one of the fastest
I suppose if anyone creates a random generated value when we call a specific thingy
...It might be good place to easily decrease trust score
I will research on getting previous generated random number by reversing the random number implementation If they generate random value previously before we called it
I think it's something really interesting which should caught up to our eye
Domrect , Device of timezone grabbed my attention how are those samples data created and updated?
only 2 tiny question from a kid hehe Hope u could tell me something on it
hope u found my research on random number detection somewhat useful
I hope u could take out time for me.....
I'm terribly slow, but I've been looking forward to this. These are great questions.
gpu sign (domrect, device of time zone, etc)
As far as I know, DOM rect and other pixel precision fingerprints (font pixels, SVG, and text metrics) are not actually affected by graphics hardware, but I might be wrong about that. We collect the GPU brand anyhow to see what we can find. The pixel rendering uses CSS transforms, but this only impacts the frame-rate and not the precision of the dimensions.
It's possible that the prediction is accurate due to the low GPU count in the fingerprint or the hardware signature. For example, the DOM rect can be very different on certain virtual machines with unusual display resolutions (VMWare, VirtualBox, etc.). The GPU brand does not impact the pixels, but this fingerprint only includes GPUs from a single brand. A clearer example is that WebKit pixel rendering on iOS will only have the Apple GPU brand.
It will be interesting to see if we can accurately predict NVIDIA, AMD, or INTEL in pixel precision rendering after a week or so of gathering fingerprints. I can check the logs and see what good crowd score fingerprints are reporting.
how are those samples data created and updated?
It's mostly through Google Firebase/Firestore. When the page initially loads, an encrypted request is sent to the API containing the prediction fingerprints. The request is deciphered on the server before returning the results. Subsequent page loads will only resend the post if the fingerprints change. To ensure the accuracy of the samples, I set up some restrictions: certain forms of bad bot behavior (as identified by the bot hash) are not allowed to participate. Finally, I use a Google Apps Script server to automatically request the full JSON dump we have here on GitHub.
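The resend rule (only post again when the fingerprints change) could look roughly like this. `shouldResend` and the storage key are hypothetical names, and the storage argument is anything with `getItem` and `setItem`, such as `sessionStorage`. Encryption and the request itself are left out.

```javascript
// Decide whether the prediction fingerprints need to be posted again.
// Remembers the last serialized value in storage and compares against it.
function shouldResend(fingerprints, storage, key = 'lastFingerprints') {
  // Note: JSON.stringify is sensitive to key order, so keep the order stable.
  const serialized = JSON.stringify(fingerprints)
  if (storage.getItem(key) === serialized) return false
  storage.setItem(key, serialized)
  return true
}
```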
I currently am manually importing it from Apps Script to GitHub every 2 weeks or so, whenever browser features need an update. It would be great to find a way to automate this step. The file is 2 megabytes, which would make an API request from the client too costly.
to be continued...
I found that Googlebot has an abnormal hardware concurrency
which is between 112 and 128 on average as far as I have seen, while it claims to be Android.
Of course the worker found out it is headless Chrome on Linux
but this part is also interesting: it shows that an Android device, going by the model declared in the user agent and other places, reports abnormal specs. I think we can make a chart
with respect to the JS engine and what each one declares; there are a few places it can be useful
I will test it today and let u know
and to update to client with less performance hamper
hmm have to think
the random number prediction method works only on chrome and nodejs
The method is different for spider-monkey
we could make an self feeding smt solver till a limit and we use that to determine random number but my curiosity goes to if we could go ahead I think we might go back
like first we need to understand how the script is working then we can work with reverse and seeing that did u got any similar pattern anywhere?
on future prediction or past prediction
guessing random number with high accuracy
This is incredible. Watched the video too. I wonder if we could use JavaScript or WASM to predict Math.random. The difficulty is determining what to target. I can think of some counter-attacks that could handle randomization using Web Crypto, WASM, or by changing the engine (like this browser-- it's advanced but has a handful of leaks).
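For reference, the xorshift128+ step shown in the linked V8 post can be written in JavaScript with BigInt. This is only the state update: the engine also converts the state to a double and buffers values, which this sketch leaves out, so it does not reproduce `Math.random` output. `createXorshift128plus` is a hypothetical name.

```javascript
const MASK64 = (1n << 64n) - 1n // keep every value within 64 bits

// Build a generator from two 64-bit state words. The same state always
// produces the same sequence, which is what makes prediction possible
// once the state has been recovered.
function createXorshift128plus(state0, state1) {
  let word0 = state0 & MASK64
  let word1 = state1 & MASK64
  return function next() {
    let s1 = word0
    const s0 = word1
    word0 = s0
    s1 ^= (s1 << 23n) & MASK64
    s1 ^= s1 >> 17n
    s1 ^= s0
    s1 ^= s0 >> 26n
    word1 = s1
    return (word0 + word1) & MASK64
  }
}
```

The hard part stays the same as noted above: recovering the state from observed outputs and knowing which script to target.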
I was thinking of a headless browser that pretends to have a gyro and fakes its readings from some fixed values, with the final values varied using Math.random, which is a trick I have seen. We could detect much more by using Math.random prediction to catch lies like that
this can be a whole other level
Check this out :- https://github.com/chromium/chromium
https://incolumitas.com/2021/01/10/browser-based-port-scanning/
Why don't we test it out and see what bots do
I saw puppeteer extra stealth plug-in have somewhat control over service worker or shared worker
one of them
so as bidi new browser automation method I think it might be interesting to check for ports
self localhost port scanning
Check this out :- https://github.com/chromium/chromium
Nice. I sometimes use these...
https://source.chromium.org/ https://searchfox.org/
https://chromestatus.com/roadmap https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Releases
Nice. I sometimes use these
I will be looking at the headless and normal one really deeply including v8 engine source
I feel like I am becoming the browser 😂😂 lol
Had any thoughts over random number predictions?
I think we can use it for many things but I wanna know if u have anything in your mind
Hmmm... thinking on this, but nothing yet. I'm guessing we would need a target script. For example, if the script is getting a random number within a random range, we would need to know that both the range and the final number need to be decoded.
I think we should checkout privacy extensions and how they using random
Here are a few examples
https://github.com/kkoooqq/fakebrowser/blob/main/src/core/DeviceDescriptor.ts#L463 https://github.com/unblocked-web/unblocked/blob/main/plugins/default-browser-emulator/lib/Viewports.ts#L69 https://github.com/duckduckgo/content-scope-scripts/blob/main/build/chrome/inject.js#L19 https://github.com/jake-cryptic/AbsoluteDoubleTrace/blob/master/MyTrace/js/contentscript/page.js#L94
We use these to generate random traps in canvas and audio
// color
~~(Math.random() * 256)
// frequency
getRandFromRange = (min, max) => Math.floor(Math.random() * (max - min + 1)) + min
max = 20
start = getRandFromRange(275, length - (max + 1)); // random index between 275 and 1979
Used for encryption in post requests
crypto.getRandomValues(new Uint8Array(12))
Hmm, I see Math.random is used quite a lot
If we could carefully understand how they try to keep random from looking random
and use our method to match the pattern,
it can be interesting,
lol who knew math.random can be an issue 😂😂
I am doing self-bugs finding maybe will find a bug, I am a browser now
(beep beep - boop boop) sandbox
What are u doing these days?
Right now, I'm working on a new analysis API. The current analysis response is displayed in the console, but the goal is to show this in a section and tag suspicious traffic.
new analysis API.
I was thinking that we can like the way u are doing for gpu
we can also do it for smartphones. U won't imagine a phone with 127 hardware concurrency claiming to be an Android Pixel with SwiftShader lol
I mean, when it comes to things like this,
the specs JS gives us won't be higher than what is stated for the device model, u know. They won't go above its original specs
if they don't have model that's a different condition but if they do
this can be a simple technique 👌
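A sketch of that check, assuming a lookup table of device specs exists. The table below is a single hypothetical entry, and treating a SwiftShader renderer on a claimed phone as a lie is also an assumption. `checkClaimedDevice` and `DEVICE_SPECS` are illustrative names.

```javascript
// ASSUMPTION: hypothetical spec table; real values would come from a device database.
const DEVICE_SPECS = {
  'Pixel 5': { maxCores: 8 },
}

// Compare what JavaScript reports against the specs of the claimed model.
function checkClaimedDevice(model, reported, specs = DEVICE_SPECS) {
  const spec = specs[model]
  if (!spec) return { known: false, lies: [] } // no model data, nothing to compare
  const lies = []
  if (reported.hardwareConcurrency > spec.maxCores) lies.push('hardwareConcurrency')
  // ASSUMPTION: a software renderer is not expected on a real phone.
  if (/swiftshader/i.test(reported.gpu || '')) lies.push('gpu')
  return { known: true, lies }
}
```

For the example from the thread, a claimed Pixel reporting 127 cores and a SwiftShader GPU would fail both checks.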
smartphones
I like this idea. iPhone would be easy to verify. We just need to validate the engine is WebKit. Android on Firefox and Chrome has dozens of features not on desktop. We could run a mobile test based on this.
My buddy liked my idea
Hehe true we had a first glance at this thingy with navigator.connection.type
declaring that there are many things which are not available at other places declaring that well Javascript is different at other places
There is one thing I have not tested which I wish to test in Chrome:
there was a code execution vulnerability, but browsers are sandbox based.
I was thinking, without leaving the sandbox, is it possible to access resources of a chrome:// URL, which is restricted for JS by default?
I mean, those URLs are really great for many things; they hold a lot of info
I don't think so. Pretty sure those are locked down, but maybe there's a bug to get around it.
True will look in it, Specially the version chrome://version
have a path information, it's like example
data/user/0/com.kiwibrowser.browser/app_chrome/Default
I think these places can also be useful if we could sum up bugs around it for that
that's why I'm understanding V8 js engine etc
pretty sure the many things under chrome:// can't be faked
Was this your choice or were u inspired by somewhere :-
😆😆
Lol. It's an arbitrary selection based on unique patterns here.
I need to create a better test page and render by platform font and then create sections by versions. I believe the older versions have much faster rendering performance.
https://emojipedia.org/emoji-versions/ and there's some overlap at https://emojipedia.org/unicode-versions/
lmao can't believe we are gonna cross reach comments
I cloned the website as rendered by Googlebot, and almost every single icon in crowd blend was detected as Linux
except the screen, which showed as Android
I think bots only hide the basic user agent, screen, etc. Even their dedicated worker user agent had headless in it lol
Have u seen any smart bots in your server-side reviews
or maybe something that u felt could be a bot?
I see 3 types of traffic on CreepJS
I'm not certain any of these are automated until I examine the request timing and delay pattern. For example, there are spikes that generate ~500 requests in less than 10 minutes (with less than organic delay) and produce 100s of DOMRect fingerprints but only 1 SVGRect fingerprint. SVGRect uses the same DOMRect interface and should produce the same amount of fingerprints. CSS pixels should also yield a similar distinct count.
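The distinct count comparison could be sketched like this, assuming each logged request is an object with one value per fingerprint field. `distinctCounts`, `isMismatched`, the field names, and the ratio threshold are all hypothetical.

```javascript
// Count distinct values per fingerprint field in a batch of requests.
function distinctCounts(requests, fields) {
  const counts = {}
  for (const field of fields) {
    counts[field] = new Set(requests.map((request) => request[field])).size
  }
  return counts
}

// Flag a batch when two fields that should track each other diverge.
// maxRatio is an arbitrary illustrative threshold.
function isMismatched(counts, fieldA, fieldB, maxRatio = 10) {
  const high = Math.max(counts[fieldA], counts[fieldB])
  const low = Math.max(1, Math.min(counts[fieldA], counts[fieldB]))
  return high / low > maxRatio
}
```

A spike with 100s of distinct DOMRect values and a single SVGRect value would be flagged, while organic traffic should keep the two counts close.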
Other spikes can look natural and produce a reasonable 500 requests in under an hour, but then it contains 300 GPU strings and the timing looks more like a hit-and-run operation. Perhaps the developer is putting in some exercise tests on the crowd blend API. I'm often able to single it out as a unique fingerprint by just looking at the stack value, which is not easy to fake.
Types of Traffic Creepjs gets
I understood,
I am curious what do u use browser sources for?
for example chromium source etc?
I looked over the TLS fingerprinting you talked about, but there is something I read in Akamai research where they state that bots are able to bypass it to get on the good side: https://www.akamai.com/blog/security/bots-tampering-with-tls-to-avoid-detection
I came across a 2-step TLS fingerprinting paper but I lost that PDF 🥲🥲 dammit
Will try to find it, but do u know about it?