abrahamjuliot / creepjs

Creepy device and browser fingerprinting
MIT License
1.57k stars 195 forks

just opening one for my research on bot detection and stuff #190

Open vis2021t opened 2 years ago

vis2021t commented 2 years ago

I looked over the TLS fingerprinting you talked about, but I read something from Akamai research where they state that bots are able to tamper with TLS to bypass detection and look legit: https://www.akamai.com/blog/security/bots-tampering-with-tls-to-avoid-detection

I came across a PDF about a 2-step TLS fingerprinting approach, but I lost it 🥲🥲 dammit

I'll try to find it, but do you know about it?

abrahamjuliot commented 2 years ago

I mainly look at the Chromium source, but not as much as I should. It depends on what type of task I am facing. Recently, I was looking for documentation on why or when the WebGL renderer string stopped reporting the graphics driver version information. I noticed a likely fake result that continued to report the version, but I'm confident that Chrome no longer includes it.

vis2021t commented 2 years ago

Chromium research

Hmm, understood.

I'll be looking for more core detection points for now, as it's kinda fun to me.

vis2021t commented 2 years ago

Hey buddy, I think we should increase the options for platform detection in Headless.

I'm talking about Arial.

I used 3 bots to check on creepjs and cloned the resulting HTML.

All of them said 100 on Linux and 100 on Windows. I'm sure they were Linux headless, since the shared worker said so in another test on /fpworker. The most I can conclude is that we can tell whether it's Android or PC,

but we can't tell whether it's Windows or Linux, because Arial is 100 for both.

abrahamjuliot commented 2 years ago

There might be a Web API we can use to distinguish Linux from Windows. As far as I know, on Chrome, Windows typically uses Arial and Segoe UI, but this pair is not exclusive to Windows. There are a few key features that set Chrome OS, macOS, and Windows/Linux apart from each other. The hints are essentially feature detection under the hood. However, these can be easily spoofed.

We can expect these to be faked by clever scripts, and can use this as a trap to catch them. If a script attempts to emulate Android features on Desktop, it will create a better fingerprint by causing a unique window hash with an unusual re-ordering of properties. In that case, we may lose out on a useful platform hint, but we will have identified suspicious activity.

These features are subject to change, too, so we can't rely on them too heavily. In some cases, it's all right if we are not aware of the real platform. Ultimately, we just need a few unique identifiers that can tell apart unusual web traffic from normal traffic. There are many subtle fingerprints that get overlooked. CSS match media, for example, can identify devices with no mouse or touch input (keyboard-only controls).
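A minimal sketch of the media-query idea mentioned above, assuming the Media Queries Level 4 interaction features (`any-pointer`, `any-hover`). The names `probeInputProfile`, `MediaMatcher`, and `InputProfile` are made up for illustration, and the matcher is injected so the logic can be exercised outside a browser. This is not how creepjs does it, just one way the check could look.

```typescript
// Minimal shape of what window.matchMedia returns; only `matches` is needed here.
interface MediaQueryResult {
  matches: boolean;
}

// Hypothetical injection point: in a browser this would wrap window.matchMedia.
type MediaMatcher = (query: string) => MediaQueryResult;

interface InputProfile {
  anyPointer: "fine" | "coarse" | "none";
  anyHover: boolean;
  keyboardOnly: boolean;
}

// Probe the interaction media features through the injected matcher.
function probeInputProfile(match: MediaMatcher): InputProfile {
  // "fine" and "coarse" can both match on hybrid devices; "fine" wins in this sketch.
  let anyPointer: InputProfile["anyPointer"] = "none";
  if (match("(any-pointer: fine)").matches) {
    anyPointer = "fine";
  } else if (match("(any-pointer: coarse)").matches) {
    anyPointer = "coarse";
  }
  const anyHover = match("(any-hover: hover)").matches;
  // No pointing device and nothing that can hover suggests keyboard-only controls.
  const keyboardOnly = anyPointer === "none" && !anyHover;
  return { anyPointer: anyPointer, anyHover: anyHover, keyboardOnly: keyboardOnly };
}
```

In a page, this could be called as `probeInputProfile((q) => window.matchMedia(q))`. Like the other hints, the result can be spoofed, so it is better treated as one signal among several.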

vis2021t commented 2 years ago

Hi buddy, I was busy for a while. I'll be coming back to research from tomorrow.

abrahamjuliot commented 2 years ago

Found this https://nullpt.rs/author/veritas. Interesting articles.

vis2021t commented 2 years ago

Really interesting, I do agree.

vis2021t commented 2 years ago

Hmm, I was using a popular extension, "Dark Reader".

It adds attributes to the html element:

darkplugin

And yeah, sorry, I was busy with some work. I'll be free now.

abrahamjuliot commented 2 years ago

It's all good.

Dark Reader is great. That's a good detection, too. It can be a human indicator if it is on. Something like this, maybe.

image
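A rough sketch of how such an attribute check could be written. It assumes Dark Reader marks the root element with attributes that start with `data-darkreader-`; that prefix is an assumption worth confirming in dev tools. `findExtensionAttributes` and `AttributeHost` are illustrative names, not part of creepjs.

```typescript
// Minimal slice of the DOM Element API used here, so the sketch can run without a browser.
interface AttributeHost {
  getAttributeNames(): string[];
}

// ASSUMPTION: Dark Reader adds attributes with this prefix to the <html> element.
const DARK_READER_PREFIX = "data-darkreader-";

// Return the attribute names on `root` that start with `prefix`.
function findExtensionAttributes(
  root: AttributeHost,
  prefix: string = DARK_READER_PREFIX
): string[] {
  return root.getAttributeNames().filter(function (name) {
    return name.indexOf(prefix) === 0;
  });
}
```

In a page, `findExtensionAttributes(document.documentElement).length > 0` would be the "extension is on" hint. Since extensions usually inject after page load, the check probably needs to run late or be repeated.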

vis2021t commented 2 years ago

That's a good detection, too. It can be a human indicator if it is on.

True

I use Dark Reader all the time. I was working on a website and saw it while debugging haha. I'll look for more interesting extensions that may leak things into the document, etc.

vis2021t commented 2 years ago

Hi, I was looking around Gmail and saw that they are able to tell a secure browser from a suspicious one, somewhat like we do at creepjs. I'm curious about their mechanism. After you enter a Gmail address, there is a detection script that checks whether the browser is OK or not (including bot detection). It's always good to take inspiration haha

Wanna explore together?

abrahamjuliot commented 2 years ago

Sure, I imagine they use UA client hints to detect unseen devices and then warn the backup email about an unknown device logging in to the account. The difficulty is de-obfuscating their code. This repo has a lot we can also look at.

vis2021t commented 2 years ago

Sure, finally you're back haha, kinda missed us.

Anyway, I think Gmail uses something more complex.

Even puppeteer stealth can't get through the login, even with a normal-looking user agent that doesn't have "headless" written in it.

That's why I want us to see what's interesting there.

While you were inactive, I was learning about dev tools detection from this repo: https://github.com/AEPKILL/devtools-detector. I tested it, and the detections work smoothly.

But for now I'm more interested in the Gmail detection, for the reason above.

Maybe there's something more we could learn? Who knows.

vis2021t commented 2 years ago

Sure This repo has a lot we can also look at.

Damn, that repo. I can sense some awesome things right there.

vis2021t commented 2 years ago

Is the obfuscator absolutely foolproof? No, while it's impossible to recover the exact original source code, someone with the time, knowledge and patience can reverse-engineer it.

Since the JavaScript runs on the browser, the browser's JavaScript engine must be able to read and interpret it, so there's no way to prevent that. And any tool that promises that is not being honest.

-- mentioned in https://obfuscator.io/#FAQ

One of the best obfuscators I have seen so far.

vis2021t commented 2 years ago

https://github.com/chris124567/commercial-bot-detectors/blob/master/files/google_botguard_deobfuscated.js

lol this is exactly what we needed

abrahamjuliot commented 2 years ago

devtools-detector

Nice. I ran into that recently. That's a good detection.

https://obfuscator.io/#FAQ

Good points. The Google BotGuard code looks like a challenge. I can see it collects the error stack here.
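For context on why an error stack is worth collecting: its text format differs between JavaScript engines, so it can be cross-checked against the claimed user agent. The sketch below is only an illustration of that idea and says nothing about what the linked code actually does with the stack. `captureStack`, `classifyStack`, and the `StackStyle` labels are made-up names.

```typescript
type StackStyle = "v8" | "gecko-or-webkit" | "unknown";

// Capture the stack string of a deliberately thrown error.
function captureStack(): string {
  try {
    throw new Error("probe");
  } catch (err) {
    return err instanceof Error && typeof err.stack === "string" ? err.stack : "";
  }
}

// Guess the engine family from the frame format of a stack string.
function classifyStack(stack: string): StackStyle {
  const lines = stack.split("\n");
  // V8 (Chrome, Node) frames look like "    at fn (file:line:col)".
  const looksLikeV8 = lines.some(function (line) {
    return /^\s+at\s/.test(line);
  });
  if (looksLikeV8) {
    return "v8";
  }
  // Firefox and Safari frames look like "fn@file:line:col".
  const looksLikeAtSign = lines.some(function (line) {
    return /@.*:\d+:\d+$/.test(line);
  });
  return looksLikeAtSign ? "gecko-or-webkit" : "unknown";
}
```

A mismatch such as a Firefox user agent with a V8-style result from `classifyStack(captureStack())` would be a cheap lie detector, although a determined script can patch the stack as well.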

vis2021t commented 2 years ago

devtools-detector

Good points. The Google BotGuard code looks like a challenge. I can see it collects the error stack here.

Agreed. I am working on a small project right now that requires me to use EJS and Express plus a CDN build of maybe Vue.js, React Native, or some other front-end framework.

I literally learned all 3 (Vue, Angular, and React) within 5 days. You can imagine it's been a mind-blowing week for me. Vue.js and React meet my requirements. I'll be done with the work the day after tomorrow.

I'll probably start looking over the BotGuard one the day after tomorrow.

haaah ~ sigh in tiredness ~

vis2021t commented 2 years ago

Hi, I'm done with my project.

Let's research 💝

I'm gonna look at Google BotGuard. Any information you discovered, maybe?

vis2021t commented 2 years ago

I found something. I even opened an issue there as research; I just noticed the owner is kinda active too.

This is the latest reverse-engineering attempt on the Google BotGuard code:

https://github.com/icetroll/botguard-RE

We can learn from it.

abrahamjuliot commented 2 years ago

Nice. That is a lot of code. I think it has to do with behavioral fingerprints. I see a few event listeners connected to DOM elements.

abrahamjuliot commented 2 years ago

I've been researching ways to detect Selenium and found some interesting leaks. Fascinating article here. Those values seem to be manipulated by different bots, but the object prototype contains unique keys that are important to the internal code. I haven't tested it, but I think it's possible to override those functions with eval code and use them to get internal values.

image

vis2021t commented 2 years ago

Naughty Eval
image

Very well, I see now. Can it also be referred to as an info disclosure, if it works as we expect? I am looking into the pattern detection in the Google BotGuard code (it is interesting but really nested), and also at the earlier BotGuard code challenge.

abrahamjuliot commented 2 years ago

The prototype functions might only reveal Selenium code and possibly different versions of the code.

vis2021t commented 2 years ago

The prototype functions might only reveal Selenium code and possibly different versions of the code.

That too would be really interesting for creepjs. Maybe even a sure bot detection haha.

Right now I am giving names to things in the BotGuard code to understand how it works.

vis2021t commented 2 years ago

I have understood quite a lot about Google BotGuard. I'll give you a proper summary here. It is interesting ngl.

vis2021t commented 2 years ago

Any update on your research?

abrahamjuliot commented 2 years ago

Nothing yet. But, a lot on my mind. I think the storage bytes are an incredibly high-entropy fingerprint in Chrome. It depends on the machine and what it's used for, but if there are no changes in storage, the fingerprint can place a machine among 1 trillion possible fingerprints (to put it lightly). In private tabs, Chromium reduces entropy (unstable per session and low bytes available).
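A small sketch of reading that value, assuming "storage bytes" refers to the `quota` field resolved by `navigator.storage.estimate()` (the comment does not name the API, so this is a guess). `quotaComponent` and `QuotaEstimate` are illustrative names.

```typescript
// Shape of the object that navigator.storage.estimate() resolves to (fields may be absent).
interface QuotaEstimate {
  quota?: number;
  usage?: number;
}

// Turn an estimate into a fingerprint component string.
// Only `quota` is used: `usage` changes as the site stores data, so it is less stable.
function quotaComponent(estimate: QuotaEstimate): string {
  if (typeof estimate.quota !== "number" || estimate.quota < 0) {
    return "quota:unavailable";
  }
  return "quota:" + String(estimate.quota);
}
```

In a browser this could be wired up as `navigator.storage.estimate().then(quotaComponent)`. Whether the quota is stable enough to act as an identifier depends on the browser and mode, as noted above for private tabs.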

Unrelated, I have this idea I might experiment with at some point. It's essentially a soft/superfast fingerprinting (less than 10ms and mostly low entropy), then it progressively slows down and expands into high entropy if anomalous hashes are detected. The idea is to make bad fingerprints move more slowly and good fingerprints move more quickly.
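The progressive idea could be sketched roughly like this. Everything here is hypothetical: `progressiveFingerprint`, the `Collector` type, and the caller-supplied `isAnomalous` check are placeholders, and real collectors would likely be asynchronous.

```typescript
// A collector returns one fingerprint component as a string.
type Collector = () => string;

interface ProgressiveResult {
  components: string[];
  escalated: boolean;
}

// Run cheap, low-entropy collectors first. Only pay for the slow,
// high-entropy collectors when the cheap result looks anomalous.
function progressiveFingerprint(
  fast: Collector[],
  slow: Collector[],
  isAnomalous: (components: string[]) => boolean
): ProgressiveResult {
  const components = fast.map(function (collect) {
    return collect();
  });
  if (!isAnomalous(components)) {
    // Normal-looking traffic exits early and stays fast.
    return { components: components, escalated: false };
  }
  const extra = slow.map(function (collect) {
    return collect();
  });
  return { components: components.concat(extra), escalated: true };
}
```

The design choice is that the cost lands on suspicious clients: ordinary visitors only ever run the fast pass, while anomalous ones trigger the expensive collectors.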

vis2021t commented 2 years ago

I looked over how Gmail currently works and found that they monitor and use the Performance API very well, which I hadn't thought of. I am exploring more, but I saw the new v3 is

I'm exploring other anti-bot systems and their monitoring behavior to expand creepjs. Right now I am looking at this:

https://developer.chrome.com/docs/extensions/mv3/intro/

vis2021t commented 2 years ago

Looking at their website, it's interesting how they use APIs and clever JavaScript. What's more interesting is that comments in their code say it was written in 2016; if that's true, it's quite fascinating. So far I am noting which APIs they use, and I'll summarize things here as I go.

vis2021t commented 2 years ago

I'm resuming my research summary from today. Let's see if I can put up some interesting points.

HMaker commented 2 years ago

about chromedriver detection, check https://github.com/HMaker/HMaker.github.io/tree/master/selenium-detector

most of the tests can be easily bypassed by patching chromedriver src though.

abrahamjuliot commented 2 years ago

chromedriver detection

Very nice detection there. Can these functions be patched or removed? The function names can be modified, but wouldn't the prototype still leak the names?

HMaker commented 2 years ago

You can also change the prototype completely. You could also make chromedriver store that on window instead of document.

chromedriver is just a CDP wrapper, but it sits at a higher level of the Chromium architecture, so it uses the page's global JS state to store automation-related vars.
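To illustrate the "page global JS state" point, here is a minimal sketch that scans an object's own property names for automation-looking keys. The `cdc_` naming pattern is an assumption based on publicly reported chromedriver builds; as noted above, a patched chromedriver can rename or relocate these, so a clean result proves nothing. `findAutomationKeys` and `CDC_KEY_PATTERN` are illustrative names, and this is not the linked detector's code.

```typescript
// ASSUMPTION: unpatched chromedriver builds leave keys such as "$cdc_<random>_"
// on document and "cdc_<random>_<Name>" on window. The exact names vary by build.
const CDC_KEY_PATTERN = /^\$?cdc_[a-zA-Z0-9]+_/;

// Return the own property names of `target` that match `pattern`.
// Non-enumerable keys are included, since getOwnPropertyNames does not skip them.
function findAutomationKeys(target: object, pattern: RegExp = CDC_KEY_PATTERN): string[] {
  return Object.getOwnPropertyNames(target).filter(function (key) {
    return pattern.test(key);
  });
}
```

In a page, both locations mentioned above could be checked with `findAutomationKeys(document).concat(findAutomationKeys(window))`.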

vis2021t commented 2 years ago

Gmail stuff summary

They use proxy detection (mostly based on the Performance API), and workers are their focus, just like we have here. They also have a few basic feature detections along with error detection, buckets, etc. The rest is just made lengthy.

vis2021t commented 2 years ago

Chromedriver Detector
Detected!

funnnnnnnnn

vis2021t commented 2 years ago

about chromedriver detection

Get a hug, dude lol. It's a good repo, great effort, really loved it.

vis2021t commented 2 years ago

I was thinking of challenging myself against creepjs techniques hehe