berstend / puppeteer-extra

💯 Teach puppeteer new tricks through plugins.
https://extra.community
MIT License

Web Workers are leaking the true navigator.platform #451

Open NikolaiT opened 3 years ago

NikolaiT commented 3 years ago

I think you use Win32 as the default navigator.platform. I think your plugin also allows overriding it to match the User Agent.

But regardless of how you populate navigator.platform, it seems to stay

"platform": "Linux x86_64",

when using Web Workers. Luminati.io data collectors are also affected.

I saw this issue when testing with creepJS.

For a quick check, visit this page with stealth puppeteer: https://abrahamjuliot.github.io/creepjs/tests/workers.html

Edit: The true platform is also leaked in iframes: https://abrahamjuliot.github.io/creepjs/tests/iframes.html

Unfortunately, I don't know how to fix it.

Quick PoC:

// webworker.js
var workerData = {
  platform: navigator.platform,
}

postMessage(JSON.stringify(workerData, null, 2));

And your index.html:

<div>
  <pre id="webWorkerRes">
  </pre>
</div>

<script>
  var w;

  if (typeof(Worker) !== "undefined") {
    if (typeof(w) == "undefined") {
      w = new Worker("webworker.js");
      document.getElementById("webWorkerRes").innerHTML = 'started...';
    }
    w.onmessage = function(event) {
      document.getElementById("webWorkerRes").innerHTML = event.data;
    };
  } else {
    document.getElementById("webWorkerRes").innerHTML = "Sorry! No Web Worker support.";
  }
</script>
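
The PoC above only displays the worker's value. A small comparison helper makes the mismatch explicit. This is a minimal sketch: the name `findPlatformMismatch` is hypothetical (it is not part of creepJS or puppeteer-extra), and it assumes the worker posts the JSON string produced by `webworker.js`.

```javascript
// Hypothetical helper: compares the platform seen by the page with the one
// reported by the worker in the PoC's JSON message.
function findPlatformMismatch(pagePlatform, workerMessage) {
  // webworker.js sends JSON.stringify({ platform: navigator.platform }, null, 2)
  const workerPlatform = JSON.parse(workerMessage).platform;
  return {
    pagePlatform,
    workerPlatform,
    leaked: pagePlatform !== workerPlatform,
  };
}

// Example with the values from this report: the page is spoofed to Win32,
// while the worker still reports the real platform.
const workerMessage = JSON.stringify({ platform: 'Linux x86_64' }, null, 2);
const result = findPlatformMismatch('Win32', workerMessage);
console.log(result.leaked);
```

In `index.html`, the same helper could be called as `findPlatformMismatch(navigator.platform, event.data)` inside `w.onmessage`.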

berstend commented 3 years ago

We have an internal fix for handling workers (all 3 types: service-/web-/dedicated workers) for puppeteer. Haven't found a way yet to surface all necessary events in playwright (their change to abstract CDP communication away with their own wire protocol makes this harder).

I didn't have time to clean this up and add it to the public stealth code, so it's good to have a canonical issue for that matter as reference. :-)
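
The internal fix mentioned above is not public, so the following is only a sketch of one possible building block, not the maintainer's implementation: a script that redefines `navigator.platform` inside whatever global scope it is evaluated in. Delivering that script into each worker before page code runs (for example over CDP) is the hard part and is not shown here. `buildPlatformOverride` is a hypothetical name, and Node's `vm` module with a fake `navigator` stands in for a worker scope purely for demonstration.

```javascript
const vm = require('vm');

// Hypothetical helper: returns script source that overrides the
// `platform` getter on the navigator prototype of the current scope.
function buildPlatformOverride(platform) {
  return `
    Object.defineProperty(Object.getPrototypeOf(navigator), 'platform', {
      get: () => ${JSON.stringify(platform)},
      configurable: true,
      enumerable: true,
    });
  `;
}

// Stand-in for a worker global scope (an assumption for demonstration only):
// a navigator whose prototype exposes the real platform.
const fakeNavigatorProto = {
  get platform() {
    return 'Linux x86_64';
  },
};
const fakeWorkerScope = vm.createContext({
  navigator: Object.create(fakeNavigatorProto),
});

const before = vm.runInContext('navigator.platform', fakeWorkerScope);
vm.runInContext(buildPlatformOverride('Win32'), fakeWorkerScope);
const after = vm.runInContext('navigator.platform', fakeWorkerScope);

console.log(before, '->', after); // prints: Linux x86_64 -> Win32
```

Patching the prototype rather than the `navigator` instance mirrors how the property is exposed in browsers, but an override like this may itself be detectable, so treat it as an illustration rather than a complete evasion.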

berstend commented 3 years ago

iframes: There's a timing/race condition in the CDP protocol affecting certain types of iframes, for which we've found a workaround (also currently unreleased).

kingkhan1431 commented 3 years ago

Any update on this issue, @berstend?

shtefcs commented 3 years ago

Wondering the same. Seems like CreepJS is owning our Puppeteer bots :D

@berstend, any plan to deal with Creep.js detection/fingerprinting?

berstend commented 3 years ago

Haven't had time to add this to the open-source repo yet :-)

Please note that creepjs is a specialized testing site and anti-bots using workers are rarely seen in the wild.

fusillijerry89 commented 3 years ago

My discord scrapers have been getting disabled lately. Could this possibly be the reason?

berstend commented 3 years ago

Workers can be seen in the Network and Application tabs of the devtools:

[image]

shtefcs commented 3 years ago

> Haven't had time to add this to the open-source repo yet :-)
>
> Please note that creepjs is a specialized testing site and anti-bots using workers are rarely seen in the wild.

Well, you never know what Peris....X and other guys have in their closed source defense :D.

When do you plan to add this to the public repo?

berstend commented 3 years ago

There's no closed source defense here. As I explained earlier, the presence of workers can be verified easily, as they run in the browser.

This is an open-source project, so everyone is welcome to add worker support themselves or even create a PR here to share that with others (wouldn't hold my breath for that though).

If one is unable/unwilling to do so then they need to wait until I find the time in my busy schedule to add this to the open-source repo. There's no ETA as I'm doing this in my free time for fun.

If worker support is business critical, my profile has contact info and information about my hourly rate.