berstend / puppeteer-extra

💯 Teach puppeteer new tricks through plugins.
https://extra.community
MIT License

Add evasion for navigator.mediaDevices #125

Open jeroenvisser101 opened 4 years ago

jeroenvisser101 commented 4 years ago

Example of how a site (AliExpress.com) uses this to fingerprint browsers:

async function getMediaDeviceIds() {
  if (/Android/.test(navigator.userAgent) || !navigator.mediaDevices?.enumerateDevices) return null;

  const devices = await navigator.mediaDevices.enumerateDevices();
  const deviceIds = devices.map(mediaDevice => mediaDevice.deviceId);
  return deviceIds?.join(",") ?? null;
}

berstend commented 4 years ago

Interesting.

(async () => {
    const devices = await navigator.mediaDevices.enumerateDevices();
    const deviceIds = devices.map(mediaDevice => mediaDevice.deviceId);
    console.log(deviceIds)
})()

In regular Chrome this will spit out a bunch of 64-character IDs. We'll need to read up a bit on how they're generated so we can spoof them. :)

berstend commented 4 years ago

Mhm, are we sure this is a problem?

const puppeteer = require("puppeteer")

puppeteer
  .launch({ headless: true, args: ["--remote-debugging-port=9222"] })
  .then(async browser => {
    const page = await browser.newPage()
    await page.goto("https://example.com")

    const data = await page.evaluate(async () => {
      const devices = await navigator.mediaDevices.enumerateDevices()
      const deviceIds = devices.map(mediaDevice => mediaDevice.deviceId)
      return deviceIds
    })

    console.log(data)
    await browser.close()
  })

This prints IDs as expected (they also change on each run):

[
  'default',
  '1c3dcd6602054cd782d424118dec84d38f17f7984a9d7ad000850a32b05b8dd7',
  '1ef49c0f08f40e27be3c7d385617a22c72808281b975386095458b897fd898e1',
  '568e48cfcb730a3fc529d1d47b7d2ea47f8329d3873150dd349faf74df229e20',
  'default',
  'e719ef731ddf7a689b52b701b58e328234cda18409eb692e6842d919a7132949',
  '450fbd67757957e5186d27a480bf554900eb68357fcbbc502298d02c1111f5bc'
]
[
  'default',
  'c6ade38461e827c1a5823ad72498b8ef4f7bdcc63630dd232130b07eb551781f',
  '75d9e3e730bc4e5feddf636f907978bfdd6902fa63ae55fe7a1e85d8e3442d7c',
  '0b2a41e4504a05973cc17dada0c9f655a36033970506ad807fdd755d6b1df718',
  'default',
  '8f0338ee6a3f8f7471b43d94f8ae5c46207c8bfd07c409b88408fbcc23dee2e9',
  '6bc57060f1e80d6c44c80db1a3494b205c56bd3b5ca707908dd489a3f3cafb56'
]

jeroenvisser101 commented 4 years ago

I think it's not only the deviceIds themselves but also the number of them, which allows fingerprinting the device. For instance, on the browserless/base image the number of mediaDevices is zero, which is probably not what one wants.
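To make the point about device counts concrete, here is a minimal sketch of how a page could reduce an enumerateDevices() result to coarse signals. The helper name summarizeMediaDevices and the sample input are hypothetical and only for illustration; this is not the code any particular site uses.

```javascript
// Hypothetical helper: reduce an enumerateDevices() result to the coarse
// signals discussed above (total count and count per device kind).
function summarizeMediaDevices(devices) {
  const perKind = {};
  for (const { kind } of devices) {
    perKind[kind] = (perKind[kind] || 0) + 1;
  }
  return { total: devices.length, perKind };
}

// In a page context this could be called as:
//   summarizeMediaDevices(await navigator.mediaDevices.enumerateDevices())
// Made-up sample input: a host without devices vs. one with a few.
const none = summarizeMediaDevices([]);
const some = summarizeMediaDevices([
  { kind: "audioinput", deviceId: "a", label: "" },
  { kind: "audiooutput", deviceId: "b", label: "" },
  { kind: "audiooutput", deviceId: "c", label: "" },
]);
console.log(none, some);
```

A total of zero, as on a host without any devices, is itself a signal even though no ID is exposed.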

berstend commented 4 years ago

I just read about this:

google-chrome --use-fake-device-for-media-stream --use-fake-ui-for-media-stream

Haven't tried it myself but this might solve this issue on hosts without devices.

jeroenvisser101 commented 4 years ago

I think, based on the fingerprinting logic that AliExpress uses (the readable one; they also have one I still haven't been able to deobfuscate), this would do, but according to this article the devices seem to be labelled as fakes. I'll spend some more time on this later this week, so I might get something usable.

berstend commented 4 years ago

@jeroenvisser101 keep us posted with your findings :) Happy to improve the stealth plugin when needed.

If we need to fully mock navigator.mediaDevices from scratch, we can use navigator.plugins as a reference, where we did a similar thing: https://github.com/berstend/puppeteer-extra/blob/master/packages/puppeteer-extra-plugin-stealth/evasions/navigator.plugins/index.js
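As a rough illustration of what mocking could look like, the sketch below overrides enumerateDevices() on a navigator-like object so it resolves to a fixed list. This is a naive assumption-laden sketch, not the stealth plugin's implementation: the helper name installFakeEnumerateDevices, the stub object, and the placeholder IDs are all hypothetical.

```javascript
// Hypothetical helper, not part of puppeteer-extra: make enumerateDevices()
// on a navigator-like object resolve to a fixed device list.
function installFakeEnumerateDevices(nav, fakeDevices) {
  if (!nav.mediaDevices) return false; // nothing to patch
  // Return fresh copies on each call so callers cannot mutate the template.
  nav.mediaDevices.enumerateDevices = async () =>
    fakeDevices.map(device => ({ ...device }));
  return true;
}

// Placeholder entries; the IDs are made up and only mimic the 64-character
// hex shape of the output printed earlier in this thread.
const fakeDevices = [
  { kind: "audioinput", deviceId: "default", label: "", groupId: "a".repeat(64) },
  { kind: "videoinput", deviceId: "b".repeat(64), label: "", groupId: "c".repeat(64) },
  { kind: "audiooutput", deviceId: "default", label: "", groupId: "a".repeat(64) },
];

// Demonstration against a stub object standing in for `navigator`.
const stubNavigator = { mediaDevices: {} };
installFakeEnumerateDevices(stubNavigator, fakeDevices);
stubNavigator.mediaDevices.enumerateDevices().then(devices => {
  console.log(devices.map(d => d.kind));
});
```

With Puppeteer, logic like this would have to run in the page before any site script, for example via page.evaluateOnNewDocument. A plain override like this is likely detectable (the function is not native and the entries are plain objects rather than MediaDeviceInfo instances), which is presumably why the navigator.plugins evasion linked above is more elaborate.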

TheBestMoshe commented 3 years ago

I just read about this:

google-chrome --use-fake-device-for-media-stream --use-fake-ui-for-media-stream

Haven't tried it myself but this might solve this issue on hosts without devices.

but according to this article, it seems like they are labeled as fakes.

These are the media devices shown by https://amiunique.org/fp when using @berstend's suggestion:

audioinput (Fake Default Audio Input)
audioinput (Fake Audio Input 1)
audioinput (Fake Audio Input 2)
videoinput (fake_device_0)
audiooutput (Fake Default Audio Output)
audiooutput (Fake Audio Output 1)
audiooutput (Fake Audio Output 2)

Contrast this with what is returned when running in my standard Chrome browser:

audiooutput

berstend commented 3 years ago

@TheBestMoshe can you try again with just --use-fake-device-for-media-stream?

berstend commented 3 years ago

Also if someone could extract the exact code amiunique.org is using here that'd be helpful and accelerate the discussion :-)

berstend commented 3 years ago

PS: An alternative to spoofing this through Chrome args or JS could be to create fake devices in a Docker container.

Niek commented 3 years ago

It seems like a recent Chrome/Chromium release removed the deviceId and label properties returned by await navigator.mediaDevices.enumerateDevices(). Or rather, they're not completely removed but empty, both in my tests on macOS and on Linux with a beta build of Chrome.

TheBestMoshe commented 3 years ago

@berstend When I run it without any of the fake device flags I get similar results to a regular Chrome browser.

Chrome browser:

audiooutput

Chromium with Puppeteer Stealth:

audiooutput
audiooutput
audiooutput

can you try again with just --use-fake-device-for-media-stream?

Results:

audioinput
audioinput
audioinput
videoinput
audiooutput
audiooutput
audiooutput

berstend commented 3 years ago

So we're good then? :-) We can't control the number of devices using this method, but since there's fluctuation in the wild as well, this shouldn't be an issue.

Niek commented 3 years ago

I see that the result is different (it includes detailed info) if you first run await navigator.mediaDevices.getUserMedia({audio: true, video: true}), but that spawns a permissions popup. Adding --use-fake-ui-for-media-stream auto-accepts the permission request, which is why the info leaks. So: don't use --use-fake-ui-for-media-stream; use just --use-fake-device-for-media-stream, or something like pulseaudio with a dummy device, as @berstend mentioned.
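The conclusion above could be sketched as follows. The two flags are the ones discussed in this thread; the helper names buildLaunchArgs and findFakeLabels are hypothetical, and the label check is only based on the label strings reported earlier, so treat it as a rough sanity check rather than a complete detector.

```javascript
// Flags taken from the discussion above.
const FAKE_DEVICE_FLAG = "--use-fake-device-for-media-stream";
const FAKE_UI_FLAG = "--use-fake-ui-for-media-stream";

// Hypothetical helper: add the fake-device flag and drop the fake-UI flag,
// since auto-accepting the permission prompt is what exposes the labels.
function buildLaunchArgs(extraArgs = []) {
  const args = extraArgs.filter(arg => arg !== FAKE_UI_FLAG);
  if (!args.includes(FAKE_DEVICE_FLAG)) args.push(FAKE_DEVICE_FLAG);
  return args;
}

// Hypothetical check: return labels that give away Chrome's fake devices,
// based on the label strings quoted earlier in this thread.
function findFakeLabels(devices) {
  return devices
    .map(device => device.label)
    .filter(label => /fake/i.test(label));
}

console.log(buildLaunchArgs([FAKE_UI_FLAG, "--no-sandbox"]));
console.log(findFakeLabels([
  { kind: "audioinput", label: "Fake Default Audio Input" },
  { kind: "videoinput", label: "fake_device_0" },
  { kind: "audiooutput", label: "" },
]));

// Usage with Puppeteer (requires the `puppeteer` package, so left commented):
//   const browser = await require("puppeteer").launch({ args: buildLaunchArgs() });
```

Running findFakeLabels on the result of enumerateDevices() inside the launched page would be one way to confirm that no "Fake ..." labels are visible with the chosen flags.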