EFForg / badger-sett

Automated training for Privacy Badger. Badger Sett automates browsers to visit websites to produce fresh Privacy Badger tracker data.
https://www.eff.org/badger-pretraining
MIT License

could some additional settings improve resource usage during training? #62

Closed jawz101 closed 3 years ago

jawz101 commented 3 years ago

Since we're just trying to observe certain types of info per domain, could any of these settings improve the performance during training or reduce the resources needed to move through the training without compromising what domains are caught?

Some candidates:

    permissions.default.image = 2             -- disable images
    browser.display.use_document_fonts = 0    -- disable fonts
    gfx.downloadable_fonts.enabled = false    -- disable fonts
    dom.serviceWorkers.enabled = false        -- disable creation of service workers
    media.peerconnection.enabled = false      -- disable WebRTC
    security.OCSP.enabled = 0                 -- disable querying OCSP for every cert check
    webgl.disabled = true                     -- disable WebGL
    layout.spellcheckDefault = 0              -- disable spellchecker
    network.dns.disablePrefetch = true        -- disable DNS prefetching since we aren't clicking on everything during the scan

curious:

    network.cookie.maxPerHost = 32
    dom.workers.maxPerDomain = 48
    network.http.max-connections = 48
    javascript.options.mem.max = 16384
    network.websocket.max-connections = 20

Another question: would it help to increase network.cookie.maxNumber?

Misc.: anything matching network.*max or javascript.options that may make the browser hang on a page longer than it needs to in order to catch what needs to be caught.

Also, maybe disable some things related to media, or to max sizes, max retries, or timeouts. I dunno.
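For anyone who wants to try a subset of the preferences listed above, one way to apply them is through a Firefox `user.js` file in the profile directory. The following is a minimal sketch, not Badger Sett code: the `SUGGESTED_PREFS` dict and the `render_user_js` helper are hypothetical names, and the values are the ones suggested in this comment, not tested recommendations.

```python
import json

# Subset of the preferences suggested in the comment above.
# These are untested suggestions, not known-good settings.
SUGGESTED_PREFS = {
    "permissions.default.image": 2,
    "browser.display.use_document_fonts": 0,
    "gfx.downloadable_fonts.enabled": False,
    "media.peerconnection.enabled": False,
    "webgl.disabled": True,
    "network.dns.disablePrefetch": True,
}


def render_user_js(prefs):
    """Render a dict of preferences as the text of a Firefox user.js file."""
    lines = []
    for name, value in sorted(prefs.items()):
        # json.dumps yields true/false for bools, bare digits for ints,
        # and double-quoted strings, which matches user.js literal syntax.
        lines.append(f"user_pref({json.dumps(name)}, {json.dumps(value)});")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_user_js(SUGGESTED_PREFS), end="")
```

When driving Firefox through Selenium, the same dict could instead be applied by calling `set_preference(name, value)` on a Firefox `Options` object for each entry.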

ghostwords commented 3 years ago

Hi, thanks for the suggestions!

As you wrote, we don't want to disable anything that would make our browser significantly different from a typical browser. For example, disabling service workers is probably a no-go, since popular websites rely on or even entirely depend on service workers.

We ran into significant slowdowns after updating to Chrome 85, which seem to have gotten addressed by disabling hardware accelerated rendering.

Do you know if any of the above tweaks should produce non-trivial speedups?
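One way to answer that question empirically would be to time the same list of sites under a baseline configuration and a tweaked one. The sketch below assumes a caller-supplied `visit(url)` function that loads a page and returns when done; `time_visits` and `relative_speedup` are hypothetical helpers, not part of Badger Sett.

```python
import statistics
import time


def time_visits(visit, urls, repeats=3):
    """Return the median wall-clock seconds per visit over `repeats` passes."""
    samples = []
    for _ in range(repeats):
        for url in urls:
            start = time.perf_counter()
            visit(url)
            samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def relative_speedup(baseline_seconds, tweaked_seconds):
    """Fraction of time saved relative to the baseline (0.25 means 25% less time)."""
    if baseline_seconds <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_seconds - tweaked_seconds) / baseline_seconds
```

The median is used rather than the mean so that a few unusually slow sites do not dominate the comparison. A speedup would still need to be weighed against whether the tweaked browser detects the same set of domains.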

jawz101 commented 3 years ago

um... I just had a thought. If I'm reading it correctly, it looks like Mozilla's little Selenium privacy testing benchmark thingy, OpenWPM, does disable some prefs. I think it is a similarly-oriented automated web crawler tool they've made for privacy research.

https://github.com/mozilla/OpenWPM/blob/master/openwpm/deploy_browsers/configure_firefox.py

It looks like a lot of the things they turn off deal with phoning back to Mozilla (auto updates, search boxes, telemetry, crash reporting, etc.), and then some deal with prefetching and querying OCSP responders:

    # Predictive Actions / Prefetch
    prefs["network.predictor.enabled"] = False
    prefs["network.dns.disablePrefetch"] = True
    prefs["network.prefetch-next"] = False
    prefs["browser.search.suggest.enabled"] = False
    prefs["network.http.speculative-parallel-limit"] = 0
    prefs["keyword.enabled"] = False  # location bar using search
    prefs["browser.urlbar.userMadeSearchSuggestionsChoice"] = True
    prefs["browser.casting.enabled"] = False

ghostwords commented 3 years ago

I think a lot of those Firefox overrides are specific to OpenWPM.

The gist, I think, is that OpenWPM uses a proxy to monitor traffic and defaults to stateless crawling (the browser restarts for every website). So most of the overrides are there to remove the unwanted Mozilla network traffic on Firefox startup.

Badger Sett doesn't use a proxy to capture traffic at this time and isn't meant to restart the browser much.
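The difference between the two crawling styles described above can be sketched as follows. This is an illustration only, not code from OpenWPM or Badger Sett: `FakeBrowser` is a hypothetical stand-in for a real driver, and it simply counts how many times a browser is launched.

```python
class FakeBrowser:
    """Stand-in for a real browser driver; counts launches for illustration."""

    launches = 0

    def __init__(self):
        FakeBrowser.launches += 1
        self.visited = []

    def get(self, url):
        self.visited.append(url)

    def quit(self):
        pass


def crawl_stateless(urls, make_browser):
    """Start a fresh browser for every site, so no state carries over."""
    for url in urls:
        browser = make_browser()
        try:
            browser.get(url)
        finally:
            browser.quit()


def crawl_stateful(urls, make_browser):
    """Reuse one browser for the whole run, so startup cost is paid once."""
    browser = make_browser()
    try:
        for url in urls:
            browser.get(url)
    finally:
        browser.quit()
```

With one launch per site, any traffic the browser generates on startup is repeated for every site, which is why suppressing it matters more in a stateless crawl than in a long-lived session.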

jawz101 commented 3 years ago

thanks. Well, I'll close this for now