eveahe / greenbeam

Forked from Firefox Lightbeam, with added info on which sites are running on renewables!
Mozilla Public License 2.0

Limit the number of first party domains, based on the amount of use #19

Open mrchrisadams opened 5 years ago

mrchrisadams commented 5 years ago

It occurred to me:

One side effect of sizing the sites based on usage is that we now have a meaningful criterion for limiting the number of domains on show, to stop the greenbeam animations becoming unusably slow.

With the original Lightbeam, because there was no real hierarchy among first party domains, and no real way to intuit which domains were more important, I think the decision taken was to just show everything, which made the viz unusably slow after a few days.

I'm seeing decent behaviour on a 2017 MacBook Pro (with admittedly quite beefy specs) when there are around 50 domains.

Steps to take

Implementing this

I think the getAll() function in store.js, around line 190, is what creates the data that d3 uses in the viz to build the force-directed graph:

async getAll() {
  const websites = await this.db.websites.filter((website) => {
    return website.isVisible || website.firstParty;
  }).toArray();
  const output = {};
  for (const website of websites) {
    output[website.hostname] = this.outputWebsite(website.hostname, website);
  }
  return output;
},

Dexie.js has a limit() function which would let us keep the number below a certain threshold, so I think the change might be as simple as changing this:

const websites = await this.db.websites.filter((website) => {
  return website.isVisible || website.firstParty;
}).toArray();

To something like this, where we call limit() before we call toArray():

// Me guessing
const MAX_NODES = 100;

const websites = await this.db.websites.filter((website) => {
  return website.isVisible || website.firstParty;
}).limit(MAX_NODES).toArray();

mrchrisadams commented 5 years ago

Okay, I looked into this some more over the weekend.

It turns out that yes, you can filter like this, but I think my reading of the sizing was wrong. I thought it was related to the number of times you visited a site, but this code from viz.js now makes me think it's a function of the number of third-party domains a website has: the Guardian has LOADS and appears as a big blob, while a smaller site has fewer and appears smaller.

for (const node of this.nodes) {
  const x = node.fx || node.x;
  const y = node.fy || node.y;
  let radius;

  this.context.beginPath();
  this.context.moveTo(x, y);

  if (node.firstParty) {
    radius = this.getRadius(node.thirdParties.length);
    this.drawFirstParty(x, y, radius);
  } else {
    this.drawThirdParty(x, y);
  }
  // snip
}

A sensible criterion to sort by

As far as I can see, it's a bit of a pain to set up a testing harness with realistic data, as writing to the store involves triggering updates in both the background thread and any browser tabs showing the viz.

What's more, the export in getAll strips out some of the data that we'd want to have for an import, because we're passing the info held in IndexedDB through outputWebsite, like this:

async getAll() {
  const websites = await this.db.websites.filter((website) => {
    return website.isVisible || website.firstParty;
  }).toArray();
  const output = {};

  // this strips out all the other fields we might rely on for sorting
  for (const website of websites) {
    output[website.hostname] = this.outputWebsite(website.hostname, website);
  }
  return output;
},

That makes sense for the viz, but when we have an export function like this:

downloadData() {
  const saveData = document.getElementById('save-data-button');
  saveData.addEventListener('click', async () => {
    const data = await storeChild.getAll();
    const blob = new Blob([JSON.stringify(data, ' ', 2)], {
      type: 'application/json'
    });
    const url = window.URL.createObjectURL(blob);
    const downloading = browser.downloads.download({
      url: url,
      filename: 'lightbeamData.json',
      conflictAction: 'uniquify'
    });
    await downloading;
  });
},

It means the data exported is in a different shape, so adding it back into a new instance with Dexie.js's import scripts, or even just by calling put, is more of a faff.

I'm not sure how to write tests for background scripts in extensions, so I think the approach that would work better for now, at least for our purposes, would be to export the data unchanged from IndexedDB, something like this:

async dumpSites() {
  return await this.db.websites.toArray();
},

That would allow us to make an updated export function, and also an import function that iterates through the records to make imports possible, so we can test with more realistic datasets and work out the best criterion. Right now I think sorting first parties by last accessed is an okay heuristic, and would mean that sites you visit once, then never come back to, don't clog up the viz.
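To make that a bit more concrete, here's a rough sketch of what the import side and the "sort by last accessed, then cap" idea might look like. Everything here is a guess on my part: importSites, getAllCapped and the lastAccessed field don't exist in store.js yet, and lastAccessed assumes we're already storing a timestamp on each record.

// Hypothetical import counterpart to dumpSites(): takes the array that
// dumpSites() produced and writes each record back into the websites table.
async importSites(records) {
  // Dexie's bulkPut() upserts all the records in one go.
  return await this.db.websites.bulkPut(records);
},

// Hypothetical capped version of getAll(): sort by last accessed, newest
// first, and only hand the viz the first MAX_NODES records.
async getAllCapped() {
  const MAX_NODES = 100;
  const websites = await this.db.websites.filter((website) => {
    return website.isVisible || website.firstParty;
  }).toArray();

  // `lastAccessed` is assumed to be a timestamp we already store per record.
  websites.sort((a, b) => (b.lastAccessed || 0) - (a.lastAccessed || 0));

  const output = {};
  for (const website of websites.slice(0, MAX_NODES)) {
    output[website.hostname] = this.outputWebsite(website.hostname, website);
  }
  return output;
},

A real version would probably want to cap the first parties specifically and keep their third parties attached, but something like this would be enough to experiment with the heuristic.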

The nicer alternative in the long run would be to use the Dexie Export/Import support, but that would likely involve more significant changes to the underlying code.

https://dexie.org/docs/ExportImport/dexie-export-import
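For what it's worth, that addon's entry points are exportDB() and importInto(), so the long-run version might look roughly like this. Untested, and it assumes we can bundle the addon with the extension and import it in store.js:

// Sketch only: exportDB() / importInto() come from the dexie-export-import
// addon linked above, and `this.db` is the existing Dexie instance.
import { exportDB, importInto } from 'dexie-export-import';

async exportDatabase() {
  // Serialises the whole database (every table, not just websites) to a Blob.
  return await exportDB(this.db);
},

async importDatabase(blob) {
  // Restores a Blob produced by exportDB() into the existing database.
  await importInto(this.db, blob);
},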

mrchrisadams commented 5 years ago

If we were to build out a more advanced import/export function, we might look at these wrappers, which provide cross-browser options:

Cross browser downloads: https://github.com/rndme/download

Import-export, as mentioned before: https://dexie.org/docs/ExportImport/dexie-export-import

This would lay the groundwork to make this available on any browser that can support the web extension spec.
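For example, the download wrapper would let downloadData() drop the window.URL / browser.downloads pair, which is the bit that varies most between browsers. A sketch, assuming download.js is loaded and exposes its usual download(data, filename, mimeType) function:

downloadData() {
  const saveData = document.getElementById('save-data-button');
  saveData.addEventListener('click', async () => {
    const data = await storeChild.getAll();
    // download.js handles creating the object URL and triggering the save.
    download(JSON.stringify(data, null, 2), 'lightbeamData.json', 'application/json');
  });
},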