mrchrisadams opened 5 years ago
Okay, I looked into this some more over the weekend.
It turns out that yes, you can filter like this, but I think my reading of the sizing was wrong. I thought it was related to the number of times you visited a site, but this code from [viz.js][1] now makes me think it's a function of the number of third-party domains a website has, so the Guardian has LOADS and appears as a big blob, while a smaller site has fewer and appears smaller.
```js
for (const node of this.nodes) {
  const x = node.fx || node.x;
  const y = node.fy || node.y;
  let radius;
  this.context.beginPath();
  this.context.moveTo(x, y);
  if (node.firstParty) {
    radius = this.getRadius(node.thirdParties.length);
    this.drawFirstParty(x, y, radius);
  } else {
    this.drawThirdParty(x, y);
  }
  // snip
}
```
As far as I can see, it's a bit of a pain to set up a testing harness with realistic data, as writing to the store involves triggering updates in both the background thread and any viz or browser tabs showing the viz.
What's more, the export in `getAll()` strips out some of the data that we'd want to have for an import, because we're passing the info held in IndexedDB through `outputWebsite`, like this:
```js
async getAll() {
  const websites = await this.db.websites.filter((website) => {
    return website.isVisible || website.firstParty;
  }).toArray();
  const output = {};
  // this strips out all the other fields we might rely on for sorting
  for (const website of websites) {
    output[website.hostname] = this.outputWebsite(website.hostname, website);
  }
  return output;
},
```
It makes sense for the viz, but when we have an export function like this:
```js
downloadData() {
  const saveData = document.getElementById('save-data-button');
  saveData.addEventListener('click', async () => {
    const data = await storeChild.getAll();
    // note: the replacer argument must be null (or a function/array);
    // passing ' ' here was a no-op
    const blob = new Blob([JSON.stringify(data, null, 2)], {
      type: 'application/json'
    });
    const url = window.URL.createObjectURL(blob);
    const downloading = browser.downloads.download({
      url: url,
      filename: 'lightbeamData.json',
      conflictAction: 'uniquify'
    });
    await downloading;
  });
},
```
It means the data exported is in a different shape, so adding it back into a new instance with Dexie.js's import scripts, or even just by calling `put()`, is more of a faff.
I'm not sure how to write tests for background scripts in extensions, so I think the approach that would work better for now, at least for our purposes, would be to export the data unchanged from IndexedDB, something like this:
```js
async dumpSites() {
  // websites is a Dexie table property, not a function
  return await this.db.websites.toArray();
},
```
That would allow us to make an updated export function, and also an import function that iterates through the records to make imports possible, so we can test with more realistic datasets and work out the best criteria. Right now I think sorting first parties by last accessed is an okay heuristic; it would mean that sites you visit once and never come back to don't clog up the viz.
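A rough sketch of what that import function might look like (the name `importSites` and the `db.websites` table are assumptions based on the snippets above; `bulkPut()` is Dexie's batch write method):

```javascript
// Hypothetical counterpart to dumpSites(): writes the dumped records back
// unchanged. Because we store whole records rather than the outputWebsite()
// reshaping, every field survives a round trip.
async function importSites(db, records) {
  // bulkPut inserts or overwrites each record by primary key
  return db.websites.bulkPut(records);
}
```

This would only be a starting point; a real version would probably want to validate the records first.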
The nicer alternative in the long run would be to use the Dexie Export Import support, but that would likely involve more significant changes to the underlying code.
If we were to build out a more advanced import/export function, we might look at these wrappers, which provide cross-browser options:

- Cross-browser downloads: https://github.com/rndme/download
- Import/export, as mentioned before: https://dexie.org/docs/ExportImport/dexie-export-import
This would lay the groundwork to make this available on any browser that can support the WebExtensions spec.
It also occurred to me that one side effect of sizing the sites based on usage is that we now have meaningful criteria for limiting the number of domains in use, to stop the greenbeam animations becoming unusably slow.
With the original Lightbeam, because there was no real hierarchy among first-party domains, and no real way to intuit which domains were more important, I think the decision taken was to just show everything, which made the viz unusably slow after a few days.
I'm seeing decent behaviour on a 2017 MacBook Pro (admittedly with quite beefy specs) when there are 50 domains.
### Steps to take

#### Implementing this
I think the `getAll()` function in store.js, around line 190, is what creates the array that d3 uses in the viz to build the force-directed graph. Dexie.js has a `limit()` function which would let us keep the number below a certain threshold, so I think the change might be as simple as calling `limit()` before we call `toArray()` in that query.
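A minimal sketch of that change, reusing the filter from the `getAll()` snippet earlier (the `MAX_VISIBLE_SITES` constant is hypothetical, with 50 chosen to match the domain count that performed acceptably above; Dexie's `Collection.limit()` caps how many records `toArray()` returns):

```javascript
// Hypothetical threshold on the number of first-party sites the viz draws
const MAX_VISIBLE_SITES = 50;

// Same filter as getAll(), but capped with limit() before toArray(),
// so d3 never receives more nodes than the viz can animate smoothly.
async function getVisibleSites(db) {
  return db.websites
    .filter((website) => website.isVisible || website.firstParty)
    .limit(MAX_VISIBLE_SITES)
    .toArray();
}
```

Combined with sorting by last accessed first, this would keep the most recently used sites on screen and drop the long tail.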