WorldBrain / Memex

Browser extension to curate, annotate, and discuss the most valuable content and ideas on the web. As individuals, teams and communities.
https://worldbrain.io
4.43k stars 339 forks source link

Add in passive data deletion to existing-user cloud onboarding #1196

Closed poltak closed 3 years ago

poltak commented 3 years ago

Included:

Not included, but part of the specs:

ShishKabab commented 3 years ago

Looks good so far! A few questions:

poltak commented 3 years ago

Will there be any problems with anyOf() or noneOf() with a lot of operands? What happens if there are 9000 pages/visits/favIcons to delete?

Had a bit of a search around and found a few instances of people talking about using anyOf with thousands of operands, without mentioning any problems apart from discussions on performance:

And a quick glance over the implementation shows it doesn't seem to be doing anything particularly fancy with the input array so, assuming the JS engines can handle really big arrays fine, I think we're good. https://github.com/dfahlander/Dexie.js/blob/master/src/classes/where-clause/where-clause.ts#L164

Since orphanedUrls is only used to delete stuff, does it make sense to skip calculating that and just deleting pages and visits using noneOf()? Could that speed up the process and would it even give a noticeable perf boost?

That's a great pickup, and I updated the code to do it and the test is still passing. However, digging a bit deeper I found noneOf has an entirely different implementation to anyOf, using an IDBCursor while anyOf is its own thing entirely - separate to use of IDB APIs (outlined in this article I read ages back: https://www.codeproject.com/Articles/744986/How-to-do-some-magic-with-indexedDB). I really don't know how that might handle larger inputs, though anyOf seems to be fine and there are possible perf issues with using the IDBCursor methods (https://github.com/dfahlander/Dexie.js/issues/1116). So, as things should work ok with anyOf, as we've seen with Oli running the script this logic is based off on his personal dataset and the answer to the prev question, I'm reluctant to change this. Let me know if you disagree

Unrelated: in looking into that, I realized my test wasn't taking into account visits, which it should have. Updated to do that.

What happens when part of the process fails for whatever reason? As far as I can see the whole process seems to be idempotent, but how are errors handled in the UI?

It is caught in the UI logic and we have a specific screen for the error state which doesn't let the user continue further on with the migration process. Though currently it only says "There was an error" and refers to user to contact support via email, and also affords retrying the passive data wipe or cancelling to go back.

ShishKabab commented 3 years ago

So, as things should work ok with anyOf, as we've seen with Oli running the script this logic is based off on his personal dataset and the answer to the prev question, I'm reluctant to change this. Let me know if you disagree

Yeah, it seems to work pretty quickly on quite a large data set, so let's keep it as it is.

About idempotency, I was just wondering what happens if the browser is shut down between deleting pages and visits, and deleting fav icons. If the process is ran again, it still wouldn't clean up the fav icons. I think that deleting the pages as the very last thing would be the safest here.

poltak commented 3 years ago

Great point. Though looking at the logic, due to limitations with how IDB works, lacking ability to project arbitrary fields, there's no way to get the fav-icon delete happening first without significantly changing the space complexity of the logic. We'd have to bring in all the orphaned pages into mem, then filter out the hostname fields. That might be fine in most cases, but maybe pushing it if there are thousands of orphaned pages.

Instead I'm just going to put this logic in a transaction - I don't know why I didn't think about that earlier. Been a while since needing to use transactions