benjaffe / chrome-okc-plugin

OkCupid Poly Plugin
MIT License
62 stars 32 forks source link

Use chrome.storage instead of localStorage #11

Open cowboy opened 10 years ago

cowboy commented 10 years ago

Because then data can be more persistent and synced across browsers!

http://developer.chrome.com/extensions/storage.html

benjaffe commented 10 years ago

Oooh, I can use it now! I got localStorage syncing between windows on the same computer and triggering changes in each other... I was proud of that! But yeah, many reasons to use chrome.storage.

Only downside is that it won't work in FF if I or someone ports it.

cowboy commented 10 years ago

Fallback to localStorage for older browsers.

benjaffe commented 10 years ago

It's such a mess right now, but if I get enough time to refactor it to work cleanly, that's a good route. I wish I'd realized it'd become a real project when I started hacking away a year and a half ago!

vectorjohn commented 10 years ago

I started work on this. My only concern is it's a lot of changes (as you've noted) because there are JSON.parse()/stringify() and misc localStorage calls or _OKCP.storage() calls all over. To make things more fun, chrome storage is asynchronous.

Also, Chrome's remote storage (for cross browser syncing) has a somewhat low storage limit. So some care will need to be taken to reduce that. Maybe only some things will be able to sync. Most settings should be fine.

But anyway, I got as far as a version that runs without errors and uses Chrome storage or a fallback. I abstracted the storage a little so you use the same interface for all storage calls. I'm just not done making sure it still works like it used to.

Edit Also, I realize you might not want all the changes. I mostly did it for fun and got carried away. It was way more changes than I was expecting.

AstraLuma commented 10 years ago

The most important thing to sync, though, is the profiles, which I suspect take up most of the storage.

Are there Google web APIs for user-tied storage? I've found the low limits of chrome.storage to be absurd.

@vectorjohn I think moving everything to a single async storage layer is a good way to go. Also, moving the actual storage to the event page and using message passing I think would be a good idea, so storage is tied to the extension instead of to the website. I was thinking of tackling this issue anyway.

benjaffe commented 10 years ago

@vectorjohn That is really cool! I really like it! I'm pretty crazy busy right now, but I'd love to find a way to incorporate the changes.

@astronouth7303 I love tying the storage to the event page rather than the site itself... then BetterCupid can't run localStorage.clear on us, nuking everything...

We'd need to make a nice easy-to-use method for debugging. Right now there's a input (that shows up at the bottom if you have debug mode enabled) that lets you run code in the same memory space as the plugin. That'd work, but it doesn't feel very natural to debug that way.

@astronouth7303 I think you can request unlimited storage for chrome.storage (which I think I did in the manifest already in preparation for potentially going this direction). I think it's chrome.storage.sync that has a lot of the functionality we want (syncing!), and also the silly storage limits. If we made the keys really short, we could still fit a lot in there though.

AstraLuma commented 10 years ago

@benjaffe Sadly, unlimitedStorage does not apply to chrome.storage. See the docs for details.

Let me see if I can do a storage API. Then there's a standard way to handle multiple back-ends, migrations, and other complexities without having to change all the other APIs.

benjaffe commented 10 years ago

Oops, you're totally right -- missed that.

Be sure to check out @vectorjohn 's work on this.

AstraLuma commented 10 years ago

Still doesn't fix problems with async that's going to be required to move forward, but having a reference for what to touch is useful.

I've started an API for settings and profile data based on Promises. Should be able to completely decouple plugin code from the underlying storage, whether it's event page, localStorage, chrome.storage or some other method.

@benjaffe I have a pile of storage related questions for you:

  1. is okcpDefaultQuestions really necessary to keep in localStorage? Requests to plugin files are basically free, so in-memory should be ok, right?
  2. okcpRecentProfiles looks like a good candidate for IndexedDB. Consistent schema with queries and reporting.
  3. In an ideal world, is there anything that would be stored persistently but unsynced?
vectorjohn commented 10 years ago

I'm not sure what you mean about not handling async. The changes I made allow you to treat storage as if it is synchronous, so app code didn't have to change except for a couple places that depended on storage right away and those were minor changes.

On Tue, Jul 1, 2014 at 3:23 PM, Jamie Bliss notifications@github.com wrote:

Still doesn't fix problems with async that's going to be required to move forward, but having a reference for what to touch is useful.

I've started an API https://github.com/astronouth7303/chrome-okc-plugin/blob/storage-api/plugin/storage.js for settings and profile data based on Promises http://www.html5rocks.com/en/tutorials/es6/promises/. Should be able to completely decouple plugin code from the underlying storage, whether it's event page, localStorage, chrome.storage or some other method.

@benjaffe https://github.com/benjaffe I have a pile of storage related questions for you:

  1. is okcpDefaultQuestions really necessary to keep in localStorage? Requests to plugin files are basically free, so in-memory should be ok, right?
  2. okcpRecentProfiles looks like a good candidate for IndexedDB. Consistent schema with queries and reporting.
  3. In an ideal world, is there anything that would be stored persistently but unsynced?

— Reply to this email directly or view it on GitHub https://github.com/benjaffe/chrome-okc-plugin/issues/11#issuecomment-47716565 .

AstraLuma commented 10 years ago

@vectorjohn It still depends on the old data schema which is ... odd at best. Loading the entire profile database into memory every time is just inefficient. (Although what you do is better than what the current code does...)

Forcing promises through the API (restructuring code that I believe @benjaffe has already said needs help) gives the storage layer more options with regards to lazy loading, caching, collating, and other management. (I also have a plan in place for data change events.) Replacing JSON schema with more API calls means that there's flexibility to change more of the underlying data storage without having to code everywhere: Only the schemas of profile objects and individual settings are used in the plugin, everything else is internal to storage.

(This discussion, of course, only deals in the syncronizable data and doesn't begin to touch the profile-question database.)

In any case, it's up to @benjaffe which proposal to go with moving forward, it's his project. In the mean time, I plan on continuing my work of finishing the API, migrating the rest of the plugin to it, and then moving data to the event page.

vectorjohn commented 10 years ago

I see. I didn't think there was more than a few K or maybe a few MB of data stored, so loading it into memory makes the most sense. If it's more than that, it makes sense to do something else. It's just a much bigger change :)

On Tue, Jul 1, 2014 at 4:03 PM, Jamie Bliss notifications@github.com wrote:

@vectorjohn https://github.com/vectorjohn It still depends on the old data schema which is ... odd at best. Loading the entire profile database into memory every time is just inefficient. (Although what you do is better than what the current code does...)

Forcing promises through the API (restructuring code that I believe @benjaffe https://github.com/benjaffe has already said needs help) gives the storage layer more options with regards to lazy loading, caching, collating, and other management. (I also have a plan in place for data change events.) Replacing JSON schema with more API calls means that there's flexibility to change more of the underlying data storage without having to code everywhere: Only the schemas of profile objects and individual settings are used in the plugin, everything else is internal to storage.

(This discussion, of course, only deals in the syncronizable data and doesn't begin to touch the profile-question database.)

In any case, it's up to @benjaffe https://github.com/benjaffe which proposal to go with moving forward, it's his project. In the mean time, I plan on continuing my work of finishing the API, migrating the rest of the plugin to it, and then moving data to the event page.

— Reply to this email directly or view it on GitHub https://github.com/benjaffe/chrome-okc-plugin/issues/11#issuecomment-47719465 .

benjaffe commented 10 years ago

@astronouth7303 Good questions.

  1. Yes, okcpDefaultQuestions is deprecated, and storing them in a file would definitely be faster. I was planning something or other a year ago, no idea what. :)
  2. Sure, that sounds fine to me.
  3. Yes, the temporarily cached profile data. If anyone from OkC looks at my code, I don't want them thinking that I'm stealing user data for a potential future project, so I want it to be absolutely clear that I'm not storing any user data in any real way other than to decrease http requests.

Gotta respond to the rest of this later. :)

benjaffe commented 10 years ago

I like the idea of implementing promises/async in general. I think it's a more solid choice for movnig forward, and opens up possibilities for storing data on a server somewhere, if this project ever moves that direction. I'm open to arguments either way.

@astronouth7303 yes, definitely, much of the code needs help. It started as a project only for me, then got opened up without customization, then people demanded it, so I built it. It's definitely a project with some history.

AstraLuma commented 10 years ago

@vectorjohn The problem is that you have to load and parse the entire database every tab, every page load. If you've visited 500 profiles (easy if you use the plugin for a year), or even a thousand (moderators, anyone?), a little bit (or, in the case of questions, a lot) of data gets stored with each one.

I would argue the promises exposed means that data can be lazy-loaded in small pieces (schema allowing), minimizing memory and preventing jank. Using an event page means one process does the heavy lifting and in-memory storage. A more structured API means that each individual piece of code is simpler, making development easier.

AstraLuma commented 10 years ago

@benjaffe I'm not sure how to bind my call backs and promises to knockout, can you give me some advice on that? Or does ViewModels.js just need to be restructured to handle lazy loading?

benjaffe commented 10 years ago

@astronouth7303 We could probably just restructure ViewModels.js to handle lazy loading, feeding in the data to Knockout when it comes in. As far as implementation details go, I'm definitely no expert on Knockout. I used Knockout on this project, but haven't done much in-depth work with it otherwise.

(As a side note, regarding my availability, I can reply to discussion topics and merge pull requests, but I'm probably too busy over the next two months to dedicate any real chunks of time to the plugin. I wish I could, but I'm super impacted with teaching classes and consulting until mid-September.)

sapid commented 9 years ago

Has anyone made any progress on this? I spent the afternoon looking at the branches made by @astronouth7303 and @vectorjohn 6+ months ago. (Incidentally, @astronouth7303, I fixed one of the bugs you had that prevented extension data from binding to and populating the Labels dropdown by delaying the creation of the label menu to document.ready time.)

I also poked around the chrome.storage.sync docs. Sync's storage limit is 100KB. Right now, each entry in okcp.profileList is 100-150 chars. Since it's all in the ascii range, this translates to 100-150 bytes. At 150 bytes, that's an upper limit of ~600 users stored, which makes me wonder if sync is worth adding, at least using chrome.storage.sync rather than some server with client-side encrypted data. As @astronouth7303 said above, hitting 500 profiles is pretty easy within a year, particularly if you live in a population-dense area.

So, here's some thoughts on the viability of syncing. I think a lot of the stuff in there can be stripped out for the purposes of syncing. Last-visited, Last-modified, even Location - none of this really needs to be synced. Location data gets stale over time anyway - people move.

All that's really necessary for useful sync functionality are the username and associated flags. Everything else can be moved to local storage, and the flags can be compressed a lot - down to a 2 character value, really, by simulating a bitmap with bitwise operators. Even if the number of flags doubles or triples, we're still at 4 characters for the flag value. I think the username character limit is 16?

So, if we use a data structure of the format Username: flags our per-profile data consumption is at most "sixteenCharsHere":31, or 22 bytes. 102,400/22=4654 which should last a couple of years at least. With settings also synced, we're solidly above 4k users. With some client-side optimizations (for example, keep track of when we last saw people we're keeping data for and keep only the most recent 4500) that's probably workable. In fact, since not all usernames are 16 characters, we're probably looking at a practical limit of ~6k users. I suspect 98% of users won't ever reach that limit... and the last 2% can probably be accommodated by intelligent working-set management.

... Unfortunately, chrome.storage.sync also has a limit of 512 items, which brings us back to square one unless we figure out some kludge-y way of encoding our data into 512 items, none of which are larger than 8192 bytes (372 profiles). Some kind of peculiar hash table might be viable, but unless someone else has already done the work, at that point we're probably better off settling for a working-set of 512 and getting intelligent working-set management working and a smooth way to migrate the entire local database to another client. (e.g. a good backup/restore function that isn't clipboard-based.)

There's also throttling to keep track of, but if we only write to sync every 30 seconds (or 10, or 5) it's not a big problem.

Alternatives might include...

Compressing flags to a bitmap is probably a worthy goal regardless.

Anyway, the way I see it, the two best options are either

Thoughts?

AstraLuma commented 9 years ago

I know in my use cases, having all my instances online at the same time is just not feasible.

Is encrypting user data on the server really necessary?

Maybe looking into CouchDB + PouchDB is worth while? That should make changes and replication super easy. Just need a front-end service to handle user registration. You'd still have to run a server, but in my experience, keeping CouchDB running is not a problem.

sapid commented 9 years ago

Per #8, Ben said he wasn't interested in a server option due to concerns about data scraping and user privacy, and if he went with one, he would want it to be encrypted. That's why I've been exploring the encryption options. I think looking in to PouchDB is not a bad idea at all; running that in the event page might be worthwhile. As it stands, we're loading the entire profile database and performing a search on it by iterating through it until we find the right profile, and doing that on each page load, correct? PouchDB would be a much nicer solution.

I take back what I said about the database needing to be an encrypted glob. (The hazards of reasoning about data structures at 3AM.) Assuming symmetric encryption, I don't see any reason why we couldn't encrypt before moving things in to the database - we should be able to encrypt each field individually.

One option would be to store automatically generated keys in chrome.storage.sync. This way, we wouldn't need to handle user registration necessarily. Storing it in chrome.storage.sync would mean that the user would have to authenticate to their browser, which could essentially serve as our auth mechanism. The tradeoff of this sort of seamlessness, of course, is that if a user somehow loses their keys in chrome.storage.sync, they lose their data.

Traditional user registration would prevent this sort of data loss. It would also let users sign out while they weren't using their computer, or while another user was logged in, which using chrome.storage.sync as a cache using LRU page replacement doesn't offer. (At least, not innately - I suppose there's no reason we couldn't implement that as well.)

Regarding the question of the necessity of encryption, per issue #8, Ben said he wasn't interested in a server option due to concerns about data scraping and user privacy, and if he went with one in the future, he would want it to be encrypted. If all we're storing is username and flags, perhaps that's unnecessary, but people tend to be sensitive about this sort of thing, and I don't think encryption would be very difficult.

Perhaps we could also offer the option to sync or not sync, and make the default not to sync. (That's no change from present circumstances, after all.)

AstraLuma commented 9 years ago

Encryption in JS is pretty difficult, simply because libraries aren't wide-spread or well-used. They exist, but I think they're mostly toys and academic. On top of that, the odds of any of them being audited is probably approaching zero.

Even after you have a crypto library, turning crypto primitives into a usable and secure system is hard (this is why standards like OpenPGP and TLS exist). To top it off, key management (esp since loosing the key is catastrophic) is complex without support and user knowledge of it.

In short, rolling your own crypto is hard.

I guess the real question is, are we protecting from outsiders getting the information or from admins? Protecting from outsiders means other measures can be taken (securing the API interface, encrypting the file system, etc.) without having to roll your own crypto. Protecting from insiders, though. That's a much, much harder nut to crack.

Given what is synced and the probable security of OkCupid (and other sites with this kind of information), I would think per-user security with perfect prevention of insiders is just not worth it. Esp since users already trust developers (to download and install the extension in the first place).

sapid commented 9 years ago

I agree that rolling your own crypto is hard, and I'm not advocating that we build a library ourselves; I think that's a dangerous option that is all but guaranteed to do more harm than good. I'd much rather use an implementation like http://www.openpgpjs.org/ or wait for a release from https://github.com/google/end-to-end.

It's worth noting here, I think, that openpgp.js was audited a little less than a year ago and several critical vulnerabilities were found. https://cure53.de/pentest-report_openpgpjs.pdf I didn't go looking very hard for additional reviews, this shows there's been at least one. I don't know if these vulns have been patched in the past year, but I didn't find issues on the repo through a perhaps somewhat cursory inspection.