WorldBrain / Memex

Browser extension to curate, annotate, and discuss the most valuable content and ideas on the web. As individuals, teams and communities.
https://worldbrain.io

Discussion PRO/CON Native/Server Version #328

Open blackforestboi opened 6 years ago

blackforestboi commented 6 years ago

In the past weeks we talked about a native version for Mac/Win/Linux a couple of times. In this issue I want to lay out, and collect, the pros and cons of working first on a local version or on the (self-hostable) server software (Memex Cloud).

Native Version

It would first and foremost function as a backup database for the extension, and enable a Dat node to be hosted.

The initial version would roughly consist of the following parts:

 - Electron(?) app 
 - Database
 - Dat Node
 - Potentially offloading search to the native version and providing an API for the extension to fetch from > otherwise we might duplicate data in the browser and the native version
 - Authentication & UI for it > no UI for search for now > all in the extension.
 - Automatic handshake with extension for seamless setup for users. I have seen this in other apps, that have both local and extensions, not sure how difficult this might be. 
 - HTTP API on localhost (?)

Pro:

Of course not all those opportunities are immediately implemented, just listing them for having a good overview. Some points are also applicable as a 'pro' for the server software.

Con:

Server

Practically the same functionality as the local version, but on a server and with the necessary orchestration.

Pro:

Con:

Other remarks:

@bluesun @ShishKabab @blahah @poltak @bigbluehat @arpitgogia @Treora @bnvk @tilgovi If I am correct with my assumptions and observations, the native version indeed makes the most sense to develop next. However, I don't wanna be trapped in confirmation bias or motivated reasoning, so I'd really appreciate your input/challenges so we make a sustainable decision on how to go forward.

Thanks for your help folks!

poltak commented 6 years ago

Backup locally to overcome data eviction/persistence problem of extensions, which is critical (!)

I think the most important thing is getting confirmation on the current state of this in web exts - it's still an unknown from last time we looked into it. We've seen that, in the latest Chrome and FF, unwanted storage deletion behaviour can be overridden in web apps via Web APIs, given some usage and permission conditions are met, so I'd be confused if it wasn't either the default or afforded some other way in web exts (different permissions model in web exts, for example). There still doesn't seem to be any web-ext specific info we can find, though. There should be some list we can ask on, or someone associated with us that would know - we should probably prioritise getting a real answer to this.
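For reference, the Web API in question is the StorageManager interface (`navigator.storage`), whose `persisted()` and `persist()` methods both resolve to a boolean. The sketch below shows the request flow for a web page; whether it behaves the same inside a web extension is exactly the open question above. `StorageManagerLike`, `PersistenceStatus` and `ensurePersistentStorage` are hypothetical names introduced only for this illustration.

```typescript
// Minimal subset of the browser's StorageManager interface (navigator.storage),
// declared here so the sketch can also be exercised outside a browser.
interface StorageManagerLike {
  persisted(): Promise<boolean>;
  persist(): Promise<boolean>;
}

// Hypothetical result type for this sketch.
type PersistenceStatus = "already-persistent" | "granted" | "denied";

// Ask the browser to exempt this origin's storage from automatic eviction.
async function ensurePersistentStorage(
  storage: StorageManagerLike
): Promise<PersistenceStatus> {
  if (await storage.persisted()) {
    return "already-persistent";
  }
  // persist() resolves to true only if the browser grants the request.
  const granted = await storage.persist();
  return granted ? "granted" : "denied";
}
```

In a page this would be called as `ensurePersistentStorage(navigator.storage)`. A "denied" result would mean the data remains subject to eviction and an external backup is still needed.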

Is the idea that the native version is an optional enhancement to the ext or users need it to use main functionality?

there might be problems accessing the local version through localhost because they disallow incoming transmissions. Anybody an idea how to solve this, or if it is really a problem for our use case?

Yes, I'm pretty sure the FF ext CSP blocks localhost, while Chrome allows it, as of the last time I was playing with the analytics server locally. FF may allow https://localhost, not sure. But you also have Native Messaging, which could give them something to talk over.
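To make the Native Messaging option more concrete: the browser and the native host exchange JSON messages over stdin/stdout, each one UTF-8 encoded and preceded by a 32-bit length in native byte order. Below is a minimal sketch of that framing as a native host might implement it. It assumes a little-endian platform, and the function names are hypothetical.

```typescript
// Encode one message the way native messaging frames it:
// a 4-byte length prefix followed by UTF-8 encoded JSON.
// Assumption: little-endian byte order (the protocol uses native byte order).
function encodeNativeMessage(message: unknown): Uint8Array {
  const body = new TextEncoder().encode(JSON.stringify(message));
  const frame = new Uint8Array(4 + body.length);
  new DataView(frame.buffer).setUint32(0, body.length, true);
  frame.set(body, 4);
  return frame;
}

// Decode a single complete frame; throws if the buffer is truncated.
function decodeNativeMessage(frame: Uint8Array): unknown {
  if (frame.length < 4) {
    throw new Error("frame shorter than the length prefix");
  }
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  const length = view.getUint32(0, true);
  if (frame.length < 4 + length) {
    throw new Error("truncated frame");
  }
  const body = frame.subarray(4, 4 + length);
  return JSON.parse(new TextDecoder().decode(body));
}
```

On the extension side, the browser does this framing itself, so only the native application would need code like this.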

As a user, personally, I like the native version idea. All my data stays with me, no need to backup/sync over Internet (could be slow. where's it going? etc.) - although having that choice later on would be cool. As long as the interaction all stays within the extension, so I don't have to mess around with another UI (lastpass native binary thing is like this). I don't think there's any need to use something big like Electron unless we really need a UI. The big barrier I think will probably be getting people to install it - we also don't really have any idea how many users would be interested (can we ask?). Lots may be less easily convinced or just not care enough.

I don't really think the idea of sharing is limited to a native version (maybe Dat specifically?). Also not sure if native version will help with a Safari ext. Any ideas?

An MVP of just the remote backup part of the server idea (I think the main part we've talked about before) could be fairly low effort by making use of a lot of existing data services out there and having a simple server to just map user keys to data locations. I suppose things would get a lot more complicated if that data was to be interacted with on the server though (so in a live DB rather than just serialized data sitting somewhere). And there's maybe liability and privacy concerns we need to worry about if we are managing user data (is this really much of an issue?). I would like to hear more about the broader plans for the server idea though - what other interactions with the ext other than data backup?

bohrium272 commented 6 years ago

I'm for the native version as well, though a server sounds like the most usual way of solving the backup problem. The native version can help us exploit the DAT project and all its benefits. As a basic backup option, we can use Dropbox or Google Drive or something like that for backups. That will not be very difficult to implement. The fact that multiple browsers can feed into the same app is very very compelling.

As long as the interaction all stays within the extension, so I don't have to mess around with another UI (lastpass native binary thing is like this). I don't think any need to use something big like Electron unless we really need a UI

Maybe we can have a simple command line interface for the local application and then the UI could be offered as a paid upgrade.

Question: Would the decentralised storage be the correct option, given that we plan to build mobile applications as well?

tilgovi commented 6 years ago

Firefox is planning to ship support for dat in extensions: https://blog.mozilla.org/addons/2018/01/26/extensions-firefox-59/

tilgovi commented 6 years ago

That doesn't mean that Firefox will implement it, and I'm not sure it helps any use case for WB, but the suggestion is that extensions could implement the protocol.

I guess maybe I am talking about a different problem. Just because you implement dat in an extension doesn't mean that the extension has a local place to replicate dat data.

I think I see the purpose of the native messaging, then, and maybe that provides a way to pin data locally as an option.

blackforestboi commented 6 years ago

Thanks so much for your input folks!

@poltak

I think the most important thing is getting confirmation on the current state of this in web exts

A message I got from @bigbluehat about this was that "at the moment it doesn’t look like browsers can be trusted for this stuff [persisting data]...yet. At least not completely."

Is the idea that the native version is an optional enhancement to the ext or users need it to use main functionality?

Both. It is somewhat optional, as you CAN use the extension fully without the native version. But you run the danger of losing your data because of the eviction problem. The native version would ensure that a user's data is safe, so it is somewhat important for the main functionality. Also, AFAIK we can't have p2p connections between extensions, so participating in the sharing network without anything in the middle (native/server) would not be possible. With Dat, we can make it possible without the need for a server.

although having that choice later on would be cool.

That would be the cool part about having a Dat node. It would make it so much easier to sync between the computer and a server then. Connecting to a server would be almost the same flow as setting up the connection from the extension to the native version.

lastpass native binary thing is like this

Yeah I also saw this in lastpass, they do the handshake pretty conveniently. Wonder if they have a server to help mediate the connection or so.

I don't really think the idea of sharing is limited to a native version (maybe Dat specifically?)

Without a server, it seems tricky to me, because we can't open a connection between 2 extensions, right? And yeah, if Dat was available in extensions, we would not need to have a local version per se if we wanted to enable sharing based on the dat protocol. As @tilgovi mentioned, Firefox is planning to ship dat support > but our biggest user base are Chrome users and probably will be for a very long time. So we can't only rely on FF supporting Dat.

And there's maybe liability and privacy concerns we need to worry about if we are managing user data (is this really much of an issue?)

Yeah, I find this one to be a biggie, because it would prevent users from trusting the extension, or trusting that their data is not deleted, by handing data to another company they don't know if they can trust. Also it would undermine our product vision and value proposition of privacy and decentralisation by making it necessary for a single company to store data in order to have a product that is usable.

what other interactions with the ext other than data backup?

You mean for the server? As far as I understand it would have the same interactions as the native version. Plus mobile/web support. Does anybody have additional things in mind?

@arpitgogia

Maybe we can have a simple command line interface for the local application and then the UI could be offered as a paid upgrade.

I'd like to keep it so that software that does not produce running costs is as free as possible. It will be open source anyhow, so we could not prevent people from changing the software and unlocking such a paid feature at no cost.

Would the decentralised storage be the correct option, given that we plan to build mobile applications as well?

AFAIK Dat is already semi-supported on Android; no plans to make it possible on iOS are known to me. Replicating a full node is also not feasible for most phones though, thus phone support is really only possible with Memex Cloud.

bnvk commented 6 years ago

Firefox is planning to ship dat support > but our biggest user base are Chrome users and probably will be for a very long time. So we can't only rely on FF supporting Dat.

There's strategic rationale behind go / please / stay / support where the users currently are. If you're trying to create radical tech + change, that strategy quickly becomes a ball and chain.


Mozilla's support of DAT in extensions is a significant boon to the next gen of p2p apps. Firefox Quantum is great and every bit as performant as Chrome. People do switch browsers when there's a clear incentive. Beaker browser is also doing really solid work. Either you're helping build the incentive (and change) or you are not.

Since this summer my suggestion has been to firmly plant a flag and head towards things like DAT in whatever ways possible. Once that is 100% committed to, the smartest / most opportunistic engineering tasks should become clear :smile:

blackforestboi commented 6 years ago

Thanks @bnvk for taking the time to share your views. Ever since we spoke in summer, I worked with them to design a sustainable system. Your words were a helpful reminder on the importance of building an open and collaborative system, instead of an excluding and proprietary one. I support your views towards progressing open-source software, one of the reasons WorldBrain/Memex is open-source.

In the past months I also got clear about what the priorities for our project are when it comes to the role of open-source software. Even though I believe that there should be more open-source software in the world, one of WorldBrain's highest priorities is to develop a system where a diverse crowd of people can share their perspectives more effectively - not to promote open-source software (which is still very important). This means that diversity & inclusivity, in terms of people and the software they use to organise their knowledge, has a higher priority for WorldBrain than focusing only on working/integrating with open-source software. People should be able to use whatever service suits their needs best, and if open-source software can do that, great. We also can't/won't judge what is best for them, and we don't want to impose our value system on them.

Either you're helping build the incentive (and change) or you are not.

As already said back in summer, I don't see it as so binary. Just because you are not solely integrating and building on open-source software does not mean you are not helping the cause. I think in our case excluding people/software based on the kind of software they use hurts our cause, and even the progression of open-source, more than it helps. Why? For users, there are a lot of switching costs involved in changing software, also to open-source, and sometimes there is simply no suitable open-source alternative that covers their needs. We can be much more effective by providing the necessary standards (or building on existing ones) and infrastructure to make smooth migrations possible - and help lower the time/convenience barriers that make many transitions hard. However, in the end, open-source software itself needs to offer the incentive for people to switch. That is not our job and it is not for us to judge what is best for people.

As an old saying (by our generation's standards) goes: "I can only show you the door [Neo], you're the one that has to walk through it."

change that strategy quickly becomes a ball and chain.

It's indeed a problem if you rely on external services too much. I think our approach of focussing on interoperability/modularity helps lower the probability of this becoming a problem though. Through that, WorldBrain/Memex will hopefully keep the flexibility to support users in whatever software they are using at any given time and avoid a lock-in ourselves. We really just want to be the infrastructure. The integrations are up to the community at some point.

blackforestboi commented 6 years ago

Adding a few thoughts after the conversation with Samir about this topic.

Through the conversation it became clear that there were 3 underlying challenges to solve that were not entirely explicit. Eventually we need to solve these issues together, with the server, a local version or something else (or in between). Do you folks have ideas on how we can solve them?

The challenges:

1) We need to find a way around the data eviction problem. After talking to @bigbluehat a second time, it seems clear that there is no way to avoid this problem by staying in the browser. Data persistence is a feature the wider community is asking for, but it is not on the horizon to be implemented. A post he wrote about it at the W3C repo.

2) I am worried that we currently don't have the funds, and the manpower, to sustain a server infrastructure for 50k+ people using Memex without charging money. Costs are therefore an important factor that influenced my bias towards a local server/application. The severity of this problem needs to be further evaluated by looking into the assumptions powering the overall costs for the 50k users.

Additional arguments that came up:

Con Local version:

1) We would increase the complexity of supporting multiple system configurations. It's not just 3 OSes and 2 browsers. It may also be a problem of all the versions that people can have, thus expanding the matrix of failure points. What we need to investigate:

Pro Local version:

1) We can have first sharing features in the tool that can lead to network growth.

ShishKabab commented 6 years ago

I agree that we have to resolve this situation as early as possible. Part of me is trying to identify the code that can be shared between browser, server, and local server. On the other hand, I seem to recall that we wanted Memex cloud to store everything encrypted, right?

Aside from the costs to produce this solution, two things worry me: UX and mobile. If we want the local server to also run on mobile, we'd do well to research how to get this to run on mobile before we start implementing. For UX, the minimum we need to achieve is a seamless transition from in-browser to the external solution. We might even re-frame the in-browser version as a 'trial' version or something. The problem is, though, to work out how long the user needs to use the extension in order to see the value in it, and whether they'll hit the eviction limit before that. In this light I'd really like to find a work-around somehow... One thing that pops into my mind is using the Filesystem API, which is used by games to store very large assets, but I don't know the specifics of it. Another idea could maybe be integrating with Google Drive or Dropbox? If we do need to choose the route of the (local) server, let's start thinking about what code could be shared between all entities.

As for the sharing feature, can't DAT work through WebRTC? It allows you to set up direct UDP connections between peers, being used for online multiplayer games for example. Problem is that it does need a server to connect two clients. Chances are that with the size of that initial handshake, you'd have a similar amount of data traffic as hosting an API, but minus the storage costs. Using something like Amazon Lambda may make this very cheap though.

Electron seems to have a few nice features built-in like packaging and auto-updates which simplify distribution. I'd stay away from building a UI in Electron though, because the entire UI could be run from the extension.

bohrium272 commented 6 years ago

Another idea could maybe be integrating with Google Drive or Dropbox?

@oliversauter I'm strongly for this idea. Till we have a concrete backup strategy this is the best alternative. And I'm quite sure this won't require much effort.

EDIT

I'd stay away from building a UI in Electron though, because the entire UI could be ran from the extension.

Has anyone tried an app named Plex? It basically creates a media server on your machine through which you can stream content to other devices on your network or over the internet. It has a desktop client but the way it works is the application starts a server which you can then access on localhost using your browser. I think that way we only need a sort of backend and the interface can be done through the extension itself as Vincent has mentioned.

BigBlueHat commented 6 years ago

Plex is an interesting one to examine. Here's an extension I've found (though I've not tried): https://addons.mozilla.org/en-US/firefox/addon/web-to-plex/?src=search

It uses a remote endpoint to discover servers and then access them (via IP + port # addresses) to confirm ownership of movies as you browse imdb.com etc. https://github.com/SpaceK33z/web-to-plex/blob/master/src/options/index.js#L17

The key thing to sort out here would be whether Browser Extensions (in "all" browsers) can access a localhost, 127.0.0.1, or other IP address based URL. If so, then running a local app/service + browser extension-based integration seems like a good route.
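One way to answer this empirically would be a small probe run from the extension in each browser. The sketch below is only an illustration: the port `11922` and the `/status` path are placeholders, and `FetchLike` and `findLocalServer` are hypothetical names. The fetch function is passed in so the logic can be exercised without a browser; in an extension, the global `fetch` could be passed directly.

```typescript
// Shape of the fetch-like function the probe needs; the global fetch fits it.
type FetchLike = (url: string) => Promise<{ ok: boolean }>;

// Hypothetical candidate addresses; port and path are placeholders.
const CANDIDATE_URLS: string[] = [
  "http://localhost:11922/status",
  "http://127.0.0.1:11922/status",
];

// Return the first candidate that answers with a 2xx status, or null.
async function findLocalServer(
  fetchFn: FetchLike,
  candidates: string[] = CANDIDATE_URLS
): Promise<string | null> {
  for (const url of candidates) {
    try {
      const response = await fetchFn(url);
      if (response.ok) {
        return url;
      }
    } catch {
      // Blocked by the browser, connection refused, etc.: try the next one.
    }
  }
  return null;
}
```

A `null` result in a given browser would suggest falling back to another channel, such as native messaging.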

ShishKabab commented 6 years ago

Scrap the idea of WebRTC for DAT :( They've found that the performance is not enough, so they don't want to focus on that until performance is improved...

blackforestboi commented 6 years ago

@ShishKabab

I seem to recall that we wanted Memex cloud to store everything encrypted, right?

Yeah. Encryption, at least on the remote server, is a must.

two things worry me: UX and mobile. If we want the local server to also run on mobile

Why do we need to have a local server run on mobile? Do we really need to take this on right now? What are the reasons for that? In my mind at least, the mobile versions would only be clients that fetch/put data from a user's server.

Reason a: At some point we also still need some sort of incentive for people to pay. :)
Reason b: Having the whole database that a user builds up on server/extension/local version synced to the phone might get tricky storage/performance wise.

We might even re-frame the in-browser version as a 'trial' version or something

Good idea to frame it differently, but maybe framing it specifically as a 'limited' version that requires users to install additional software to enjoy its full potential. Even though I think installing a (local) server for the sake of just being able to persist data is not enough, I think we can make it really nice UX wise by showing the limitation while people are in the flow of wanting to use other features. So for example we have a button "share collection" and when they click on it, and a (local) server is not connected, it shows them how to connect/install/download the necessary upgrade.

Another idea could maybe be integrating with Google Drive or Dropbox?

@arpitgogia @ShishKabab This is definitely an option and the second most wanted feature by users. It would definitely solve the eviction problem, if we get the sync done right. But also here we'd need some sort of encryption, so we don't store the data in clear text on those servers. That would be bad.

Scrap the idea of WebRTC for DAT

Yeah I heard it also has still some bugs with extensions, so not working really well. What are the performance issues that came up? Is that really a problem for us? Say we would send around packages of 200-500kb per shared website. Still a problem?

@arpitgogia

It has a desktop client but the way it works is the application starts a server which you can then access on localhost using your browser.

This approach was the one I had in mind with the local version. Exposing a localhost and having the UI fully in the extension, as @ShishKabab already mentioned.

tilgovi commented 6 years ago

But also here we'd need some sort of encryption, so we don't store the data in clear text on those servers. That would be bad.

Users trust their sensitive data in Dropbox and Drive every day. Many people would be fine with storing data there without any additional encryption. Both services encrypt their own storage and transport and restrict employee access to user data. Unless your threat model is government ordered data requests, you don't need to encrypt this data yourself.

tilgovi commented 6 years ago

Users trust their sensitive data in Dropbox and Drive every day.

And if they don't, they can run whatever server offering you develop as an alternative. I am not sure putting effort into client-side encryption for data stored in Dropbox and Drive is worthwhile.

blackforestboi commented 6 years ago

And if they don't, they can run whatever server offering you develop as an alternative. I am not sure putting effort into client-side encryption for data stored in Dropbox and Drive is worthwhile.

Really good point. You wouldn't use these services as a backup, if you really cared about privacy. :)

BigBlueHat commented 6 years ago

@oliversauter at this point it might be a good idea to narrow in on what you're storing--and then go back to discussing where to store it.

If you're storing freeze-dry'd copies of Web pages directly into Dropbox or Drive, then those services can (and would) provide search across them. However, if you're providing a unique search experience within the browser (probably coupled with annotation, etc), then you'll either need to store more data there or easily navigate the user back to the WorldBrain experience.

Additionally (for better or worse), storing into someone-elses-storage also means other features will become their domain (literally and figuratively) as well--such as sharing, managing, etc.

Taking a step back to remember what you're building and why might be best for going back into the technical forest. 🌳🌲🌳 😸 🌳 🌲

blackforestboi commented 6 years ago

Additionally (for better or worse), storing into someone-elses-storage also means other features will become their domain (literally and figuratively) as well--such as sharing, managing, etc.

Not sure if I completely got that and not sure which questions to ask to clarify, except for "can you give that another try?" :)

BigBlueHat commented 6 years ago

@oliversauter yeah...that was a bit overloaded for one sentence. 😄

If you store visits in Dropbox, then why implement search? or annotation? Dropbox has those.

Conversely, if you're implementing search and annotations (in a unique way: decentralized, standards based, not-trackable, etc), do you want to store that content and annotations in something that provides the identical service but under a different ToS?

Hence, my recommendation that you/y'all step back and (re)determine what are WorldBrain's primary non-technical objectives and goals.

Once you've reaffirmed your "creed," then go back and pick your technology to match. And maybe read this: https://frankchimero.com/writing/the-good-room/

ShishKabab commented 6 years ago

@BigBlueHat While you could bend those tools to search, annotate and share, it's about the final UX. WorldBrain allows users to easily do those things in their daily flow while having a unified network in which to share, follow and discover people and content, which is not what these storage tools are for.

blackforestboi commented 6 years ago

Hence, my recommendation that you/y'all step back and (re)determine what are WorldBrain's primary non-technical objectives and goals.

Yes, I see the conflict of storing it on GDrive or Dropbox and it gives me a slight headache. I am not all too comfortable with going down that route, but it would, in a few easy steps, eliminate the persistence problem, at least temporarily until we find a better solution. We could also store the data either encrypted or in a format not usable for those services.

But yeah, ideally we would not need to integrate with them. What users want is another thing.

blackforestboi commented 6 years ago

@BigBlueHat From the great great article you shared:

Facebook, Google, Apple, and Amazon aren’t going anywhere at this point—nor should we expect them to—so it’s best to recalibrate the digital experience by increasing the footprint and mindshare of the kinds of cultural and communal value they can’t provide. The web isn’t like Manhattan real estate—if we want something, we can make space for it.

ShishKabab commented 6 years ago

Why do we need to have a local server run on mobile?

True, that was kind of a leap of thought, and indeed it doesn't have to. This means we offer no mobile functionality whatsoever in the free version, right? It makes sense on the one hand, but on the other hand it kind of makes me think how many people want to get familiar with WorldBrain first through their computers.

Maybe framing it specifically as 'limited' version

Yeah, trial is not the right word. Let's think about it.

Drive/Dropbox

I think that as long as we have a good internal API in the client through which we store and retrieve backups, putting an encryption layer in between shouldn't be so difficult. We could even do this in a successive phase, and share the code between the Google and Memex Cloud integration code.
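A rough sketch of what such an internal API with an optional encryption layer could look like is given below. Every name in it is hypothetical, the methods are synchronous only to keep the example short (real backends would be asynchronous), and the XOR "cipher" is a deliberately insecure placeholder that merely marks where a real algorithm would plug in.

```typescript
// Internal API that every backup destination (Drive, Dropbox, Memex Cloud,
// a local server) would implement. All names here are hypothetical.
interface BackupBackend {
  store(key: string, data: Uint8Array): void;
  retrieve(key: string): Uint8Array | undefined;
}

// Pluggable transformation applied before data leaves the client.
interface Cipher {
  encrypt(plain: Uint8Array): Uint8Array;
  decrypt(encrypted: Uint8Array): Uint8Array;
}

// Stand-in destination that keeps everything in memory.
class InMemoryBackend implements BackupBackend {
  private blobs = new Map<string, Uint8Array>();
  store(key: string, data: Uint8Array): void {
    this.blobs.set(key, data);
  }
  retrieve(key: string): Uint8Array | undefined {
    return this.blobs.get(key);
  }
}

// Wraps any backend so that it only ever sees encrypted bytes.
class EncryptedBackend implements BackupBackend {
  private inner: BackupBackend;
  private cipher: Cipher;
  constructor(inner: BackupBackend, cipher: Cipher) {
    this.inner = inner;
    this.cipher = cipher;
  }
  store(key: string, data: Uint8Array): void {
    this.inner.store(key, this.cipher.encrypt(data));
  }
  retrieve(key: string): Uint8Array | undefined {
    const stored = this.inner.retrieve(key);
    return stored === undefined ? undefined : this.cipher.decrypt(stored);
  }
}

// Placeholder for demonstration ONLY: single-byte XOR is not secure.
class XorPlaceholderCipher implements Cipher {
  private key: number;
  constructor(key: number) {
    this.key = key & 0xff;
  }
  encrypt(plain: Uint8Array): Uint8Array {
    return plain.map((byte) => byte ^ this.key);
  }
  decrypt(encrypted: Uint8Array): Uint8Array {
    return this.encrypt(encrypted); // XOR is its own inverse
  }
}
```

Because `EncryptedBackend` implements the same interface it wraps, the layer could be added in a later phase without changing calling code, and the same wrapper could be reused for every destination.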

WebRTC for DAT

Well, it was mentioned somewhere (can't find it in my Memex though ;) ) that the approach of using WebRTC for DAT was discontinued until performance gets better. This means that we would have to pick it up again, or use something other than DAT for sharing.

blackforestboi commented 6 years ago

It makes sense on the one hand, but on the other hand it kind of makes me think how many people want to get familiar with WorldBrain first through their computers.

I think it is partly due to a focus in our target group. The people we are working with first mainly do their research on the computer; mobile is complementary. Through an integration with Pocket, people could at least save their stuff from mobile (and read it later as well) > search not yet.

ShishKabab commented 6 years ago

I think it is partly due to a focus in our target group.

True, that'll help, at least for the initial phase. Later, mobile could become a bigger entry point, when people become interested in verifying articles they read while traveling. But since our initial target audience has these characteristics, I'm OK now with considering the desktop as the main entry point.

Treora commented 6 years ago

Nice to read people's thoughts on system architecture here, something that I have been pondering about a lot too. I'll blurt out some thoughts..

About the persistence of data in IndexedDB: The mentioned MDN page is mainly intended for websites, as @poltak already noted. I'd hope a browser extension, especially with unlimitedStorage permission, would be treated more respectfully. If not, it would be worth talking about with the Firefox developers (and likewise for Chromium etc).

Nevertheless, I would still not consider the browser extension a good place to store a user's data, mainly because it is a silo. You'd want at the very least some way to back-up or move data to another computer or another browser. Furthermore, to really follow the philosophy that the user owns their data, best would be to enable them to read, edit, augment, move, or share the data using tools of their choice. If the data would be stored in a format specific to this application, and the server side (if any) is made specifically for this client, the data would effectively only be theirs in the sense that other people cannot read it.

Accordingly, the theme in this discussion seems not just to be what architecture this application would need, but also what kind of an ecosystem you/we hope to grow and be part of. It seems worth having a good look at what is available in terms of standards and protocols, and implementations of and ecosystems around them. Pegging onto others' projects may save some work too. :)

I am myself considering investigating whether a protocol like WebDAV may suffice, so that e.g. any NextCloud instance would work as a server. Of course, Dat, IPFS, maybe SoLiD, or others would be more exciting and are worth considering; especially if, again, an existing server can be used instead of having to create one from scratch. Of course, features not present in the chosen protocol/implementation may still have to be built, but it could be nicer to extend an existing project than start a new one.

Besides architecture and protocols, I ponder about which data formats to adopt. My current preference is to use HTML wherever appropriate; plus possibly Web Annotations to create things like links, bookmarks, tags. But the options are still open, ideas welcome!

One more note about 'server versus native' dilemma: it likely depends on the intended scenario, but there is something to say for adopting a server architecture in any case, as it could also be packaged to run on the user's computer. The browser extension then just talks to localhost instead of a remote address. You could then still decide any time to make a native UI on top of that, and/or decide to replace the localhost loop with native messaging.
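As a small illustration of this point, the extension could treat "local" and "remote" as a configuration detail of one and the same server API. The settings shape, the default port `11922` and the function names below are all hypothetical.

```typescript
// Hypothetical settings: one server codebase, reachable locally or remotely.
interface BackendSettings {
  mode: "local" | "remote";
  remoteUrl?: string; // e.g. a self-hosted instance
  localPort?: number;
}

// Placeholder port for the locally packaged server.
const DEFAULT_LOCAL_PORT = 11922;

// Pick the base URL the extension talks to.
function resolveBaseUrl(settings: BackendSettings): string {
  if (settings.mode === "local") {
    return `http://localhost:${settings.localPort ?? DEFAULT_LOCAL_PORT}`;
  }
  if (!settings.remoteUrl) {
    throw new Error("remote mode requires remoteUrl");
  }
  // Strip trailing slashes so paths can be appended uniformly.
  return settings.remoteUrl.replace(/\/+$/, "");
}

// Build a full endpoint URL; the rest of the extension never needs to know
// whether the server runs on the user's machine or elsewhere.
function endpoint(settings: BackendSettings, path: string): string {
  return `${resolveBaseUrl(settings)}/${path.replace(/^\/+/, "")}`;
}
```

Swapping the localhost loop for native messaging later would then mean replacing only this thin transport layer, not the code that calls it.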

dzmitry-lahoda commented 6 years ago

The current extension's web storage could be evicted, so could I lose my Memex data as of now? What is lost and what is retained? E.g. my Firefox bookmarks and history are synced by Mozilla. If I install Memex, will it re-import these again? So do I have a partial backup? If eviction was somehow fixed by web APIs and no eviction happens, is my data still not synced to my other Memex installs for now? Do I only have single-machine, local research storage?

Data is stored in some standard web storage; is there any 3rd party provider to sync that data? That third party may already be part of a p2p web browser out there? Two versions of Memex are possible: one as a usual extension to a usual browser with server sync, and the other as a version for a p2p browser with sync, if I have the capacity/urge to install such.

dzmitry-lahoda commented 6 years ago

Maybe an interesting example: https://github.com/turtl