cryptee / web-client

Cryptee's web client source code for all platforms.
https://crypt.ee

[Feature] Alternate Export Options #81

Closed jebbster88 closed 4 years ago

jebbster88 commented 4 years ago

Is your feature request related to a problem? Please describe.
I don't like vendor lock-in, and am always hesitant to migrate to a system without having a clear exit strategy. The UECD exports, while totally parse-able, aren't the most portable of files.

Describe the solution you'd like
I'd love it if the "export my data" page offered other formats for export, specifically Markdown (or, less desirably, HTML).

Additional context
Having looked at the UECD files, I believe they are in the quill-js format? I did find a node library for converting Quill to Markdown, but I'm not sure how well that would port into a PWA.

I'd basically like to know whether there are any plans for friendlier note exports. If not, I'll roll my own to run on the exported files as part of my exit plan (not that I hope to have to invoke it!)

johnozbay commented 4 years ago

Hey there! ✌🏻 Thanks for filing this!

You're spot on with the format! Yep it's a Quill Delta, Encoded & Encrypted.
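For readers who haven't seen one, here is roughly what a Quill Delta document looks like once decoded and decrypted. This is a made-up example in Quill's standard Delta format, not a real UECD file (the encoding and any wrapper fields around the Delta are not shown and are not verified here). A document Delta contains only `insert` ops: inline formatting sits on text runs, line formatting such as headers and lists sits on the `"\n"` that ends the line, and embeds like images are objects rather than strings. Flattening it to plain text, as this sketch does, shows how much structure a naive export throws away.

```python
# A hypothetical Quill Delta document: field names follow Quill's Delta format,
# the content is invented for illustration.
SAMPLE_DELTA = {
    "ops": [
        {"insert": "Groceries"},
        {"insert": "\n", "attributes": {"header": 1}},  # line format lives on the newline
        {"insert": "Milk", "attributes": {"bold": True}},  # inline format lives on the text
        {"insert": "\n", "attributes": {"list": "bullet"}},
        {"insert": {"image": "https://example.com/milk.png"}},  # embed: a dict, not a string
        {"insert": "\n"},
    ]
}


def delta_to_plain_text(delta: dict) -> str:
    """Concatenate string inserts, discarding all attributes and embeds."""
    parts = []
    for op in delta.get("ops", []):
        insert = op.get("insert")
        if isinstance(insert, str):
            parts.append(insert)
        # Embeds (images, formulas, videos, ...) have no plain-text form and are skipped.
    return "".join(parts)


if __name__ == "__main__":
    # prints 'Groceries\nMilk\n\n': the header, bold, bullet and image are all gone
    print(repr(delta_to_plain_text(SAMPLE_DELTA)))
```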

So the reason I couldn't add more export format options comes down to a couple of technical and some legal reasons, and I'm pretty sure it's all going to click and make sense once you read through this blog-sized response.

Reason No 1) File formats, feature-sets they support, and interoperability.

Let's talk about Markdown / text files first. Markdown is great for many things, but not for complex text editing. First of all, there's no universally agreed-upon spec. At the moment, CommonMark is the best resource, and there are TONS of editor features missing from Markdown. Check this out. Things as simple as font sizes are not supported by Markdown: you can do header 1 / 2 / 3, but not "38pt" fonts without custom HTML inside the Markdown. 😔 You can read more about some of Markdown's shortcomings and others' frustrations here

Even the Markdown flavor we're using right now inside GitHub comments is based on CommonMark, and extends it to support many additional features (tables, checklists, strikethrough, auto-links, disallowed raw HTML for security, etc.).

In short, Cryptee has lots of elements & features that aren't part of standard Markdown: for example, linked Cryptee documents/files, inline tags, KaTeX math, tables(!), checkboxes, strikethrough, auto-links, disabled raw HTML for security, advanced text styling, etc. (And once the collaboration suite in Cryptee Teams is ready, there will be task delegation with @username, or the ability to add inline dates for scheduling reminders, etc.) So over time, these non-standard features would make Cryptee's Markdown implementation increasingly non-standard and non-interoperable.

So while Markdown could work for basic plaintext exports, the only reasonable way to do it would be for Cryptee to open docs in an editor one by one during export, take out all non-standard elements, and export them that way. And that would defeat the purpose of a "backup": if you have important information in tables, tags, KaTeX math, etc., it would all be gone. This is part of the reason why we only allow exporting to formats like Markdown from within the document context menu (and not from a right-click menu, for example), and mark it with a big bold "beta" to show that results may vary depending on the content you're exporting.

The same goes for other formats too: even with simple TXT or RTF, we wouldn't be able to export half of what you can do in Cryptee's editor (not to mention even more of the upcoming features).

HTML is the only feasible candidate, since we are, after all, rendering the documents in a browser window. With a little bit of processing and styling, HTML is theoretically the best candidate. We could, for example, convert Cryptee tables to HTML tables, or turn inline tags into links to other files using their ID numbers.
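To make the HTML idea slightly more concrete, here is a minimal sketch of rendering Delta text runs to HTML. It assumes Quill-style attribute names (`bold`, `italic`, `link`, `size`) with CSS-compatible size values such as `"38pt"`, plus a hypothetical tag shape (a target ID and a label) that is not how Cryptee is known to store tags. Text and attribute values pass through Python's `html.escape`, so document content can't break out of the generated markup. Unlike the Markdown case, the arbitrary font size survives.

```python
from html import escape
from typing import Any


def inline_to_html(text: str, attrs: dict[str, Any]) -> str:
    """Render one Delta text run as HTML, escaping text and attribute values."""
    html = escape(text)
    if attrs.get("bold"):
        html = f"<strong>{html}</strong>"
    if attrs.get("italic"):
        html = f"<em>{html}</em>"
    if "size" in attrs:
        # Assumes a CSS-compatible value like "38pt"; Markdown has no equivalent.
        html = f'<span style="font-size: {escape(str(attrs["size"]))}">{html}</span>'
    if "link" in attrs:
        html = f'<a href="{escape(attrs["link"])}">{html}</a>'
    return html


def tag_to_html(target_id: str, label: str) -> str:
    """Hypothetical: render an inline tag as a link to another file by its ID."""
    return f'<a class="tag" href="#{escape(target_id)}">{escape(label)}</a>'


if __name__ == "__main__":
    print(inline_to_html("big", {"size": "38pt", "bold": True}))
    print(tag_to_html("doc-123", "work"))
```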

But even then, some other app with the exact same rich feature set as Cryptee would need to support HTML imports with all these features. And there really isn't any app out there that is as feature-rich, nor one that supports HTML imports with all these features, AFAIK (if there's one, please do let me know 😅)

Reason No 2) Licenses

Now that markdown / html etc is out of the way, let's talk about more mature formats, like docx, xlsx, pdf etc. For these formats, there's a semi-legal, semi-technical reason why we can't easily export or import from them.

Since all your data on Cryptee is encrypted on your device, and Cryptee is entirely open-source with code freely available for anyone to read and verify, a side effect is that we cannot 'easily' ship support for formats with proprietary licenses.

Be it docx / xlsx from Microsoft or PDF from Adobe, shipping support for these would mean shipping their proprietary-licensed code together with our open-source-licensed code, and thus open-sourcing their code. So we legally can't use their closed-source methods.

Reason No 3) Open-Source alternatives aren't feature-rich enough for full & comprehensive exports

Since we can't use closed-source SDKs for formats like docx, this leaves us with open source alternatives, like dolanmiu/docx for example.

It's great, and works pretty well for basic rich text. But needless to say, its feature coverage is never going to be 100% perfect, given how long file formats like docx have been around and how sophisticated they've become over 30+ years of use. So while it works for basic cases, it's not as feature-rich as the original docx format, and much like Markdown, it lacks tons of features we'd need for Cryptee Docs to export without any data loss (thus, again, defeating the purpose of the bulk export, which most users would use for backing up / moving away).

Reason No 4) Client-Side Encryption / No possibility to convert formats server-side

On services like Google Docs, because Google and its servers can see all your documents' contents, they can handle all these file conversions / imports / exports on their servers instead, without having to ship the proprietary code in their apps.

Even better, they can do conversions by relying on conversion APIs offered by Microsoft, which of course give the best possible results.

But with Cryptee, since all your documents are encrypted on your device and our servers cannot see any of your files' contents, we are unable to do anything like this.

And while technically we can hook up to Microsoft's API from the client-side, so the plaintext document isn't accessible to us, I don't think sending plaintext documents to Microsoft is compatible with Cryptee's privacy model. So for privacy & security reasons, we can't and don't offer a way to send your documents' contents in plaintext to Microsoft, even from the client-side of the app.


As for Unencrypted Cryptee Docs (UECD) files: I know they're not ideal, not the prettiest, nor the best solution to this problem, but they were the best interim solution to the portability problem. (In theory, this is what Evernote does with its home-made ENEX format, and there are even worse examples like Apple Notes, which doesn't even allow you to export your notes in bulk, so you have to use external, paid software to get plaintext, dumbed-down versions of them out.)

By all means, I think the rich-text editor / document-editor industry is a nightmare as far as file formats / interoperability / exports go. And in that sense, I honestly think at Cryptee we're doing better than most, by allowing imports from multiple different formats, and by supporting exports of individual documents to many different formats to cater to everyone's unique use-cases.

I know we're not there yet with bulk exports, and the situation is far from ideal, but that partially reflects the state of the industry as it is. Looking at the bigger picture, the problem here is 50% about us not supporting more interoperable export formats, and 50% about other apps not supporting any interoperable formats for imports on their end.


So to summarize the issue, there are format incompatibilities, license incompatibilities and threat-model incompatibilities 😔

If you can think of any open-source & widely interoperable format that supports all the features we offer, I'd be more than happy to prioritize it and add support for it in individual and bulk exports right away! However, despite months of research, we still couldn't find a suitable file-format candidate for the bulk exporter, one where users can export all their data (which could be up to 2TB) knowing that it is indeed exported without any data loss, and completely interoperably.

And while yes, technically we could use the Markdown / docx converters we have in Cryptee Docs... I simply don't want to give users false expectations: let them export 1TB of documents in Markdown (which they obviously can't read through and double-check 100% after exporting), and have them find out 3 months later that one document is missing a table that had incredibly important information in it.

So these are a bunch of reasons, off the top of my head, why we couldn't add any other bulk export formats. And to emphasize once again, I'm 100% open to suggestions and improvements. I just want everyone to get the best experience with this, and not have any unexpected / unforeseeable data loss.

Hoping all this makes sense! Let me know what you think ✌🏻

All the best, J

jebbster88 commented 4 years ago

Wow that's a detailed response! Totally understand.

If you did have a bulk option for another format I'd probably have taken a UECD export additionally anyway for the same reasons.

I need to remind myself that your target audience isn't just people who spend all day on github 😅

If I do roll my own, I'll make sure to stick in warnings for elements that I haven't defined a parser for, and I imagine I'll be stealing most of that logic from this repo!

Thanks again for such a detailed and well thought out response.
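A converter along the lines described above, warning about elements it has no parser for, might look like the following minimal Python sketch. It assumes the UECD payload has already been decoded into a plain Quill Delta dict, and it only knows Quill's common default formats (`bold`, `italic`, `strike`, `code`, `link`, `header`, `list`, `blockquote`); anything else, such as sizes, checklists, tags, tables or formula embeds, is reported as a warning instead of being silently dropped. Note that `~~strike~~` is a GitHub-Flavored Markdown extension, not CommonMark.

```python
from typing import Any

# Inline formats this sketch can express; "~~" is a GFM extension, not CommonMark.
INLINE_MARKS = {"bold": "**", "italic": "_", "strike": "~~", "code": "`"}


def format_inline(text: str, attrs: dict[str, Any], warnings: list[str]) -> str:
    """Wrap a text run in Markdown marks, warning about attributes with no mapping."""
    for name, value in attrs.items():
        if name in INLINE_MARKS:
            mark = INLINE_MARKS[name]
            text = f"{mark}{text}{mark}"
        elif name == "link":
            text = f"[{text}]({value})"
        else:
            warnings.append(f"dropped inline attribute {name}={value!r}")
    return text


def line_prefix(attrs: dict[str, Any], warnings: list[str]) -> str:
    """Translate the block attributes carried by a newline into a Markdown prefix."""
    prefix = ""
    for name, value in attrs.items():
        if name == "header" and value in (1, 2, 3, 4, 5, 6):
            prefix = "#" * value + " "
        elif name == "list" and value == "bullet":
            prefix = "- "
        elif name == "list" and value == "ordered":
            prefix = "1. "
        elif name == "blockquote":
            prefix = "> "
        else:
            # e.g. alignment, checklists ("checked"/"unchecked"), custom line formats
            warnings.append(f"dropped line attribute {name}={value!r}")
    return prefix


def delta_to_markdown(delta: dict) -> tuple[str, list[str]]:
    """Convert a Quill Delta document to Markdown plus a list of data-loss warnings."""
    warnings: list[str] = []
    lines: list[str] = []
    current: list[str] = []
    for op in delta.get("ops", []):
        insert = op.get("insert")
        attrs = op.get("attributes", {})
        if isinstance(insert, dict):
            kind = next(iter(insert), "unknown")
            warnings.append(f"no Markdown equivalent for embed {kind!r}")
            current.append(f"[unsupported: {kind}]")
            continue
        if not isinstance(insert, str):
            continue
        # Each "\n" ends a line; in a Delta the newline carries that line's block attributes.
        pieces = insert.split("\n")
        for i, piece in enumerate(pieces):
            if piece:
                current.append(format_inline(piece, attrs, warnings))
            if i < len(pieces) - 1:
                lines.append(line_prefix(attrs, warnings) + "".join(current))
                current = []
    if current:
        lines.append("".join(current))
    # Blank lines between blocks so CommonMark doesn't merge them into one paragraph.
    return "\n\n".join(lines), warnings
```

Printing the warnings for every exported file gives a per-document list of what the Markdown copy is missing, rather than discovering it months later.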

johnozbay commented 4 years ago

You're very welcome!

I need to remind myself that your target audience isn't just people who spend all day on github 😅

Yeah, I think this is the most important element here, and the biggest thing we're trying to address. The vast majority of internet users aren't tech-savvy; they can't (and shouldn't have to) set up their own servers, and perhaps don't even know the meaning of the word "backend" or "server" (and shouldn't have to). And yet there are tons of coder-friendly, self-hosted / high-configuration / geeky-designed products out there.

Average internet users need a secure and private place for their files and digital belongings more than ever nowadays. So ideally the goal is to make Cryptee so simple that even my non-tech-savvy parents would be able to use it as easily as they use Google Photos, without having to worry about the nitty-gritty details of setting things up. ✨

If you do end up spending time on an importer/exporter, do let me know / keep me posted. We've been building an experimental desktop importer / exporter project, and we might be able to help each other out depending on what you're building & trying to achieve.


Also a heads-up: we're working on a v3.0 rewrite of the whole service, and pretty much refactoring everything. Most importantly, this new version will be source-available but not MIT licensed, to make it easier to deal with license incompatibilities while also keeping the source available for the public to audit.

Our legal team and I have been going through some of the bottlenecks, and our license is one of the biggest things holding us back from adding a whole lot more features. So we're going to tackle this in a way that restricts all commercial third-party use of our code, and instead we'll have an audit-only, source-available repo for v3.

So if you've got anything commercial in mind that is based off of this code, you'll be stuck with this version – and pretty much what you see is what you'll get (with minor bug fixes and minor feature tweaks to follow until we release the next version)

Keep me posted on your build & progress, and we'll help you easily & freely sort through licensing etc once we make the license transition, so you can continue working on the stuff you've been building for Cryptee ✌🏻

Can't wait to share all the exciting new stuff, patents & trademarks we've been working on! 🎉

jebbster88 commented 4 years ago

Oh don't worry about commerciality, my sights weren't set any higher than a dodgy python script! And when I said stealing, I mostly meant referring to your source to figure out what objects are markdown friendly, which I suppose is more like research than stealing.

Can't wait for 3.0.

johnozbay commented 4 years ago

Haha no worries at all! ✌🏻 By all means if you wish to build anything on top of the platform, like your python script that can somehow have commercial use, go for it. 👍🏻

My point was that the licensing for the v3 won't allow this. For various reasons but mainly because we're looking into adding collaborative features in v3, and that will allow us to roll out support for businesses & teams to use Cryptee etc.

Can't wait to show and talk more about V3. Lots of amazingness and a whole new design on the horizon! 🔥

If that's okay with you too, I'll close the issue for now – and if you have any ideas about a cool filetype / format we can use or how we can improve our exporter, please feel free to re-open this ✅

Thanks again!

julianfairfax commented 2 years ago

By all means, I think the rich-text editor / document-editor industry is a nightmare as far as file formats / interoperability / exports go. And in that sense, I honestly think at Cryptee we're doing better than most, by allowing imports from multiple different formats, and by supporting exports of individual documents to many different formats to cater to everyone's unique use-cases.

Why don't you make it possible for us to export documents to other formats in bulk? I'll be honest, I didn't read that entire message, but how would it be any different from having the option for individual documents?

johnozbay commented 2 years ago

@julianfairfax I think you should read the entire message then.

Reason No 3) Open-Source alternatives aren't feature-rich enough for full & comprehensive exports

Since we can't use closed-source SDKs for formats like docx, this leaves us with open source alternatives, like dolanmiu/docx for example.

It's great, and works pretty well for basic rich text. But needless to say, its feature coverage is never going to be 100% perfect, given how long file formats like docx have been around and how sophisticated they've become over 30+ years of use. So while it works for basic cases, it's not as feature-rich as the original docx format, and much like Markdown, it lacks tons of features we'd need for Cryptee Docs to export without any data loss (thus, again, defeating the purpose of the bulk export, which most users would use for backing up / moving away).

TL;DR: when you bulk export, you wouldn't expect to have data loss. If you bulk export 100 files as Markdown files, open them 1 month later, and realize tons of content in the files is missing, like embedded videos, tables, tags, document links, math, etc., you'd be pissed at us for having a shitty exporter. But none of these are actually part of the original Markdown spec, so there's nothing we can do. (You can use tables here on GitHub because GitHub added non-standard features to Markdown that are out of sync with the original spec. If you were writing documents here, it would be a compatibility nightmare, but thankfully you're writing comments and don't need to move them elsewhere, so it's not that big of an issue.)

Same goes for all other open-source compatible file formats. Some things will always be missing. There are some proprietary file formats that may support all our features, but we have an open source license, and can't ship exporters for these file formats due to license incompatibility.

For that ... again, read the message :

Reason No 2) Licenses

Now that markdown / html etc is out of the way, let's talk about more mature formats, like docx, xlsx, pdf etc. For these formats, there's a semi-legal, semi-technical reason why we can't easily export or import from them.

We do work around some of these issues by reverse engineering our own versions of docx or PDF instead of using proprietary-licensed versions, but we're talking about 30+ year-old formats, both built by multi-billion dollar companies with hundreds of dedicated engineers working on them full time. There's no way a tiny startup like us can catch up with 30 years of compatibility backlog and reverse engineer their formats perfectly.

In short, if we allow lots of file formats, and you have data-loss, perception = we're guilty. If we don't allow lots of file formats to ensure you won't have data-loss, perception = we're guilty.

This is a standards problem, not a Cryptee problem.

All rich document editors deal with this problem the same way we do. In Windows File Explorer / macOS Finder, you wouldn't select 100 Word documents, right-click and save them as "pdf". You'd open all 100 of these documents in Word one by one, then "save as" PDF files manually. It's the same on Cryptee Docs. You can open up documents and save them in any other format you'd like, such as HTML, Markdown, Word, PDF, etc., fully acknowledging and understanding what will be missing in them.

julianfairfax commented 2 years ago

I mean I get all of that but the feature is already there for individual documents. I don't understand how it's a problem for bulk but not for individual documents. Put whatever warning or disclaimer you want but I'd like to be able to use that feature without having to go one at a time. This is a deal-breaker for me so I'm hoping this'll be possible somehow.

johnozbay commented 2 years ago

@julianfairfax

Say you've selected 100 documents: 20 of them will have tables missing, 20 will have videos missing, 50 won't have tags, 30 won't have links, 10 won't have math, none will have document links, etc...

Cryptee can/will only know what's missing ONCE the file is downloaded and decrypted on your device. Until then, Cryptee doesn't know which text features (i.e. videos / tables, etc.) will be missing, since the contents are encrypted. So we cannot show you a warning in advance like: "features xyz will be missing if you use this format". And if your documents are 10MB each, exporting 100 of them means downloading and decrypting 1GB of files into your device's memory, and by that point it'll be too late.

See my point from the original message :

Reason No 4) Client-Side Encryption / No possibility to convert formats server-side

So what do you propose we should do? Before downloading should we just show a warning like : "hey btw you're exporting stuff but basically none of the textual content of your files may be in there with the format you chose..." ?

It's not a problem for individual docs, because:
a) they're already open in the editor & decrypted, so Cryptee can see which text features you're using and which would be missing;
b) once you "save as" an individual doc, chances are you'll open the new file right away (e.g. "proposal.pdf") and will see what's missing.

But when you bulk export 100 files, they're not decrypted yet, so the app can't check for supported features, and thus can't show you which export formats would work for your documents, or warnings about which won't. That leaves us with the only, pointless option of saying "none of your textual features may be available in this selected format". E.g. we can't even make sure your headers / font sizes are exported correctly if you use txt. Or maybe tables are supported, but headers in tables aren't? There are endless permutations and combinations.
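The ordering problem can be sketched in a few lines: a feature check needs the decrypted Delta, so in a bulk export it can only run after every document has been downloaded and decrypted. The support matrix below is an illustrative assumption, not Cryptee's actual data, and the check only works at the level of attribute and embed names (it can't tell a checklist from a bullet list, since both use `list`), which hints at the permutations problem described above.

```python
# Hypothetical support matrix: which Delta attributes/embeds each target format
# can carry. Illustrative assumptions only, not Cryptee's real capabilities.
FORMAT_SUPPORT: dict[str, set[str]] = {
    "txt": set(),
    "md": {"header", "bold", "italic", "link", "list", "blockquote", "code"},
    "html": {"header", "bold", "italic", "link", "list", "blockquote", "code",
             "strike", "size", "align", "image", "video"},
}


def features_used(delta: dict) -> set[str]:
    """Collect attribute names and embed types that appear in a decrypted Delta."""
    found: set[str] = set()
    for op in delta.get("ops", []):
        insert = op.get("insert")
        if isinstance(insert, dict):
            found.update(insert.keys())  # embed type, e.g. "image" or "formula"
        found.update(op.get("attributes", {}).keys())
    return found


def missing_features(delta: dict, target: str) -> set[str]:
    """Features this document uses that the target format cannot represent."""
    return features_used(delta) - FORMAT_SUPPORT[target]


def bulk_report(decrypted_docs: dict[str, dict], target: str) -> dict[str, set[str]]:
    """Per-document loss report. Every document must already be decrypted."""
    report: dict[str, set[str]] = {}
    for name, delta in decrypted_docs.items():
        missing = missing_features(delta, target)
        if missing:
            report[name] = missing
    return report
```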

julianfairfax commented 2 years ago

See my point from the original message :

Reason No 4) Client-Side Encryption / No possibility to convert formats server-side

So what do you propose we should do? Before downloading should we just show a warning like : "hey btw you're exporting stuff but basically none of the textual content of your files may be in there with the format you chose..." ?

I mean, yes, that is what I'd propose. It's obviously not a great solution, but, in my opinion, any feature is better than no feature. It should at least be possible to do this somehow. Maybe add this functionality but require the user to enable it in a "beta features" section in the settings, and display the warning there or in both places?

Personally, all of my notes are plain text, or Markdown if you prefer, since the end result for me is the same, so I won't have that kind of issue, at least not at this point in time. Of course others may, and probably will, but that's why it shouldn't be a feature that is available right away with no warning telling you about the issues.

Nonetheless, essentially locking users into one platform or another, no matter the reason, is disappointing. You yourselves mention how Apple Notes provides no good way of exporting things, so why should you follow in their footsteps?

I really like this product, and at some point in the future, once it's more developed and once I have more time and money to dedicate to these things, I could see myself contributing to it, by way of code and by way of a subscription or just a donation, but this feature, in my opinion, is the most important for that.

I honestly was thinking of switching to Cryptee, for a few reasons. I'm already using an open source notes app, and the only reason I could switch to Cryptee is because I can export my notes from that app. It's already going to be enough of a process to "convert" every single one of them in Cryptee, but I'm willing to do that.

However, if I'm then stuck with Cryptee afterwards, or if I have to go through a similar one by one process to export each one of my notes individually, then if at some point I wanted to switch away from it again, I practically couldn't. No matter how good the product, I don't want to be stuck to one service. I'm sure you can understand this.

johnozbay commented 2 years ago

It's obviously not a great solution

It's not a solution though. That was sort of my point.

but, in my opinion, any feature is better than no feature

We do have a bulk exporter feature, and a way to individually export files. It's not a "no feature"; it's just not the feature you were looking for, which, for mathematical reasons, cannot be meaningfully implemented in an encrypted environment, as I wrote in detail.

Personally, all of my notes are plain text, or Markdown if you prefer,

Plaintext and markdown are two different media types by their very definition. Plaintext is RFC 3676 and markdown is RFC 7763, so it really matters which one. (And I'm intentionally highlighting this to show how much more complex this issue is, and how easy it is to over-simplify it.)

Judging by your use-case / description, you can and probably should then use a simpler software that encrypts notes quickly. For the same reason why you wouldn't use Google Docs or Microsoft Word to take quick notes, it would be similarly overkill and perhaps in some ways incompatible for your use-cases to use Cryptee Docs.

You yourselves mention how Apple Notes provides no good way of exporting things

Apple Notes provides no way of exporting things. Not a good one or a bad one. None. And that's despite having virtually no reason for it: all your notes are already decrypted and readily available on your device.

And despite the mathematical challenges encryption poses for us, we do allow and provide multiple open-source formats for exports. Day and night difference right here.

In general, if you google, you'll find something like 10,000 different note taking / rich text editing apps if not more. This is because different people have different note-taking / document-authoring needs, and each one of these apps has its own niche. While it's good that there are options, it also means, we're talking about 10,000 different apps meaningfully inter-operating and being cross-compatible, which is downright impossible.

Of those 10,000 apps, there's probably a handful of encrypted ones, and understandably, so far there is no import/export standard for encrypted files or media types. So until there is one, we use our own.

--

I completely understand your frustration, but I think it's misplaced / misdirected here. To me it sounds a lot like what you want/need is a long-term, simple note-taking application, and you don't need the extra features we offer. In that case, you should probably look into using another, simpler note-taking application, and then none of this would be a problem.

In short, if you need these richer document-editing features, like tables, videos, headers/font sizes, paragraph alignment, tagging, etc., then you simply can't have a meaningful, cross-compatible bulk export in an encrypted environment. Not yet, at the very least, from a company that offers its services open-source and consequently can't ship support for proprietary formats.

Hoping this makes sense ✌🏻