Scille / parsec-cloud

Open source Dropbox-like file sharing with full client encryption !
https://parsec.cloud

Investigate how to visualize common files directly in app #8514

Open Max-7 opened 3 days ago

Max-7 commented 3 days ago

The app needs to be able to display the most common file types directly, without relying on the apps installed on the computer (typically for web mode). We don't want to download the file, as:

  1. this would mean extracting the file from a secure location
  2. the user could try and update the file, mistakenly believing that it would update it in Parsec

Common file types include:

This only concerns file visualization, not editing.

Questions to investigate:

Max-7 commented 2 days ago

Problems

Goal

When using web mode (and probably on desktop and mobile too), we would like to visualize files directly in the app as much as possible, for multiple reasons:

  1. more and more apps upload files to their own cloud automatically (Google Docs does this on Android)
  2. apps can copy files outside of the enclave (as a cache, for example), leaving an unencrypted file on the disk
  3. in web mode, we don’t want the user to download the file, to avoid extracting it from the secure enclave

Concerned files are the most common file types:

We’re only concerned with viewing a file right now, not editing it.

Constraints

Possible Solutions

Most common file formats

Some file formats are so common that browsers can handle them natively and, in some cases, even provide controls:

Viewers for those formats are not very hard to implement and could be done in a reasonable amount of time. They will not be discussed further; we’ll concentrate exclusively on document formats (Office, LibreOffice).
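To illustrate why these viewers are cheap, here is a minimal sketch of picking a native HTML element from a file name. The extension list and the `nativeTagFor` name are assumptions made for the example; they are not the list from this issue.

```typescript
// Hypothetical mapping: the extensions below are illustrative only.
type NativeTag = "img" | "audio" | "video" | "pre";

const NATIVE_TAGS = new Map<string, NativeTag>([
  ["png", "img"],
  ["jpg", "img"],
  ["jpeg", "img"],
  ["gif", "img"],
  ["mp3", "audio"],
  ["wav", "audio"],
  ["mp4", "video"],
  ["webm", "video"],
  ["txt", "pre"],
]);

// Returns the tag able to display the file natively, or null when a
// dedicated viewer (for example for docx) would be needed.
function nativeTagFor(fileName: string): NativeTag | null {
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0 || dot === fileName.length - 1) {
    return null; // no extension, or a dotfile such as ".bashrc"
  }
  const extension = fileName.slice(dot + 1).toLowerCase();
  return NATIVE_TAGS.get(extension) ?? null;
}
```

In a browser, the file content could then be handed to the element through a Blob URL (`URL.createObjectURL`), so nothing has to be downloaded by the user.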

Office formats

Those formats are the most important to handle, as they are the most used in a professional environment.

OnlyOffice

OnlyOffice requires the document to be converted before it can be displayed. A document server converts a given document (using a tool called X2tConverter) into a format that the OnlyOffice viewer understands.

With Parsec, we cannot use a server, as it would mean sending an unencrypted file to the server (maybe with SGX, but that opens another can of worms). We would need to compile X2tConverter to WASM and include it directly in Parsec.

This compilation process has been done by XWiki and has been reported to be very painful.

The front-end part has been done before. It requires a bit of hacking but it’s fairly light. A video demonstration of how it could run in Parsec V2 is available here. It uses X2tConverter (on Linux) to convert a docx, a simple web server to serve the file, and is loaded inside a Qt WebView. The whole sources for the POC are available here. I don’t know how customizable the UI is, but the default UI could be considered good enough.

Google Docs and Microsoft viewers

Google Docs can, in theory, open documents that are not saved on Google Drive as read-only files by embedding them into the page with something like <iframe src="https://docs.google.com/gview?url=http://remote.url.tld/path/to/document.doc&embedded=true"></iframe>

Microsoft offers a similar service with Office Live: <iframe src="https://view.officeapps.live.com/op/view.aspx?src=somedoc.docx"></iframe>

but:
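For reference, the two embed URLs quoted above can be built as in the sketch below. The function names are made up for the example. Note that both take the address of the document as a query parameter, which means the remote service has to fetch the document itself.

```typescript
// Builds the Google viewer URL quoted above. `documentUrl` is the address
// the remote service will fetch, so it must be reachable from its servers.
function googleViewerUrl(documentUrl: string): string {
  return `https://docs.google.com/gview?url=${encodeURIComponent(documentUrl)}&embedded=true`;
}

// Same idea for the Microsoft viewer, which uses a `src` parameter.
function officeViewerUrl(documentUrl: string): string {
  return `https://view.officeapps.live.com/op/view.aspx?src=${encodeURIComponent(documentUrl)}`;
}
```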

Our own viewer

Libraries exist to open and parse OOXML formats (docx, xlsx), and while they seem mature and well maintained, we would need to implement the visualization part ourselves. It could be feasible as a quick workaround but would have very limited features. It would also add a lot of dependencies to the project, and even more if we want to also include OpenDocument formats. This was the solution used on Parsec Android at the time (we did the HTML conversion ourselves): viewer for docs and viewer for spreadsheets.

Free libraries

Professional solutions

Recommendations

OnlyOffice seems to be the right solution, since we want to use it later for its collaborative editing capabilities. No hour invested in it would be wasted, whether spent migrating the X2tConverter tool to WASM or integrating the UI. The problem is that it’s hard to evaluate how long the migration (cross-compilation to WASM) would take.

As a quick workaround, implementing a very simple viewer for docx using mammothjs took me about one hour. See the result here. It reads the Parsec file content using the API, converts it to HTML with mammothjs and displays it.
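The flow described above (read the file, convert it, display the HTML) could look roughly like the sketch below. It is not the code of the linked result. `ReadFile` is a hypothetical stand-in for the Parsec API call, and the converter is passed in as a parameter; in a browser, `mammoth.convertToHtml` accepts `{ arrayBuffer }` and resolves to an object with `value` and `messages`, which is the shape assumed here.

```typescript
// Result shape assumed to match mammoth.convertToHtml in the browser.
interface ConversionResult {
  value: string; // the generated HTML
  messages: { type: string; message: string }[]; // conversion warnings
}

type DocxConverter = (input: { arrayBuffer: ArrayBuffer }) => Promise<ConversionResult>;

// Hypothetical stand-in for the Parsec call returning a file's content.
type ReadFile = (path: string) => Promise<ArrayBuffer>;

// Reads a docx from the workspace and returns the HTML to display.
async function docxToHtml(
  path: string,
  readFile: ReadFile,
  convert: DocxConverter,
): Promise<string> {
  const arrayBuffer = await readFile(path);
  const result = await convert({ arrayBuffer });
  return result.value;
}
```

With the real library, the call would be something like `docxToHtml(path, readFile, mammoth.convertToHtml)`, and the returned string would be injected into the viewer component. The output should be treated as untrusted HTML, since mammoth does not sanitize the source document.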

SheetJS has a similar API for spreadsheets, so the same thing should be possible.
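A spreadsheet equivalent might look like the sketch below. I have not tested it against Parsec. It only relies on `XLSX.read` and `XLSX.utils.sheet_to_html` from SheetJS; the `SheetLib` interface is a reduced, hypothetical description of the library so that the example stays self-contained.

```typescript
// Minimal subset of the SheetJS API used by this sketch.
interface Workbook {
  SheetNames: string[];
  Sheets: Record<string, unknown>;
}

interface SheetLib {
  read(data: ArrayBuffer, options: { type: "array" }): Workbook;
  utils: { sheet_to_html(sheet: unknown): string };
}

// Converts the first sheet of a workbook to HTML.
// Returns an empty string when the workbook has no sheet.
function firstSheetToHtml(data: ArrayBuffer, xlsx: SheetLib): string {
  const workbook = xlsx.read(data, { type: "array" });
  const name = workbook.SheetNames[0];
  if (name === undefined) {
    return "";
  }
  return xlsx.utils.sheet_to_html(workbook.Sheets[name]);
}
```

A real viewer would probably let the user switch between the entries of `SheetNames` rather than only showing the first one.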

Estimations

I cannot give any estimate for the port of OnlyOffice's FileConverter to WASM. Depending on how standard the code is and on its dependencies, it could be a simple recompilation or a multi-week struggle.

For the simple viewers without OnlyOffice, we need to make a few choices first:

It can be very quick to implement: the example docx viewer took about an hour, including the routing (clicking a file leading to a new component), the opening of the file, conversion and display. It is of course very rudimentary but it gives a rough estimate.

With a knife under my throat and my family held hostage, you could maybe force me to say that viewers for audio, video, images, texts, docx and xlsx could probably all be done in a functional but rudimentary fashion within a week, with another week added for writing tests.

TL;DR

As we’ll build viewers for other file formats anyway, I recommend adding quick viewers for docx and xlsx using mammothjs and sheetjs respectively. Those two libraries are very easy to use, and the viewers will be very similar to other viewers in their behavior (read a file > convert it > load it into the right HTML tag). They would also be very easy to replace later on.
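One possible way to keep the viewers interchangeable is a small registry keyed by file extension, as sketched below. All names (`Viewer`, `ViewerOutput`, `ViewerRegistry`) are hypothetical and only illustrate the "read > convert > load into a tag" shape. Registering a new viewer for an extension replaces the previous one, which is what swapping the docx viewer for OnlyOffice later on would amount to.

```typescript
// What a viewer hands back: the tag to create and the content to load in it.
interface ViewerOutput {
  tag: "div" | "img" | "audio" | "video" | "pre";
  content: string; // HTML, text, or a URL, depending on the tag
}

// A viewer converts the raw file content into something displayable.
type Viewer = (bytes: ArrayBuffer) => Promise<ViewerOutput>;

class ViewerRegistry {
  private viewers = new Map<string, Viewer>();

  // Registering the same extension twice replaces the earlier viewer.
  register(extension: string, viewer: Viewer): void {
    this.viewers.set(extension.toLowerCase(), viewer);
  }

  // Returns the viewer for an extension, or undefined if there is none.
  get(extension: string): Viewer | undefined {
    return this.viewers.get(extension.toLowerCase());
  }
}
```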