dat-ecosystem-archive / dat-desktop

Peer to peer data synchronization [ DEPRECATED - More info on active projects and modules at https://dat-ecosystem.org/ ]
https://dat.foundation
MIT License

Viewer plugin architecture #505

Open juliangruber opened 6 years ago

juliangruber commented 6 years ago

This is a proposal for an architecture that allows us and users to add Dat viewer plugins which let you view and preview a dat's file contents. The viewers should not require the whole file to exist on disk; instead they seek inside the hyperdrive, which means a file can be viewed before it has been completely downloaded. For this reason, a Viewer plugin is also sometimes referred to as a Preview. There will be viewers that ship with Dat Desktop, as well as the option to add more from userland.

Security

Each viewer lives in its own Electron <webview>, so it is completely decoupled from the main process and only interfaces with it through an asynchronous API. Without access to native APIs like the fs module, viewers shouldn't be able to behave harmfully, except potentially by using up all CPU.

Formats

Plugin structure

plugin.zip

All files listed below will be bundled into a zip archive and then imported through Dat Desktop's plugin panel. Another option is to use npm and simply point Dat Desktop to an npm/GitHub path for installing the plugin.

index.js

The main JavaScript file, executed in a sandboxed renderer process. Therefore, no node module use is possible, and if multiple files need to be included, use a bundler like browserify or webpack.

Dat Desktop might expose an API like this to the plugin, for interfacing with the underlying data:

window.seek(from, to, (err, buf) => { ... })
document.body.appendChild(myElement)

All data access will be read-only. A plugin like a CSV viewer might, for example, subscribe to its DOM's onscroll event and fetch more data accordingly, while an image viewer will progressively load image data in a different pattern.
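As a rough illustration of the scroll-driven pattern, here is a minimal sketch of a chunked reader built on a `seek(from, to, callback)` function shaped like the proposed `window.seek`. The names `createChunkReader` and `fakeSeek` are hypothetical, and the sketch assumes that `to` is exclusive and that a short read signals the end of the file; the proposal does not specify either.

```javascript
// Hypothetical helper: reads a file in fixed-size chunks through a
// seek(from, to, callback) function shaped like the proposed window.seek.
function createChunkReader (seek, chunkSize) {
  let offset = 0
  let done = false
  let loading = false
  return {
    // Fetch the next chunk; calls onChunk(err, bytes) once it arrives.
    next (onChunk) {
      if (done || loading) return
      loading = true
      seek(offset, offset + chunkSize, (err, buf) => {
        loading = false
        if (err) return onChunk(err)
        offset += buf.length
        // Assumption: a short (or empty) read means the end of the file.
        if (buf.length < chunkSize) done = true
        onChunk(null, buf)
      })
    },
    isDone: () => done
  }
}

// Stand-in for window.seek, backed by an in-memory byte array.
function fakeSeek (data) {
  return (from, to, cb) => cb(null, data.slice(from, to))
}
```

Inside a real viewer, an onscroll handler could call `reader.next(render)` whenever the user gets close to the end of the rows rendered so far.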

package.json

While most file formats map one file to one plugin instance, like CSV for example, some formats will map a set of files or folders to a plugin instance, like some point cloud data formats. The plugin therefore needs to be able to declare which files it handles and how they are grouped, most likely through a package.json field.

For a JavaScript viewer:

"viewer": {
  "extensions": "*.js*"
}

For a point cloud viewer:

"viewer": {
  "extensions": "*.pcd",
  "group": "pc-data/*.pcd"
}

The above will very likely change as we find out more about those formats. We can already start with the single-file formats, though.

In case multiple plugins register for the same file format, we should prompt the user for which plugin to choose.
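The lookup described above could be sketched as follows. This is only an illustration: `globToRegExp`, `findViewers` and the plugin names are hypothetical, only the `*` wildcard is supported, and the `group` field is ignored.

```javascript
// Convert a simple glob such as "*.js*" into a RegExp; only "*" is supported.
function globToRegExp (glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&')
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$')
}

// Return every plugin whose "viewer.extensions" glob matches the file name.
function findViewers (plugins, fileName) {
  return plugins.filter(p => globToRegExp(p.viewer.extensions).test(fileName))
}

// Hypothetical package.json contents of three installed plugins.
const plugins = [
  { name: 'js-viewer', viewer: { extensions: '*.js*' } },
  { name: 'json-tree', viewer: { extensions: '*.json' } },
  { name: 'pcd-viewer', viewer: { extensions: '*.pcd', group: 'pc-data/*.pcd' } }
]
```

When `findViewers` returns more than one plugin (here a `.json` file matches both `*.js*` and `*.json`), the application would show the plugin picker; with exactly one match it could open the viewer directly.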

index.html

As dom can be inserted through JavaScript, for now we'll just provide a document.body with a <main> inside.

aks- commented 6 years ago

I like the idea of installing the plugin by pointing Dat Desktop to an npm/GitHub path

martinheidegger commented 6 years ago

A few things to point out:

  1. It's better to go by mimetype rather than extension because some people use wrong or no file endings (e.g. .bak)
  2. Quickly browsing through files is problematic if the viewer needs to be reloaded for every file. It might be good to give the viewer an API to connect and receive data as necessary.
  3. Some viewers might be for a folder rather than a file. In this case the API would need to be allowed to access more than one file of the dat folder.
  4. For security purposes, the viewers should not be allowed to connect anywhere (or load data from anywhere)
  5. Viewers might need to be able to store some state per item: e.g. for a 3D object it might be nice if the viewer remembered the state (camera position) for a continuous browsing experience. I somehow feel that dat-desktop should provide the storage for that.

juliangruber commented 6 years ago

It's better to go by mimetype rather than extension because some people use wrong or no file endings (e.g. .bak)

+1 if the mime type can be read, however we might not have enough data yet in which case we should fall back to the file extension
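A minimal sketch of this fallback, assuming the host can hand over whatever leading bytes of the file have been downloaded so far. `detectType` and both lookup tables are hypothetical and deliberately tiny; the magic numbers themselves are the standard ones for these formats.

```javascript
// Well-known magic numbers at the start of a few common formats.
const SIGNATURES = [
  { mime: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47] },
  { mime: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { mime: 'image/gif', bytes: [0x47, 0x49, 0x46, 0x38] },
  { mime: 'application/pdf', bytes: [0x25, 0x50, 0x44, 0x46] }
]

// Illustrative extension table, used only as a fallback.
const BY_EXTENSION = new Map([
  ['png', 'image/png'],
  ['jpg', 'image/jpeg'],
  ['csv', 'text/csv']
])

// header: Uint8Array with the first downloaded bytes, or null if none yet.
function detectType (fileName, header) {
  if (header) {
    const hit = SIGNATURES.find(s => s.bytes.every((b, i) => header[i] === b))
    if (hit) return { mime: hit.mime, source: 'content' }
  }
  const ext = fileName.includes('.') ? fileName.split('.').pop().toLowerCase() : ''
  const mime = BY_EXTENSION.get(ext)
  return mime ? { mime, source: 'extension' } : null
}
```

The `source` field lets the UI tell a confirmed type from a guess, so a guess could be re-checked once the header arrives.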

Quickly browsing through files is problematic if the viewer needs to be reloaded for every file. It might be good to give the viewer an API to connect and receive data as necessary.

I wasn't aware "quickly browsing through files" was a requirement. In my mind the UI had a button per file with a magnifying glass and you'd have to dismiss the viewer first before you can open it on another file.

We should work on UI sketches in order to figure out this part of the requirements.

Some viewers might be for a folder rather than a file. In this case the API would need to be allowed to access more than one file of the dat folder.

See

While most file formats map one file to one plugin instance, like CSV for example, some formats will map a set of files or folders to a plugin instance, like some point cloud data formats. The plugin therefore needs to be able to declare which files it handles and how they are grouped, most likely through a package.json field.

For security purposes, the viewers should not be allowed to connect anywhere (or load data from anywhere)

Interesting! This could be possible through https://electronjs.org/docs/api/web-request.
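As a sketch of how that might look: Electron's `session.webRequest.onBeforeRequest` passes each request to a listener that can cancel it via `callback({ cancel: true })`. The helper names and the single allowed URL prefix below are assumptions for illustration, not part of the proposal.

```javascript
// Decide whether a request made by a viewer should be cancelled.
// Assumption: all viewer assets are served from one allowed URL prefix.
function shouldCancel (url, allowedPrefix) {
  return !url.startsWith(allowedPrefix)
}

// `ses` is expected to be an Electron Session, for example the one
// belonging to the partition used by the viewer's <webview>.
function blockViewerNetwork (ses, allowedPrefix) {
  ses.webRequest.onBeforeRequest((details, callback) => {
    callback({ cancel: shouldCancel(details.url, allowedPrefix) })
  })
}
```

Giving viewers their own session partition would keep this filter from affecting requests made by the rest of the application.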

Viewers might need to be able to store some state per item: e.g. for a 3D object it might be nice if the viewer remembered the state (camera position) for a continuous browsing experience. I somehow feel that dat-desktop should provide the storage for that.

-> localstorage

martinheidegger commented 6 years ago

however we might not have enough data yet in which case we should fall back to the file extension

hmm, maybe wait until we have enough data for the mimetype?

I wasn't aware "quickly browsing through files" was a requirement.

Both mac (image preview) & windows (quickview) allow stepping through the folder to see more images; kinda love that and think it's a good use-case. Also: Some of those viewers might take quite a while to load.

-> localstorage

The (meta?)storage would be "per file viewed" but dat-desktop might delete the dat, in which case the localstorage of the viewer would keep data for files that no longer exist. It would be good if dat would delete the (meta?)storage per file once a dat is deleted.
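One way to make that cleanup possible is to namespace every stored entry by dat key, so that all state for a deleted dat can be found and removed. The key scheme and function names below are assumptions; `storage` is any object with the Web Storage interface (`length`, `key`, `getItem`, `setItem`, `removeItem`), and `memoryStorage` is an in-memory stand-in for use outside a browser.

```javascript
// Assumed key scheme: "viewer-state:<datKey>:<filePath>".
const PREFIX = 'viewer-state:'

function saveState (storage, datKey, filePath, state) {
  storage.setItem(PREFIX + datKey + ':' + filePath, JSON.stringify(state))
}

function loadState (storage, datKey, filePath) {
  const raw = storage.getItem(PREFIX + datKey + ':' + filePath)
  return raw === null ? null : JSON.parse(raw)
}

// Remove every stored entry that belongs to a deleted dat.
function clearDat (storage, datKey) {
  const doomed = []
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i)
    if (key.startsWith(PREFIX + datKey + ':')) doomed.push(key)
  }
  doomed.forEach(key => storage.removeItem(key))
  return doomed.length
}

// Minimal in-memory stand-in for the Web Storage interface.
function memoryStorage () {
  const map = new Map()
  return {
    get length () { return map.size },
    key (i) {
      const keys = Array.from(map.keys())
      return i < keys.length ? keys[i] : null
    },
    getItem: k => (map.has(k) ? map.get(k) : null),
    setItem: (k, v) => { map.set(k, String(v)) },
    removeItem: k => { map.delete(k) }
  }
}
```

For the cleanup to run at all, the application rather than the viewer has to own the storage, which matches the suggestion that dat-desktop should provide it.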

juliangruber commented 6 years ago

hmm, maybe wait until we have enough data for the mimetype?

that might work for smaller dats, but what about some with 1000 files? you would need to prioritise downloading the header of each file first. File extension is a good heuristic because most of the time it's actually correct.

Both mac (image preview) & windows (quickview) allow stepping through the folder to see more images; kinda love that and think it's a good use-case. Also: Some of those viewers might take quite a while to load.

The question is how far we need to go for the first demo of this feature to make it usable. I would also love to have a great UX for this, but not everything is possible right away.

The (meta?)storage would be "per file viewed" but dat-desktop might delete the dat, in which case the localstorage of the viewer would keep data for files that no longer exist. It would be good if dat would delete the (meta?)storage per file once a dat is deleted.

Ok yeah I see your point, sorry I didn't give it enough thought. Let's not consider this in V1 of the design either? This wouldn't break it, it's an optimisation.

martinheidegger commented 6 years ago

that might work for smaller dats, but what about some with 1000 files? you would need to prioritise downloading the header of each file first.

When the preview for a file starts, load its first few bytes in advance. Why load all other file headers?!

The question is how far we need to go for the first demo of this feature to make it usable. I would also love to have a great UX for this, but not everything is possible right away.

Totally with you on that. Just thinking about a proper API that might be upgradable:

window.initPreview({
  onFile: function (file) {
    window.read(file, start, end, (bytes) => {})
  },
  onCancel: () => {}
})
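To make the intent of this shape concrete, here is a hypothetical host-side sketch in which one viewer instance is reused while the user steps through files. `createPreviewHost` and `show` are invented names, and two behaviours are assumptions that the snippet above does not define: `onCancel` fires when the user leaves the current file, and `end` is exclusive.

```javascript
// Hypothetical host side of the proposed initPreview/read API.
// `files` maps a file name to its bytes (a Uint8Array).
function createPreviewHost (files) {
  let handlers = null
  let current = null
  return {
    // What the viewer would see as window.initPreview.
    initPreview (h) { handlers = h },
    // What the viewer would see as window.read; `end` is exclusive here.
    read (file, start, end, cb) { cb(files[file].slice(start, end)) },
    // Called by the host UI when the user steps to another file.
    show (file) {
      if (!handlers) throw new Error('viewer has not called initPreview yet')
      // Assumption: leaving the current file triggers onCancel.
      if (current !== null) handlers.onCancel()
      current = file
      handlers.onFile(file)
    }
  }
}
```

Because the viewer registers its handlers once, stepping to the next file only costs an `onFile` call rather than a reload of the whole webview.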

Ok yeah I see your point, sorry I didn't give it enough thought. Let's not consider this in V1 of the design either? This wouldn't break it, it's an optimisation.

Sure, no problem. Just wanted to note it.

juliangruber commented 6 years ago

When the preview for a file starts, load its first few bytes in advance. Why load all other file headers?!

There might be multiple plugins registered for the same mime type; we want to show the plugin picker right away and not need another loading state

martinheidegger commented 6 years ago

There might be multiple plugins registered for the same mime type; we want to show the plugin picker right away and not need another loading state

Hmm, not sure about this one. I am generally favorable towards the "loading state" but can also see a UI solution for this.

creationix commented 6 years ago

Call me crazy, but I would love another mode where the viewer gets access to the raw hypercore API for archives that aren't hyperdrive based. There isn't any kind of hypercore-level metadata where we could store this type information, is there?

martinheidegger commented 6 years ago

@creationix The viewer links into the tree-view of files (at the moment), so it expects files to interoperate with. Where in the user interface would you implement that, and why would you implement it the same way? The tree-view itself seems to be "a viewer for a dat with file structures" while the viewers discussed here merely hook into that.

creationix commented 6 years ago

Right, that's why I call it another mode. Currently dat desktop (and most of the dat ecosystem) assumes the hypercore contains a hyperdrive archive (which in turn references a second hypercore...)

But coming at the problem from the point of view of a plug-in system, it's not too wild to just allow a list of viewers that operate at the hypercore API level and list them when a hypercore is detected to not be a valid hyperdrive. It would be nice if there was some metadata that made this less guesswork, but I don't know of any.

What currently happens if you paste in the dat url of a non-hyperdrive hypercore archive? Does it just crash?

martinheidegger commented 6 years ago

Never tried :D (but also sort-of out-of-bounds of this issue)

creationix commented 6 years ago

Yep, it crashes hard. [two images]

creationix commented 6 years ago

At a minimum, there could be a default viewer that just showed the raw hypercore data as a log of binary data (hex encoded with optional ASCII dump)
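The formatting part of such a default viewer is small. A minimal sketch, assuming each hypercore entry arrives as a Uint8Array (the function name `hexDump` and the row layout are illustrative choices):

```javascript
// Format bytes as rows of 16: offset, hex columns, then printable ASCII.
function hexDump (bytes) {
  const rows = []
  for (let off = 0; off < bytes.length; off += 16) {
    const row = Array.from(bytes.slice(off, off + 16))
    const hex = row.map(b => b.toString(16).padStart(2, '0')).join(' ')
    // Non-printable bytes are shown as "." in the ASCII column.
    const ascii = row
      .map(b => (b >= 0x20 && b <= 0x7e ? String.fromCharCode(b) : '.'))
      .join('')
    // 16 bytes take 47 characters in hex (32 digits plus 15 spaces).
    rows.push(off.toString(16).padStart(8, '0') + '  ' + hex.padEnd(47) + '  |' + ascii + '|')
  }
  return rows.join('\n')
}
```

A log view would call this once per entry and render the results in order, fetching more entries as the user scrolls.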

martinheidegger commented 6 years ago

Yeah, but that is really two different issues imo.

creationix commented 6 years ago

It's a different feature for sure, but the design constraints are almost identical. You still want hypercore-log viewers to run in a webview and be secure. The only difference is where they hook into the UI and what API is exposed to them.

We can move this to another issue if you'd like.

martinheidegger commented 6 years ago

🤔 From the UI perspective they are slightly different issues, and from the code perspective too. The reason to move it into another issue is to be able to close this issue once viewers are deployed. (I agree that it is similar.)