MatrixAI / js-encryptedfs

Encrypted Filesystem for TypeScript/JavaScript Applications
https://polykey.com
Apache License 2.0
10 stars 3 forks source link

Encryptfs Implementation #1

Closed CMCDragonkai closed 5 years ago

CMCDragonkai commented 5 years ago

Just some comments on the current implementation.

This should be moved to a js-encryptfs eventually.

It is sufficient to use import fs from 'fs';. This is because ES6 modules have arrived in NodeJS proper. The import { default as fs } from 'fs'; is superfluous, it does nothing. We only do this when we want to rename the default to something else when exporting. We also use default when we are in a CJS environment attempting to use an ES6 module's default export.

Flow has a library of default type definitions corresponding to Node JS stuff: https://github.com/facebook/flow/blob/master/lib/node.js#L784

So it should be possible to type fs objects via a: fs.

No need to "create new instance of fs". Node fs is not a class, it's just an object. A simple = fs is enough.

ES6 class methods don't need a function.

So the EncryptedFS upper directory is already in memory, async is not needed there. But the lower directory has to be async. So the bridging between the 2 is interesting. My understanding is that we have a "proxy" architecture, where reads and writes goes directly in-memory to the in-memory vfs, while when going to the lower directory, it would have to do so asynchronously. If it did it synchronously, we'd block the main thread. At the same time, the crypto itself make take time, so async may be better if we eventually need to multithread it as well. So what should be the API exposed from EncryptedFS? Well if it has to satisfy the fs API, you actually need both sync and async APIs. This was done in VFS by doing sync first, and wrapping sync as async. You cannot wrap async back into sync, async is infectious. If we try to do this in EncryptedFS, it would be possible when interacting with fs, but I think it might be slow. Instead we definitely want to use the native async fs API when possible. So I think this means we would have separate codepaths for async and sync.

Another problem is whether both async and sync methods of EFS would be using the corresponding sync or async in VFS? Now async in VFS is a simulated wrapper around the sync version, it's still sync. In most cases, it does nothing but add a bit over async overhead. But in some cases, such as streaming it makes sense to use async. I think for most cases, we just use the synchronous version, then in select places we can use the async method from VFS.

Make sure to check the VFS tests to see what we expect from a lot of the writes and read IO ops. Some have interesting behaviour regarding permissions and file descriptors and also the types they expect.

CMCDragonkai commented 5 years ago

Using async await pattern should be consistent. In that case every function has an implicit promise. So standard returns are actually wrapped as promises. This makes async basically like a monad, where the function body is in the monadic DSL.

CMCDragonkai commented 5 years ago

I'd expand iv to initVector.

CMCDragonkai commented 5 years ago

Change short hand syntax like:

if (!length)
  ...

For

if (!length) {
  ...
}

Relying on significant whitespace is a bad idea in a language like JS. Can be done in Haskell though.

Be consistent with any other whitespace usage like operators. In fact there should be spaces between operators anyway.

CMCDragonkai commented 5 years ago

Use let and const, avoid var.

CMCDragonkai commented 5 years ago

An encrypted write of plain buffer into an cipher stream has to deal with block boundaries of the cipher stream.

This means we have to deal with cases where the plain block boundaries are in the middle of the cipher boundaries.

Then what we can do is separate into 2 operations. The first takes the plain buffer and deterministically calculates the the offsets into the cipher stream. Then extract out the starting and ending cipher blocks and decrypt them. A block merge function now needs to take this starting and ending plain blocks and also the new plain block, and merge these 3 things into a "merge buffer". This function can be called blockMerge and should be generic across alot of functionalities.

Then the merge buffer can be placed back into the cipher stream by converting it into a merged cipher block. We can abstract this further when we see common patterns between this and other functions.

MeanMangosteen commented 5 years ago

This can be closed as currently under development. More specific and granular issues of the design + impl of js-encryptedfs will be documented in the repo's issues.