Upper directory (in-memory fs) should act as a block cache

MatrixAI / js-encryptedfs

Encrypted Filesystem for TypeScript/JavaScript Applications

https://polykey.com

Apache License 2.0

Upper directory (in-memory fs) should act as a block cache #2

Closed MeanMangosteen closed 3 years ago

MeanMangosteen commented 5 years ago

Similar to a page cache in operating systems, all read and write file operations can be tried first against the image of the file contained in the VFS. This would be on a block-level basis. If the desired block is not present in the file image inside the upper directory (VFS), a 'block fault' would occur. The corresponding block, persisted on disk, would then be read, decrypted, and populated in the image in the upper dir.

Every subsequent read of the block would simply be retrieved from memory, instead of performing a disk read.

Every write would also populate the upper dir image, to ensure the block in the upper dir contains the most up-to-date data.

With these measures, blocks in the upper dir image cannot become 'dirty', so the integrity of every read of loaded blocks in the upper dir is guaranteed.

To know which blocks of a file are currently loaded in the upper dir, a set containing the loaded block numbers can be maintained for each file. Block numbers would only be added, never removed.
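A minimal sketch of the scheme described above, for one file. All names here (`BlockCache`, `BlockReader`, `BlockWriter`, `faults`) are hypothetical and not part of the EFS API; the two callbacks stand in for whatever reads and decrypts, or encrypts and persists, a block on the lower dir.

```typescript
// Hypothetical callbacks standing in for the lower dir (encrypted, on disk).
type BlockReader = (blockNum: number) => Uint8Array; // read + decrypt one block
type BlockWriter = (blockNum: number, plain: Uint8Array) => void; // encrypt + persist one block

class BlockCache {
  // Upper dir image of one file, keyed by block number.
  private blocks = new Map<number, Uint8Array>();
  // Block numbers currently present in the image; only added to, never removed from.
  private loaded = new Set<number>();
  private readLower: BlockReader;
  private writeLower: BlockWriter;
  // Number of 'block faults' so far, kept only for illustration.
  faults = 0;

  constructor(readLower: BlockReader, writeLower: BlockWriter) {
    this.readLower = readLower;
    this.writeLower = writeLower;
  }

  read(blockNum: number): Uint8Array {
    if (!this.loaded.has(blockNum)) {
      // Block fault: fetch from the lower dir and populate the image.
      this.faults++;
      this.blocks.set(blockNum, this.readLower(blockNum));
      this.loaded.add(blockNum);
    }
    return this.blocks.get(blockNum)!;
  }

  write(blockNum: number, plain: Uint8Array): void {
    // Write through: persist first, then update the in-memory image,
    // so a loaded block always matches what is on disk.
    this.writeLower(blockNum, plain);
    this.blocks.set(blockNum, plain.slice());
    this.loaded.add(blockNum);
  }
}
```

Because the set only grows, memory use is bounded by file size rather than by a cache limit; an eviction policy would need block numbers to be removable.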

CMCDragonkai commented 5 years ago

How is this implemented in our inspirations?

robert-cronin commented 4 years ago

On this line of thinking, one also needs to consider how to handle functions like stat/fstat or utimes. Functions that return properties of a directory/file might just look at the upper directory cache. In this case, efs needs to somehow make sure the lower and upper directories agree on things like access permissions, timestamps, size, etc. Size would be an interesting one: we probably just want to provide the size of the decrypted file, since encryption metadata should be transparent to the user.
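To illustrate the size point, here is a rough sketch of deriving the decrypted size from the size of the file on the lower dir. It assumes a chunk layout that this thread does not confirm: each chunk is a fixed per-chunk overhead (for example an IV plus an auth tag) followed by up to `blockSize` bytes of ciphertext, with an unpadded final chunk. `plainSize` and its parameters are hypothetical names.

```typescript
// Assumed layout (not confirmed by the thread): [overhead | up to blockSize bytes] per chunk.
function plainSize(encryptedSize: number, blockSize: number, overhead: number): number {
  const chunkSize = blockSize + overhead;
  const fullChunks = Math.floor(encryptedSize / chunkSize);
  const remainder = encryptedSize % chunkSize;
  // A non-empty trailing chunk must at least contain its own overhead bytes.
  if (remainder !== 0 && remainder < overhead) {
    throw new RangeError('lower file ends in a truncated chunk');
  }
  const tail = remainder === 0 ? 0 : remainder - overhead;
  return fullChunks * blockSize + tail;
}
```

A stat wrapper could then report this value in place of the lower dir's raw size. If the final chunk is padded instead, the real length would have to be stored somewhere rather than computed.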

Another thing to consider is fsync and fdatasync. I think this is more straightforward, in that we can just flush the data from upperDir downwards using the existing write methods. I have a feeling it's redundant, though, since in both the read and write methods the data is already synced between the upper and lower directories.

robert-cronin commented 4 years ago

One issue is knowing which blocks have been read into upperDir and which have yet to be read.

The "paging" system can be implemented by maintaining an in-memory index, an internal private object that keeps track of chunk mapping.

One other thing that I don't think is really an issue now (or might not ever be an issue) is concurrency with multiple instances of EFS. If there were ever multiple instances of EFS operating on the same file, we would need to ensure that the in-memory blocks are consistent with those in the encrypted chunks on the lowerDir. This could be done by storing a content hash (of the block) in the encrypted chunk; if the hash has changed, the chunk has been written by another EFS instance.
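A sketch of the content hash check, assuming the hash stored with a chunk can be obtained without decrypting the whole block (how that would work is left open here). `HashCheckedCache`, `hashBlock` and `storedHash` are hypothetical names; SHA-256 via Node's `crypto` module is just one possible choice of hash.

```typescript
import { createHash } from 'crypto';

// Content hash of a plaintext block, hex encoded.
const hashBlock = (plain: Uint8Array): string =>
  createHash('sha256').update(plain).digest('hex');

interface CachedBlock {
  data: Uint8Array;
  hash: string; // hash of the block as this instance last saw it
}

class HashCheckedCache {
  private blocks = new Map<number, CachedBlock>();

  put(blockNum: number, plain: Uint8Array): void {
    this.blocks.set(blockNum, { data: plain.slice(), hash: hashBlock(plain) });
  }

  // storedHash is the hash currently recorded in the chunk on the lowerDir.
  // Returns undefined on a miss or when another instance has rewritten the
  // chunk, in which case the caller must re-read and decrypt it.
  get(blockNum: number, storedHash: string): Uint8Array | undefined {
    const entry = this.blocks.get(blockNum);
    if (entry === undefined) return undefined;
    if (entry.hash !== storedHash) {
      this.blocks.delete(blockNum); // stale: evict the in-memory copy
      return undefined;
    }
    return entry.data;
  }
}
```

Note that this gives up the "only added, never removed" property of the loaded set, and every read now needs at least a small lower dir access to fetch the stored hash.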

I can see this maybe happening in distributed file systems, but it would be easy to circumvent by only sending from upperDir to upperDir using transport level encryption. I don't think it is within scope at the moment.

robert-cronin commented 4 years ago

This has been implemented for write but is not yet utilised in the read method. The read side will depend on the in-memory index (chunk mapping) described above.

robert-cronin commented 3 years ago

Closing on account of migration to GitLab.