MatrixAI / js-encryptedfs

Encrypted Filesystem for TypeScript/JavaScript Applications
https://polykey.com
Apache License 2.0

Preserving file metadata when persisting files #8

Closed MeanMangosteen closed 3 years ago

MeanMangosteen commented 5 years ago

Metadata information for a plaintext file is lost when you persist it. For example, when you write a file, efs zero-pads the last block so that it is block aligned. When you decrypt the file, efs does not know whether the zero padding was part of the original file or was added by efs. Since the file size is never stored, this information is lost. This leads to #7. Storing the metadata becomes even more important when efs starts to provide confidentiality of the metadata. We need some way of recovering the original metadata. This needs to be stored, maybe in a header of the ciphertext file. The exact format still needs to be decided, but I imagine it would be prepended to the plaintext before it is encrypted and then persisted.
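
To make the ambiguity concrete, here is a minimal sketch of the padding problem. The block size and the `zeroPad` helper are assumptions for illustration and are not taken from the efs code base; encryption is left out entirely.

```typescript
// Hypothetical block size; the real efs value may differ.
const PAD_BLOCK_SIZE = 16;

// Zero-pad plaintext up to the next block boundary, as described above.
function zeroPad(plain: Uint8Array, blockSize: number = PAD_BLOCK_SIZE): Uint8Array {
  const paddedLength = Math.ceil(plain.length / blockSize) * blockSize;
  const padded = new Uint8Array(paddedLength); // initialised to zeros
  padded.set(plain);
  return padded;
}

// Two different files: one ends in a real zero byte, the other does not.
const fileA = Uint8Array.from([1, 2, 3]);
const fileB = Uint8Array.from([1, 2, 3, 0]);

const paddedA = zeroPad(fileA);
const paddedB = zeroPad(fileB);

// Both pad to the same bytes, so the original length cannot be recovered
// from the padded block alone.
const indistinguishable =
  paddedA.length === paddedB.length &&
  paddedA.every((byte, i) => byte === paddedB[i]);
```

Without a stored file size, a reader of the padded block has no way to tell `fileA` from `fileB`.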

MeanMangosteen commented 5 years ago

From the file stats shown here https://nodejs.org/api/fs.html#fs_class_fs_stats, while metadata is not encrypted, it looks like the integrity of all attributes is maintained when considering the encrypted file, except for file size.

MeanMangosteen commented 5 years ago

For now, perhaps the best approach is to store only the file size and worry about storing the other metadata when efs actually offers metadata confidentiality.

MeanMangosteen commented 5 years ago

Storing the file size has been moved to a new issue, #9, which will be attended to more immediately.

MeanMangosteen commented 5 years ago

I was writing the metadata at the very beginning of each file. Like an ordinary data chunk, it would contain an IV followed by data encrypted using that IV. In this case, the data would be the file metadata, such as the file size. I also designed it so that the encrypted metadata need not be contained in a 'chunk'. The header would only be as large as the metadata plus the additional IV. The reason for this design was primarily the convention that file metadata is often stored in a header for easy access.

However, since efs was designed with the idea that the smallest unit used in file operations is a chunk, a metadata header that is not chunk-sized causes special-case handling in a lot of the code base. So it would be better to just make the encrypted header a full chunk, even though at the moment most of it would be wasted space.

More concerning is the fact that metadata now hijacks the beginning of a file. Since efs is an intermediary between plaintext operations (which contain no encrypted header) and encrypted operations, the read/write position is shifted in the encrypted file. It is not a simple matter of translating the position once for the encrypted file: many operations and calculations in efs do not need the translation, but the operations which produce side effects on disk do. The result is a torrent of added offsets and special cases. So it might be better to write the metadata at the end? Storing it at the end isn't as big an issue as it would conventionally be, since the file is divided into chunks and we can work out the address of the last block from the file size, so we know exactly where the metadata starts. This would mean the read/write position does not have to be translated and can fall through directly to the encrypted file operations.
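
A rough sketch of the difference between the two layouts, assuming a fixed on-disk chunk size and a metadata block that occupies exactly one chunk. The constant and function names are hypothetical and do not come from efs.

```typescript
// Hypothetical on-disk chunk size (IV + ciphertext); not taken from efs.
const DISK_CHUNK_SIZE = 4096;

// Header layout: every content chunk on disk is shifted by one metadata chunk.
function chunkOffsetWithHeader(chunkIndex: number): number {
  return (chunkIndex + 1) * DISK_CHUNK_SIZE;
}

// Trailer layout: content chunks sit at their natural offsets, no translation.
function chunkOffsetWithTrailer(chunkIndex: number): number {
  return chunkIndex * DISK_CHUNK_SIZE;
}

// Trailer layout: the metadata chunk is the last chunk of the encrypted file,
// so its position follows directly from the encrypted file's size.
function metadataOffset(encryptedFileSize: number): number {
  if (encryptedFileSize < DISK_CHUNK_SIZE || encryptedFileSize % DISK_CHUNK_SIZE !== 0) {
    throw new RangeError("encrypted file must be a whole number of chunks");
  }
  return encryptedFileSize - DISK_CHUNK_SIZE;
}
```

With the trailer layout only the code that reads or writes metadata needs to know about `metadataOffset`; every content read or write uses the untranslated chunk index.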

CMCDragonkai commented 5 years ago

Based on our discussion, this appears to be done by writing metadata (that is really metadata in the upper fs) as a chunk on the lowerfs. However this metadata isn't just normal posix fs metadata, but also metadata relevant to the encryption system.

Referring to https://en.wikipedia.org/wiki/Extended_file_attributes the posix system does have support for this. However we do not know if the lowerfs will support this. The upper fs when used with Polykey which is js-virtualfs does not have support for xattr atm.

So this metadata chunk is just special data that is read specially and not placed into the file in the upper fs? How is this achieved?

So I agree that metadata should be a normal chunk that is encrypted along with the other file content in the lowerfs. But I'm concerned about how it's robustly distinguished from the plaintext file content that is represented in the upperfs. And then also how js-encryptedfs's attributes are distinguished from standard posix attributes that would be represented on the upperfs as well.

CMCDragonkai commented 5 years ago

And even if upperfs (js-virtualfs) would support extended attributes, does it even make sense to have the encrypted-fs attributes be available in the upperfs, maybe that would leak information? If this is the case, I would actually think these need to be done:

  1. Separate encrypted-fs attributes from virtualfs-fs attributes. The former is encapsulated in encrypted-fs, the latter is represented in the upperfs. Both attributes are encrypted as file content in the lowerfs.
  2. The chunk serialisation of these attributes should use a TLV scheme. I know that you are currently using serialize-javascript but that does not serialise into TLV. We should be serialising into TLV in order to always know exactly what the length is. JSON is a free-text structure, and that isn't even JSON, it's just arbitrary javascript expression. TLV can be based on protobuf as it's pretty common nowadays. Then the attributes can extend beyond just 1 chunk size. But we read from bottom up, and we can cleanly differentiate between file content, file attributes, and encrypted attributes.
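
To illustrate what a TLV scheme buys here, below is a minimal hand-rolled type-length-value encoder and decoder. It is not protobuf and not the format efs uses; the 1-byte type tag, the 4-byte big-endian length, and all names are assumptions made for the sketch.

```typescript
// One TLV record: a numeric type tag and an opaque value.
interface TlvRecord {
  type: number; // expected to fit in one byte (0..255); hypothetical tag space
  value: Uint8Array;
}

const TLV_HEADER_SIZE = 5; // 1 byte type + 4 byte big-endian length

function encodeTlv(records: TlvRecord[]): Uint8Array {
  const total = records.reduce((n, r) => n + TLV_HEADER_SIZE + r.value.length, 0);
  const out = new Uint8Array(total);
  const view = new DataView(out.buffer);
  let pos = 0;
  for (const r of records) {
    view.setUint8(pos, r.type);
    view.setUint32(pos + 1, r.value.length); // DataView is big-endian by default
    out.set(r.value, pos + TLV_HEADER_SIZE);
    pos += TLV_HEADER_SIZE + r.value.length;
  }
  return out;
}

function decodeTlv(bytes: Uint8Array): TlvRecord[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const records: TlvRecord[] = [];
  let pos = 0;
  while (pos < bytes.length) {
    if (pos + TLV_HEADER_SIZE > bytes.length) throw new RangeError("truncated TLV header");
    const type = view.getUint8(pos);
    const length = view.getUint32(pos + 1);
    const start = pos + TLV_HEADER_SIZE;
    if (start + length > bytes.length) throw new RangeError("truncated TLV value");
    records.push({ type, value: bytes.slice(start, start + length) });
    pos = start + length;
  }
  return records;
}
```

Because every record carries its own length, the decoder always knows where one attribute ends and the next begins, which a free-text serialisation does not guarantee.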

CMCDragonkai commented 5 years ago

Here's a possible schema for a container-format that maintains file metadata in an encrypted way:

                               TLV Protobuf            TLV Protobuf

+-----------------------------+-----------------------+----------------+
|                             |                       |                |
|     Plain Text Content      | Plain Text Attributes | EFS Attributes |
|                             |                       |                |
+-----------------------------+-----------------------+----------------+

    The actual file content      Posix Metadata        Metadata used
                                 Extended Attributes   by EFS, may not
                                                       be encrypted

The usage of protobufs is pretty common, so we should be able to represent arbitrary length key values in both the attribute blocks.

EFS attributes shouldn't be encrypted because they are needed without encryption or they don't need encryption. This includes things like merkle hashes for #11 or version number... etc.
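
One way the container above could be read from the bottom up is to follow each attribute block with its own length, so a reader can walk backwards from the end of the file. This trailing-length convention is an assumption added for the sketch, as are all names; the attribute blocks are treated as opaque bytes, and encryption and chunking are ignored.

```typescript
interface Container {
  content: Uint8Array;         // plain text content
  plainAttributes: Uint8Array; // serialised posix / extended attributes
  efsAttributes: Uint8Array;   // serialised metadata used by EFS
}

const LENGTH_SIZE = 4; // each attribute block is followed by a big-endian u32 length

// Layout: content | plainAttributes | len | efsAttributes | len
function packContainer(c: Container): Uint8Array {
  const total =
    c.content.length + c.plainAttributes.length + c.efsAttributes.length + 2 * LENGTH_SIZE;
  const out = new Uint8Array(total);
  const view = new DataView(out.buffer);
  let pos = 0;
  out.set(c.content, pos);
  pos += c.content.length;
  out.set(c.plainAttributes, pos);
  pos += c.plainAttributes.length;
  view.setUint32(pos, c.plainAttributes.length);
  pos += LENGTH_SIZE;
  out.set(c.efsAttributes, pos);
  pos += c.efsAttributes.length;
  view.setUint32(pos, c.efsAttributes.length);
  return out;
}

function unpackContainer(bytes: Uint8Array): Container {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let end = bytes.length;
  // Read the trailing length, then slice off the block that precedes it.
  const takeBlock = (): Uint8Array => {
    if (end < LENGTH_SIZE) throw new RangeError("missing trailing length");
    const blockEnd = end - LENGTH_SIZE;
    const blockStart = blockEnd - view.getUint32(blockEnd);
    if (blockStart < 0) throw new RangeError("block length exceeds container");
    end = blockStart;
    return bytes.slice(blockStart, blockEnd);
  };
  const efsAttributes = takeBlock();   // last block in the file
  const plainAttributes = takeBlock(); // block before it
  return { content: bytes.slice(0, end), plainAttributes, efsAttributes };
}
```

Everything left in front of the two attribute blocks is file content, so the three regions are separated without the content needing any marker of its own.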

robert-cronin commented 3 years ago

Closing on account of migration to gitlab