MatrixAI / js-encryptedfs

Encrypted Filesystem for TypeScript/JavaScript Applications
https://polykey.com
Apache License 2.0
10 stars 3 forks source link

Upgrading @matrixai/async-init, @matrixai/async-locks, @matrixai/db, @matrixai/errors, @matrixai/workers and integrating @matrixai/resources #63

Closed CMCDragonkai closed 2 years ago

CMCDragonkai commented 2 years ago

Description

Several core libraries have been updated and js-encryptedfs needs to start using them.

  1. @matrixai/async-init - no major changes
  2. @matrixai/async-locks - the INodeManager uses a collection of Mutex for mutual exclusion of inodes. This means operations to individual inodes may be blocked. We can switch this with Lock, and in the future optimise with RWLockWriter or RWLockReader
  3. @matrixai/db - the DB now supports proper DB transactions, we should switch the INodeManager API to fully support it
  4. @matrixai/errors - we make all of our errors extend AbstractError<T> and also provide static descriptions to all of them, as well as use the cause chain
  5. @matrixai/workers - no major changes here
  6. @matrixai/resources - since the @matrixai/db no longer does any locking, the acquisition of the DBTransaction and Lock has to be done together with withF or withG

Issues Fixed

Tasks

Final checklist

CMCDragonkai commented 2 years ago

The INodeManager now exposes:

Methods in EncryptedFS that involve creating new inodes include:

All of the above methods except createEncryptedFS needs to handle concurrent operations racing to create the same inode. The createEncryptedFS runs as a singleton with respect to the EFS instance and therefore no concurrency is possible. I've added this into mkdir, mknod, and symlink.

Also I've fixed a bunch of things in mkdir, there was many superfluous functionality in mkdir introduced in #56. Specifically added test for:

      // Asymmetric directory creation
      // the first promise will create dira and dira/dirb
      // the second promise will create dira/dirb/dirc
      await Promise.all([
        efs.mkdir('2/dira/dirb', { recursive: true }),
        efs.mkdir('2/dira/dirb/dirc', { recursive: true }),
      ]);
      expect(await efs.readdir('2/dira/dirb')).toStrictEqual(['dirc']);
      expect(await efs.readdir('2/dira/dirb/dirc')).toStrictEqual([]);

In situations where while loops are used, usually you cannot use with* wrappers because statements like break, continue and return don't have the same meaning within the callback. Therefore it is better to use the procedural methods of iNodeMgr.transaction() and iNodeMgr.inoAllocation(). There's more boilerplate though, as you have to handle the catch and finally clauses, and make sure to pass the exception from catch block to finally block.

CMCDragonkai commented 2 years ago

The copyFile method had a spurious inoDeallocate. This shouldn't be there. The destination being created is managed by _open, and it is not the job of copyFile to remove the inode. However Node FS does say it has no guarantees on the atomicity of copyFile and will try to remove the destination file if it was created for writing. We don't do this currently.

CMCDragonkai commented 2 years ago

It looks like to if we are going to preserve the transaction context when the inode is already created as is now done in mkdir and _open, then the navigateFrom is going to need to be restructured to take a tran object.

Right now because the directory inode is created, and the transaction that is preserved already locks the containing directory and the new directory inode, when the loop iteration restarts and eventually tries to use this.navigateFrom on the new directory inode, this results in a deadlock because navigateFrom attempting to start a transaction on the same directory inode.

Right now a quick fix is to move that section of code outside of the transaction try catch finally block. However this is just because we navigateFrom launches new transactions.

The same problem can occur in _open if concurrently a symlink creation was successful in a race. Then subsequently it may try follow the symlink using navigateFrom.

CMCDragonkai commented 2 years ago

It does seem to make sense that one should be able to preserve the transactional context that navigateFrom will end up using. So it's worth checking that too.

CMCDragonkai commented 2 years ago

With the new tran parameter to navigateFrom, I believe we have resolved the problem of navigateFrom starting multiple transaction contexts. It now all works.

Now it's time to tackle and finish off all the concurrency tests.

CMCDragonkai commented 2 years ago

@tegefaulkes suggested that in the future, preventing race conditions of inode creation might be done by just locking the directory first. That might be simpler than our inode allocation resource management. Not sure atm.

Furthermore, if I move to getting pessimistic transactions implemented in js-db, this EFS may stick with how it is done now (dont fix what aint broke). But pessimistic transactions may make some of the transaction preservation mechanisms easier to understand.

CMCDragonkai commented 2 years ago

For testing...

Some other ideas:

CMCDragonkai commented 2 years ago

It looks like rmdir and _rmdir also is using separate transactions across their calls. They also have a recursive option similar to mkdir.

Discovered a problem with EncryptedFS.readdir and EncryptedFS.rmdir.

If run together very quickly, they both succeed. This is expected as is the case on the real filesystem.

If however the readdir is delayed by a bit, the error can be ENOTDIR or ENOENT. While on the real FS, it is always ENOENT. I want make that the case. This is because during the transaction when rmdir looks up the type for the target, even when it has the lock, it appears the type might be undefined, meaning the entry got deleted from the database.

Since rmdir and _rmdir appears to be using multiple transaction contexts, this could be affecting the concurrency of this operation.

CMCDragonkai commented 2 years ago

Currently _rmdir is only ever entered into if recursive option is true for rmdir. Nothing else calls it except itself.

It seems this was written incorrectly and should be done with a while loop as we have in mkdir as the loop is necessary for recursive directory creation.

On the other hand, since we are going down directories, it sort of makes sense that some recursion would be useful here. But it has the same issue with preserving the transaction context.

Note that recursive deletion should be done piece-meal. That is there's no atomicity for recursive deletion. It should iterate over what it has to delete from the deepest object one a time until it is all deleted. At any point it fails to delete, it can throw an error.

Otherwise, a deletion of a single inode should itself be atomic.

CMCDragonkai commented 2 years ago

In the node fs, the rm call is now available, and the recursive option is actually deprecated in the original rmdir. I can see how this makes sense, the rmdir doesn't make sense to be recursive, since removing anything else would be done differently. But rm would make sense to be capable of being recursive. The rm call isn't being used atm so we will leave that for later.

So I'm going to try for a while loop instead.

CMCDragonkai commented 2 years ago

Because currently navigation via EncryptedFS.navigate is in its own transaction context, when it returns the inode index target as navigated.target, even if this is not null, when we then later acquire a transaction, locking on the navigated.target, after acquiring the lock, the target may already be deleted. This is because while waiting for the lock, it's possible that another call got the lock and deleted the inode.

So right now most of the methods deviates from the normal FS functionality because they will instead say ENOTDIR or EISDIR because they check if the type is correct, whereas if they were deleted concurrently should be throwing ENOENT.

Now this can be solved by just checking if the inode still exists always after we have acquired the lock. And if it doesn't, we throw ENOENT again.

But this code duplication is primarily due to the fact that navigation itself is in separate transaction context. In the future, it might make sense to change navigate to take a pessimistic transaction. So that once we create a transaction and pass it to navigate, it can acquire a lock on the navigated inode index. That way as long as you are navigating you have locked on the target. But that may change the navigate API.

So for now, I'll see how far I can go with just re-raising ENOENT if the inode no longer exists.

This also impacts the case where we are locking on the directory and target due to doing some directory manipulation. In this case, the directory itself may also be gone by the time we have navigated there. But if the directory itself is deleted, then any target inside the directory would also be deleted, so it should result in the same ENOENT error.

CMCDragonkai commented 2 years ago

I've refactored the rmdir with recursive option. Further recursive testing is needed.

But I think alot of the methods that are just checking the target type after acquiring the transaction should also check if the target still exists, and throw an ENOENT after acquiring the lock, as this should help ensure a consistent set of errors during concurrent usage.

CMCDragonkai commented 2 years ago

Further tests required on rmdir recursive as well. It should be checking for search permissions as well.

CMCDragonkai commented 2 years ago

@tegefaulkes Will need you to help finish all the concurrency testing here.

  1. All the methods which currently use transactions (most of them), right now need to have a check for whether the target exists after they have locked it. If the target doesn't exist, they should throw ENOENT. For example see chroot, it checks if the target == null and then also does it again after acquiring the transaction. This needs to be done almost all the other methods. You don't need to do this for _open, mkdir, symlink, mknod. Once this is done, it ensures that when there are concurrent operations where one of the operations ends up deleting the inode, all other calls waiting on locking that inode will end up throwing ENOENT instead of ENOTDIR or EISDIR. Please make sure to list all the methods you apply this change to, as I may some commentary for each of them. More details here: https://github.com/MatrixAI/js-encryptedfs/pull/63#issuecomment-1112931258
  2. Additional rmdir tests should be written that test its recursive option under these situations:
    • When the parent directory lacks the write permission
    • When the parent directory lacks the search permission
    • When the target directory lacks write permission and is empty
    • When the target directory lacks write permission and is not empty
    • When the target directory lacks search permissions and is empty
    • When the target directory lacks search permissions and is not empty
    • When the target directory lacks write permissions, but a subdirectory under the target has write permissions and has files underneath them. The correct behaviour is to match however node's filesystem works. I suggest creating a prototype script that uses fs/promises and running the same test over the real filesystem, observing the side-effect and ensuring that our implementation does the same thing. In my recent changes to rmdir I believe this should be case, but new tests should be written.
  3. More concurrent tests needs to be written. In the tests/EncryptedFS.concurrent.test.ts, I have started refactoring the concurrent tests. The uncommented tests should be currently working. All of the commented out tests needs to be incrementally uncommented and checked against EFS. Most importantly the uncommented tests are all too brittle. They should be testing the expected non-deterministic behaviour. That is if the observed side effects of 2 concurrent operations is the first then second, it is incorrect to test that the result is first then second. Instead the correct test is to allow for both first then second, and second then first. In some cases, one should use the await sleep used in a particular concurrent operation to force one operation to take longer than the other. Make sure to review my uncommented tests to understand how I want the tests to be structured. I find that using Promise.allSettled to quite expressive for these situations. Additional tests should be done (beyond just the commented out tests and incorporated into the describe structure):
    • concurrent mknod
    • concurrent symlink
    • concurrent mixed writes, mkdir, symlink, mknod
    • concurrent mixed readFile and writeFile
    • concurrent mixed read and write on multiple file descriptors
    • concurrent mixed read and write on the same file descriptor
    • concurrent mixed open and deletion
    • concurrent mixed stream read and write
    • ftruncate test according to ./test-fail.ts Correct behaviour should be based off node FS behaviour. Use a prototype script to test how node FS behaves.
  4. INode allocation test, we need to test that the inode allocation is working since I updated this. Basically start the filesystem, write things to it, directories, files, take note of all the inodes allocated (delete some files and directories. Shutdown filesystem, recreate the filesystem (via EncryptedFS.start and EncryptedFS.createEncryptedFS) and ensure that the inode allocation reads from the FS. If the iNodeCounter is correct, then the next inode created should be the lowest deallocated inode index. Use the getAll to test this.
  5. Remove all the test-* files (scaffolding).
  6. Ensure all comments are without . full stop, and don't capitalise the test description or message (this can be done incrementally)
CMCDragonkai commented 2 years ago

I've squashed most of the major concurrency fixes to the last WIP commit. So that commit will be our "concurrency fixing" commit.

tegefaulkes commented 2 years ago

We're using let variable in a few places, This is causing us to loose type information and by extension proper type checking from typescript. I'll fix this up as I work.

For example, using let fd; means fd has type any. when it should be let fd: FileDescriptor | undefined.

tegefaulkes commented 2 years ago

List of methods using transactions.

tegefaulkes commented 2 years ago

What is the syscall parameter of ErrorEncryptedFSError? We don't use it everytime we throw an ErrorEncryptedFSError. It seems to be a string of the method name that the error was thrown in. E.G 'access' when thrown in the access method. But in chroot it is 'readdir'. Is this correct?

CMCDragonkai commented 2 years ago

I think that is purely descriptive, so you can change it as you see fit. We don't actually test for it in expectError. It's supposed to mean the "actual" syscall of the Linux kernel that gets called that had the error. So for EFS, it should just be the nearest EFS method call.

tegefaulkes commented 2 years ago

Implementation of EncryptedFS.read seems wrong. It only holds the transaction for getting the files stat. But doesn't use the same transaction for actually reading the file.

tegefaulkes commented 2 years ago

In the case of rename we have a source and a target path. I'm not sure what to lock for the target path. The target inode will not exist so we can't lock that. Should I lock the parent of the target path? The Dir Inode in this case?

CMCDragonkai commented 2 years ago

The rename might be tricky. You might need to lock all of these things:

  1. The directory containing source and the directory containing target. This could be the same directory. (Make sure not to deadlock!)
  2. If the target is new inode index, it also need to be locked.
  3. If the target is an existing inode, then you must lock that as well.
  4. The source as well.

I have debated with myself on whether to lock the directory then target, or target than directory. I'm not sure what the implications are atm. So you'll see through testing.

tegefaulkes commented 2 years ago

task reference

checklist:

list of methods using transactions
tegefaulkes commented 2 years ago

There seems to be a bug when concurrently streaming writes to a file and truncating said file. Normally I'd expect that the size of a file when using stat should be equal or greater than the contents of the file. I've found a situation where after truncating and writing the file I end up with the stat size of 27 but the contents has a length of 45.

This is likely a problem with the EncryptedFS.createWriteStream implementation. I'll look into it after finishing task 3.

tegefaulkes commented 2 years ago

I've found another possible bug. When using lseek to move the cursor 20 bytes and then writing 4 bytes using write I'd expect the resulting file to have stat size 24 and the contents to have length 24 with the first 20 bytes filled with 0.

But I'm seeing a stat size of 24 and the contents of the file just the 4 bytes written by write. This is another thing I need to look into.

tegefaulkes commented 2 years ago

General ETA for what is left. refering to the above reference. task 3. is hard to pin down. It's revealing small bugs as I progress so the overall time depends on things that are not fully known yet. But first blush i'm expecting up to 1 days to finish the tests and possibly 2 more to fix any bugs. For task 4. should be half a day if that at most. 5. and 6. are trivial and should take no time at all.

So overall;

Should be done within 4 days.

tegefaulkes commented 2 years ago

Cleaned up the commits

CMCDragonkai commented 2 years ago

The expectReason works against the promise settled result, while expectError works against promises. Would have preferred to call them something like expectErrorPromise and expectErrorResult, but we can proceed.

CMCDragonkai commented 2 years ago

Note that js-errors is updated, but after this PR is merged, we want to update all the other packages again because they will be using the updated js-errors as well and we want all of our exceptions to be using the new toJSON and fromJSON serialisation/deserialisation.