Closed CMCDragonkai closed 2 years ago
The INodeManager
now exposes:
withTransactionF
& withTransactionG
- convenience functions wrapping the the lockbox and transactionwithNewINodeTransactionF
& withNewINodeTransactionG
- convenience functions augmenting withTransactionF
& withTransactionG
that allocates the inode using inoAllocation()
resource, and locks it in the lockbox before locking more desired existing inodes and manages the transaction - in EFS's usecase, it only ever needs to create a single inode at a timeinoAllocation
- the ResourceAcquire
version of INodeIndex
transaction
- the ResourceAcquire
version of DBTransaction
Methods in EncryptedFS that involve creating new inodes include:
createEncryptedFS
- uses only withNewINodeTransactionF
at the beginning the root inode gets createdmkdir
- making directories involves using a while loop because of the recursive
optionmknod
- uses only withNewINodeTransactionF
symlink
- uses only withNewINodeTransactionF
_open
- uses while loop due to following symlinks AND also due to O_CREAT
option in which the target is created and still needs to be openedAll of the above methods except createEncryptedFS
needs to handle concurrent operations racing to create the same inode. The createEncryptedFS
runs as a singleton with respect to the EFS instance and therefore no concurrency is possible. I've added this into mkdir
, mknod
, and symlink
.
Also I've fixed a bunch of things in mkdir
, there was many superfluous functionality in mkdir
introduced in #56. Specifically added test for:
// Asymmetric directory creation
// the first promise will create dira and dira/dirb
// the second promise will create dira/dirb/dirc
await Promise.all([
efs.mkdir('2/dira/dirb', { recursive: true }),
efs.mkdir('2/dira/dirb/dirc', { recursive: true }),
]);
expect(await efs.readdir('2/dira/dirb')).toStrictEqual(['dirc']);
expect(await efs.readdir('2/dira/dirb/dirc')).toStrictEqual([]);
In situations where while
loops are used, usually you cannot use with*
wrappers because statements like break
, continue
and return
don't have the same meaning within the callback. Therefore it is better to use the procedural methods of iNodeMgr.transaction()
and iNodeMgr.inoAllocation()
. There's more boilerplate though, as you have to handle the catch
and finally
clauses, and make sure to pass the exception from catch
block to finally
block.
The copyFile
method had a spurious inoDeallocate
. This shouldn't be there. The destination being created is managed by _open
, and it is not the job of copyFile
to remove the inode. However Node FS does say it has no guarantees on the atomicity of copyFile
and will try to remove the destination file if it was created for writing. We don't do this currently.
It looks like to if we are going to preserve the transaction context when the inode is already created as is now done in mkdir
and _open
, then the navigateFrom
is going to need to be restructured to take a tran
object.
Right now because the directory inode is created, and the transaction that is preserved already locks the containing directory and the new directory inode, when the loop iteration restarts and eventually tries to use this.navigateFrom
on the new directory inode, this results in a deadlock because navigateFrom
attempting to start a transaction on the same directory inode.
Right now a quick fix is to move that section of code outside of the transaction try catch finally block. However this is just because we navigateFrom
launches new transactions.
The same problem can occur in _open
if concurrently a symlink creation was successful in a race. Then subsequently it may try follow the symlink using navigateFrom
.
It does seem to make sense that one should be able to preserve the transactional context that navigateFrom
will end up using. So it's worth checking that too.
With the new tran
parameter to navigateFrom
, I believe we have resolved the problem of navigateFrom
starting multiple transaction contexts. It now all works.
Now it's time to tackle and finish off all the concurrency tests.
@tegefaulkes suggested that in the future, preventing race conditions of inode creation might be done by just locking the directory first. That might be simpler than our inode allocation resource management. Not sure atm.
Furthermore, if I move to getting pessimistic transactions implemented in js-db
, this EFS may stick with how it is done now (dont fix what aint broke). But pessimistic transactions may make some of the transaction preservation mechanisms easier to understand.
For testing...
Some other ideas:
It looks like rmdir
and _rmdir
also is using separate transactions across their calls. They also have a recursive
option similar to mkdir
.
Discovered a problem with EncryptedFS.readdir
and EncryptedFS.rmdir
.
If run together very quickly, they both succeed. This is expected as is the case on the real filesystem.
If however the readdir
is delayed by a bit, the error can be ENOTDIR
or ENOENT
. While on the real FS, it is always ENOENT
. I want make that the case. This is because during the transaction when rmdir
looks up the type for the target, even when it has the lock, it appears the type might be undefined, meaning the entry got deleted from the database.
Since rmdir
and _rmdir
appears to be using multiple transaction contexts, this could be affecting the concurrency of this operation.
Currently _rmdir
is only ever entered into if recursive
option is true for rmdir
. Nothing else calls it except itself.
It seems this was written incorrectly and should be done with a while loop as we have in mkdir
as the loop is necessary for recursive directory creation.
On the other hand, since we are going down directories, it sort of makes sense that some recursion would be useful here. But it has the same issue with preserving the transaction context.
Note that recursive deletion should be done piece-meal. That is there's no atomicity for recursive deletion. It should iterate over what it has to delete from the deepest object one a time until it is all deleted. At any point it fails to delete, it can throw an error.
Otherwise, a deletion of a single inode should itself be atomic.
In the node fs, the rm
call is now available, and the recursive
option is actually deprecated in the original rmdir
. I can see how this makes sense, the rmdir
doesn't make sense to be recursive, since removing anything else would be done differently. But rm
would make sense to be capable of being recursive. The rm
call isn't being used atm so we will leave that for later.
So I'm going to try for a while loop instead.
Because currently navigation
via EncryptedFS.navigate
is in its own transaction context, when it returns the inode index target as navigated.target
, even if this is not null, when we then later acquire a transaction, locking on the navigated.target
, after acquiring the lock, the target may already be deleted. This is because while waiting for the lock, it's possible that another call got the lock and deleted the inode.
So right now most of the methods deviates from the normal FS functionality because they will instead say ENOTDIR
or EISDIR
because they check if the type is correct, whereas if they were deleted concurrently should be throwing ENOENT
.
Now this can be solved by just checking if the inode still exists always after we have acquired the lock. And if it doesn't, we throw ENOENT again.
But this code duplication is primarily due to the fact that navigation
itself is in separate transaction context. In the future, it might make sense to change navigate
to take a pessimistic transaction. So that once we create a transaction and pass it to navigate, it can acquire a lock on the navigated inode index. That way as long as you are navigating you have locked on the target. But that may change the navigate
API.
So for now, I'll see how far I can go with just re-raising ENOENT
if the inode no longer exists.
This also impacts the case where we are locking on the directory and target due to doing some directory manipulation. In this case, the directory itself may also be gone by the time we have navigated there. But if the directory itself is deleted, then any target inside the directory would also be deleted, so it should result in the same ENOENT error.
I've refactored the rmdir
with recursive
option. Further recursive testing is needed.
But I think alot of the methods that are just checking the target type after acquiring the transaction should also check if the target still exists, and throw an ENOENT after acquiring the lock, as this should help ensure a consistent set of errors during concurrent usage.
Further tests required on rmdir
recursive as well. It should be checking for search permissions as well.
@tegefaulkes Will need you to help finish all the concurrency testing here.
ENOENT
. For example see chroot
, it checks if the target == null
and then also does it again after acquiring the transaction. This needs to be done almost all the other methods. You don't need to do this for _open
, mkdir
, symlink
, mknod
. Once this is done, it ensures that when there are concurrent operations where one of the operations ends up deleting the inode, all other calls waiting on locking that inode will end up throwing ENOENT
instead of ENOTDIR
or EISDIR
. Please make sure to list all the methods you apply this change to, as I may some commentary for each of them. More details here: https://github.com/MatrixAI/js-encryptedfs/pull/63#issuecomment-1112931258rmdir
tests should be written that test its recursive
option under these situations:
fs/promises
and running the same test over the real filesystem, observing the side-effect and ensuring that our implementation does the same thing. In my recent changes to rmdir
I believe this should be case, but new tests should be written.tests/EncryptedFS.concurrent.test.ts
, I have started refactoring the concurrent tests. The uncommented tests should be currently working. All of the commented out tests needs to be incrementally uncommented and checked against EFS. Most importantly the uncommented tests are all too brittle. They should be testing the expected non-deterministic behaviour. That is if the observed side effects of 2 concurrent operations is the first then second, it is incorrect to test that the result is first then second. Instead the correct test is to allow for both first then second, and second then first. In some cases, one should use the await sleep
used in a particular concurrent operation to force one operation to take longer than the other. Make sure to review my uncommented tests to understand how I want the tests to be structured. I find that using Promise.allSettled
to quite expressive for these situations. Additional tests should be done (beyond just the commented out tests and incorporated into the describe structure):
./test-fail.ts
Correct behaviour should be based off node FS behaviour. Use a prototype script to test how node FS behaves.iNodeCounter
is correct, then the next inode created should be the lowest deallocated inode index. Use the getAll
to test this.test-*
files (scaffolding)..
full stop, and don't capitalise the test description or message (this can be done incrementally)I've squashed most of the major concurrency fixes to the last WIP commit. So that commit will be our "concurrency fixing" commit.
We're using let variable
in a few places, This is causing us to loose type information and by extension proper type checking from typescript. I'll fix this up as I work.
For example, using let fd;
means fd
has type any
. when it should be let fd: FileDescriptor | undefined
.
List of methods using transactions.
What is the syscall
parameter of ErrorEncryptedFSError
? We don't use it everytime we throw an ErrorEncryptedFSError
. It seems to be a string of the method name that the error was thrown in. E.G 'access' when thrown in the access
method. But in chroot
it is 'readdir'. Is this correct?
I think that is purely descriptive, so you can change it as you see fit. We don't actually test for it in expectError
. It's supposed to mean the "actual" syscall of the Linux kernel that gets called that had the error. So for EFS, it should just be the nearest EFS method call.
Implementation of EncryptedFS.read
seems wrong. It only holds the transaction for getting the files stat. But doesn't use the same transaction for actually reading the file.
In the case of rename
we have a source and a target path. I'm not sure what to lock for the target path. The target inode will not exist so we can't lock that. Should I lock the parent of the target path? The Dir Inode in this case?
The rename
might be tricky. You might need to lock all of these things:
source
and the directory containing target
. This could be the same directory. (Make sure not to deadlock!)target
is new inode index, it also need to be locked.target
is an existing inode, then you must lock that as well.source
as well.I have debated with myself on whether to lock the directory then target, or target than directory. I'm not sure what the implications are atm. So you'll see through testing.
checklist:
ENOENT
error. Use chroot
as an example. Ignore this for _open
, mkdir
,symlink
and mknod
. List all methods affected, by extension make a checklist of all methods affected to tick off.rmdir
tests should be added for recursive option for the following conditions.
fs/promises
and running the same test over the real filesystem, observing the side-effect and ensuring that our implementation does the same thing. In my recent changes to rmdir
I believe this should be case, but new tests should be written.~ This test is covered by the 1st two. with lacking write permissions you can remove a directory that is deep enough. With lacking search permissions you can't../test-fail.ts
Correct behaviour should be based off node FS behaviour. Use a prototype script to test how node FS behaves.iNodeCounter
is correct, then the next inode created should be the lowest deallocated inode index. Use the getAll
to test this.test-*
files (scaffolding)..
full stop, and don't capitalise the test description or message (this can be done incrementally)let variable
causing us to loose type information.There seems to be a bug when concurrently streaming writes to a file and truncating said file. Normally I'd expect that the size of a file when using stat
should be equal or greater than the contents of the file. I've found a situation where after truncating and writing the file I end up with the stat size of 27 but the contents has a length of 45.
This is likely a problem with the EncryptedFS.createWriteStream
implementation. I'll look into it after finishing task 3.
I've found another possible bug. When using lseek
to move the cursor 20 bytes and then writing 4 bytes using write
I'd expect the resulting file to have stat size 24 and the contents to have length 24 with the first 20 bytes filled with 0.
But I'm seeing a stat size of 24 and the contents of the file just the 4 bytes written by write
. This is another thing I need to look into.
General ETA for what is left. refering to the above reference. task 3. is hard to pin down. It's revealing small bugs as I progress so the overall time depends on things that are not fully known yet. But first blush i'm expecting up to 1 days to finish the tests and possibly 2 more to fix any bugs. For task 4. should be half a day if that at most. 5. and 6. are trivial and should take no time at all.
So overall;
Should be done within 4 days.
Cleaned up the commits
The expectReason
works against the promise settled result, while expectError
works against promises. Would have preferred to call them something like expectErrorPromise
and expectErrorResult
, but we can proceed.
Note that js-errors
is updated, but after this PR is merged, we want to update all the other packages again because they will be using the updated js-errors
as well and we want all of our exceptions to be using the new toJSON
and fromJSON
serialisation/deserialisation.
Description
Several core libraries have been updated and js-encryptedfs needs to start using them.
@matrixai/async-init
- no major changes@matrixai/async-locks
- theINodeManager
uses a collection ofMutex
for mutual exclusion of inodes. This means operations to individual inodes may be blocked. We can switch this withLock
, and in the future optimise withRWLockWriter
orRWLockReader
@matrixai/db
- the DB now supports proper DB transactions, we should switch theINodeManager
API to fully support it@matrixai/errors
- we make all of our errors extendAbstractError<T>
and also provide static descriptions to all of them, as well as use thecause
chain@matrixai/workers
- no major changes here@matrixai/resources
- since the@matrixai/db
no longer does any locking, the acquisition of theDBTransaction
andLock
has to be done together withwithF
orwithG
Issues Fixed
Tasks
AbstractError
and usecause
chain and static descriptions@matrixai/db
intoINodeManager
, remove sublevel objects and replace with full keypaths.@matrixai/async-locks
to use theLock
class instead ofMutex
fromasync-mutex
@matrixai/resources
withF
andwithG
to replace some of ourtransact
API.FileDescriptor.write
andFileDescriptor.read
should be using a single transaction context for the entire operation: https://github.com/MatrixAI/js-encryptedfs/pull/63#issuecomment-1094910309tests/inodes
EncryptedFS.ftruncate
so that it does not need to iterate over the entire file blocks before truncating, this targets #53~ - do this in #53fileGetBlocks
and see if it correctly implemented - infinite loop solved by incrementing the block count inside the while loop, and added new test toINodeManager.file.test.ts
read sparse blocks from a file
to cover thistests/utils.ts#expectError
has been changed to take the exception class as well, and then theerrno
object to be checkedftruncate
FileDescriptor
having its own lock. This prevents some stream concurrency errorsnavigateFrom
seems racy with multiple transaction contexts being used_open
seems racy with the final FD creation with respect to inode creationmkdir
seems to have strange behavour when the target already exists, need to compare behaviour with real filesystemmkdir
Final checklist