Closed: emmacasolin closed this 1 year ago
The problem with updating js-logger here (which we need to do in order to update it in PK) is that it also means we need to update the DB, which in turn means incorporating all of the new (breaking) changes from js-db 5.0.0. I think I've made the correct changes to the source code, but there are several failing tests. This is to be expected, since js-db has undergone some major changes. Most of the failures are to do with concurrency, which makes sense given the focus of most of the recent changes to the db.
Hold off on this PR; the js-db update is more pressing. And we don't need to update the EFS yet: js-db 5.0.0 is first being updated in PK, and EFS can stay on the original js-db.
What you can do is to merge in the non-js-db and non-js-logger changes in first. There's just a bunch of CI/CD stuff.
Can you split up the commit into a separate PR for just the staging and ci/cd and other chores.
> Hold on this PR, the js-db update is more pressing first. And we don't need to update the EFS yet. The js-db 5.0.0 is first being updated in PK. EFS can stay on the original js-db.
Ok. EFS won't work inside PK if it's running the old DB though, since the old DB uses the old logger, which conflicts with the new logger used by the new DB in PK, so it will need to be updated here eventually.
Actually you can do it. You just won't be able to pass a "child" logger into the DB that EFS is using. So you can just remove that parameter entirely.
Now that I've updated the logger for all of the other libraries, though, we can no longer pass a logger through to any of them from EFS. I think this is only an issue for js-workers, which is used only in the benchmarks and in one of the tests (`efsWorker.test.ts`), so we could just not pass a logger into those.
https://github.com/MatrixAI/js-encryptedfs/pull/67/commits/ee0b9e2171a9cf89f4905cbff7058e1ba8924f39 now contains all of the CI/CD changes but doesn't update the logger or DB. As such, it doesn't pass a logger through to any of the workers, since js-workers uses a different version of js-logger. All of the tests should be passing on that commit, but they don't pass on the most recent commit due to concurrency changes from the latest DB.
Once #68 is merged, rename this PR to focus on the js-db 5.x upgrade. Logger changes should be eliminated here, since they will be put into #68. Although once rebased, you can also remove the comment.
@emmacasolin please update to using 1.3.6 of js-workers. It is now compatible with swc.
The quickfix is to swap from using `new X();` to `new this();` wherever we have static async creators. This needs to be fixed in `EncryptedFS.ts`. Change to `const efs = new this({` to make it usable. Then swc should be enabled.

The same fix has to be applied to PK as well under the `staging` branch. Can you do it at the same time as you do your js-logger updates there?
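A minimal sketch of the static async creator pattern being described (the class shape and `createEncryptedFS` body here are illustrative, not the actual EFS source): `new this()` resolves to the class the static method was called on, which keeps construction polymorphic for subclasses and sidesteps the swc issue with naming the class directly.

```typescript
class EncryptedFS {
  protected ready: boolean = false;

  protected constructor(public readonly dbPath: string) {}

  // Static async creator: `new this(...)` rather than `new EncryptedFS(...)`
  // so a subclass calling this creator constructs the subclass.
  public static async createEncryptedFS(dbPath: string): Promise<EncryptedFS> {
    const efs = new this(dbPath);
    // Async initialisation (opening the DB, etc.) would happen here
    efs.ready = true;
    return efs;
  }
}
```

With `new EncryptedFS(...)` a subclass's creator would silently construct the base class; `new this(...)` avoids that while also avoiding the transform problem.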
0084570 fixes #69. There are still failing tests in this PR, but it passed when tested on top of staging.
Creation of the temporary directory with `os.tmpdir()` should still be using `path.join` as normal.
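For reference, the usual Node idiom being suggested (a generic sketch, not the test file in question):

```typescript
import os from 'os';
import path from 'path';
import fs from 'fs';

// path.join keeps separator handling platform-correct when combining the
// OS temp root with a prefix; mkdtempSync appends a random suffix so
// concurrent test runs don't collide.
const dataDir = fs.mkdtempSync(path.join(os.tmpdir(), 'encryptedfs-test-'));
```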
@emmacasolin this PR needs to be split up. The js-db changes should be moved into a new PR.
This PR should only focus on synchronising changes, updating the js-logger, and the fix that @tegefaulkes just added.
I suspect that the test failures are coming from the js-db changes. The js-db update should be postponed till later.

@tegefaulkes Actually I was confused, the recent fix should go to #68 and #67. Will rename this PR to suit.
@tegefaulkes can you reset the commit you just pushed?
When this is being worked on, all `// @ts-ignore - version of js-logger is incompatible (remove when js-db updates to 5.* here)` comments should be removed from the source, tests and benchmarks, as js-logger will be compatible and we'll have completed the upgrade. Unlike in Polykey, where there is a commit we can revert, this PR won't have that convenience.
Removed the fix commit from here and pushed it to staging.
Fast-check can be used to combine a number of concurrency tests together. For example, concurrent file writes using a number of methods can be the same test. For these tests, the result matters less than avoiding transaction conflicts; we still want to check that something has been written, however.
The `INodeManager` is using a `LockBox` alongside the database transactions. I think we will want to make use of the transaction locks now.
`FileDescriptor` has its own `Lock` for locking on cursor updates. This one should be left alone.
Looking at the `EncryptedFS > write > can copy files` test failure: it has to create the destination file, and the destination doesn't exist after creation, which causes the issue here. It seems that `copyFile` uses `this._open` to open a FD AND create the file, but this is not composed with the wider transaction, thus causing the problem. I'm sure that `_open` is used inside other methods such as `appendFile` and `open`. I think the problem only happens when `_open` has to create the file when opening it. In any case, we need to be able to compose it with larger transactions.
- `LockBox`
- `INodeManager` is using `tran` as the first parameter; swap it to the last parameter in preparation for the decorator usage.

The `INodeManager` is now using the transaction locks. So far as I can tell, the `tran` parameter is already the last parameter where it's used.
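A sketch of the parameter ordering in question (the method name, `DBTransaction` shape, and `withTran` helper here are illustrative, not the real js-db or INodeManager API): with the transaction handle last, a wrapper or decorator can supply it uniformly without reordering the caller-facing arguments.

```typescript
// Illustrative stand-in for js-db's transaction handle
type DBTransaction = { id: number };

class INodeManager {
  // `tran` goes last so a hypothetical decorator (or the wrapper below)
  // can inject it by position after the domain arguments.
  public async fileSetBlocks(
    ino: number,
    data: Buffer,
    tran: DBTransaction,
  ): Promise<number> {
    return data.byteLength; // placeholder body for the sketch
  }
}

// Generic wrapper that creates a transaction and passes it as the final
// argument -- the shape a decorator would automate.
async function withTran<T>(
  f: (tran: DBTransaction) => Promise<T>,
): Promise<T> {
  return f({ id: 1 });
}
```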
I'm not sure what you mean by point 3.
For point 3, I just mean when fds are opened. They are always opened to an existing inode. If a fd is alive, the inode is alive. You can delete names, but you cannot delete inodes until all links are gone; that includes fds and other names (aka hardlinks).
So yes, when opening a fd, it has to check whether it needs to create an inode at the same time, so race conditions can occur here due to concurrent creation. To avoid this, observe how the normal fs deals with concurrent opens to the same (non-existent) inode, and mirror that behaviour.
Same applies to directories, since everything on the filesystem is an inode.
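The "mirror the normal fs" behaviour can be observed directly with Node's own `fs` (a minimal demonstration, not EFS code): two create-if-missing opens of the same initially-absent path both succeed, and both fds land on the same inode; neither call errors.

```typescript
import fs from 'fs';
import os from 'os';
import path from 'path';

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'open-race-'));
const target = path.join(dir, 'file');

// Two create-if-missing opens of the same path: whichever runs first
// creates the inode, the other opens that same inode.
const fd1 = fs.openSync(target, 'a'); // O_CREAT | O_APPEND semantics
const fd2 = fs.openSync(target, 'a');
const sameInode = fs.fstatSync(fd1).ino === fs.fstatSync(fd2).ino;
fs.closeSync(fd1);
fs.closeSync(fd2);
```

This is the outcome EFS's `_open` needs to converge on under concurrent creation, rather than erroring or creating two inodes.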
The problem here was that `copyFile` uses `_open` to create a fd pointing to the destination. It's using the create flag so that the file is created if it didn't exist. At the time, this was done in a separate transaction, so `copyFile`'s overall transaction didn't see the file get created, resulting in an error.
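A toy sketch of the composition issue (all names hypothetical; the real transactions come from js-db): in a write-buffering transaction model, writes made in a separate transaction are invisible to an ongoing one, so `_open` must accept and use the caller's transaction rather than opening its own.

```typescript
type Tran = { writes: Map<string, string> };

// Toy key-value store with write-buffering transactions: reads within a
// transaction see its own writes plus previously committed state only.
class ToyStore {
  private committed = new Map<string, string>();

  async withTran<T>(f: (tran: Tran) => Promise<T>): Promise<T> {
    const tran: Tran = { writes: new Map() };
    const result = await f(tran);
    for (const [k, v] of tran.writes) this.committed.set(k, v);
    return result;
  }

  get(tran: Tran, key: string): string | undefined {
    return tran.writes.get(key) ?? this.committed.get(key);
  }

  set(tran: Tran, key: string, value: string): void {
    tran.writes.set(key, value);
  }
}

// Fixed shape of copyFile: the destination's creation happens on the SAME
// transaction (think `_open(dst, CREATE, tran)`), so the rest of the
// operation can see the newly created destination.
async function copyFile(store: ToyStore, src: string, dst: string) {
  await store.withTran(async (tran) => {
    store.set(tran, dst, store.get(tran, src) ?? '');
    if (store.get(tran, dst) === undefined) {
      throw new Error('destination missing'); // the bug's symptom, now impossible
    }
  });
}
```

With the buggy shape, the `set` would happen inside a second, independent `withTran`, and the outer transaction's `get` would return `undefined`.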
For the concurrency tests, we should only really check whether one method succeeds and the others fail. We shouldn't care what the reason is beyond that it was specifically an `ErrorEncryptedFSError` and not a transaction conflict. This way I can use the fast-check scheduler to simplify and combine tests while ending up with more robust tests. An example of this is the `concurrent inode creation` test. I can combine the 4 existing tests into a single test that checks random permutations of `open`, `mknod` and `mkdir`. This should cover permutations that we don't already test.
Is it really worth checking the result of a concurrent write? The expectation is that the writes should complete without error and should not throw a transaction conflict. What is actually written to the file is technically undefined behaviour, so checking what is written feels like unnecessary complexity added to the test. We only really need to check that the number of bytes read is within the expected range.
Rebased on staging.
> Is it really worth checking the result of a concurrent write? The expectation is that the writes should complete without error. Should not throw a transaction conflict. What is actually written to the file is technically undefined behaviour so checking what is written feels like unnecessary complexity added to the test. We only really need to check that the number of bytes read is within the expected range.
Yes, in terms of checking, you can apply a predicate check to the patterns of the resulting concurrent write. This way, on every fast-check iteration (each of which changes up the concurrent operation order), we get good concurrent coverage. If our predicates fail for a certain order, these tests will tell us.
The predicate can be a regular expression. That's how I wrote it already, but I lacked some sort of async scheduler to actually exhaustively check concurrent coverage. Fast check should be capable of exhausting the variations.
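A minimal illustration of the regex-predicate idea, with plain promises standing in for the fast-check scheduler (the writer and buffer here are hypothetical, not EFS code): the assertion is a pattern over all valid outcomes rather than one exact value.

```typescript
// Shared buffer appended to by two concurrent writers; which block lands
// first is scheduler-dependent, so the check is a predicate (regex) over
// the set of valid interleavings instead of a fixed expected string.
let content = '';

const writeBlock = async (block: string): Promise<void> => {
  await new Promise((resolve) => setImmediate(resolve));
  content += block;
};

async function run(): Promise<boolean> {
  content = '';
  await Promise.all([writeBlock('aaaa'), writeBlock('bbbb')]);
  // Valid outcomes: either writer's block may be committed first
  return /^(aaaabbbb|bbbbaaaa)$/.test(content);
}
```

Under fast-check's scheduler, the scheduler (rather than `setImmediate`) would decide the ordering per iteration, and the same regex predicate would be asserted on every run.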
Also we don't need to change over all the tests. Just fix any of the concurrent tests and add any additional ones to ensure we have consistent behaviour especially relating to transaction conflicts.
The EFS is just getting more bug-free, but it will be more important to work together on integrating the tracing and queueing to address the testnet tests starting tomorrow.
This should be good to merge now. Pending review if desired.
Can you rerun the benchmarks and show the comparison here? Since the db has changed, I'm expecting it should be a lot faster.
Oh yeah, you need to provide the type of the lock. So don't worry.
Ok, so for all the concurrent tests where you have replaced the test, change to using a regular expression to match the original predicates on the output. I've given some examples above. For the new tests, this can also be done if you see an easy way to write the regular expression. Prefer this over simple checks like `3`. Then proceed to merge to `staging`.
On the staging branch, regenerate docs and redo the benchmarks.
There's a documented way of delaying the actual async functions for the fast-check scheduler: https://github.com/dubzzz/fast-check/issues/564. See the relevant commits. I also asked a question upstream to see how it should be done.
The docs also got moved around recently, and the relevant commits point to: https://github.com/dubzzz/fast-check/blob/main/packages/fast-check/documentation/RaceConditions.md
There's no inbuilt helper for this.
The docs suggest:

```ts
const scheduleCall = <T>(s: Scheduler, f: () => Promise<T>) => {
  s.schedule(Promise.resolve('USE THIS AS A LABEL...')).then(() => f());
};

// Calling doStuff will be part of the task scheduled in s
scheduleCall(s, () => doStuff());
```
It's very similar to what you already did with the `gateFactory`. Using the above method, you also don't need a separate factory.
The last interesting thing is `fc.scheduledModelRun`, but there is no documentation on how to use it. It seems to provide a form of model-based testing, which requires a state-machine model first; then you're essentially doing transitions in lock step between the abstract and the real machine. This could be used for EFS, but only at a very high level, treating the entire FS state as the model. That would be a completely different set of tests. It's best used for more "consistent" systems (like distributed databases), not really the EFS in this case. We could apply this to certain stateful domains inside PK though, for example the OCL concepts and the ACL.
Right, just need to squash and run the checks. Then this will be ready to merge.
Did you check the rest of the tests you replaced to ensure that no predicates are missing? I didn't go through all of them.
Yes, I went through all of the ones I updated.
Cleaned up the commit history.
Just waiting on CI now.
I've asked about integrating timeouts between fast-check and jest: https://github.com/dubzzz/fast-check/issues/3084
Description

Updating CI/CD, including:

Related:

Tasks

- Update js-logger
- Update @matrixai/db to 5.0.1

Final checklist