MatrixAI / Polykey

Polykey Core Library
https://polykey.com
GNU General Public License v3.0
31 stars 4 forks source link

Investigate and fix random `NodeGraph` test failures #274

Closed joshuakarp closed 2 years ago

joshuakarp commented 3 years ago

Describe the bug

The tests in NodeGraph can intermittently fail across runs. We'd experienced some similar failures in the past, and I thought I'd resolved them. The past failures were to do with some test utility functions that generate node IDs for a specific bucket index, such that they can be used for dynamic testing. They were previously causing failures by generating node IDs that overflowed into a different bucket index.

If this is still the cause of failure, now that we can use the 32 byte array as a node ID, I can potentially make these test utilities a bit more robust. Previously, they were just incrementing a character in the node ID to force an expected XOR value.

To Reproduce

The test failures only occur randomly. It took me 5 runs of NodeGraph.test.ts to experience a failure in the tests. Some instances of these:

``` [nix-shell:~/Documents/MatrixAI/js-polykey]$ npm test -- tests/nodes/NodeGraph.test.ts > @matrixai/polykey@1.0.0 test /home/josh/Documents/MatrixAI/js-polykey > jest "tests/nodes/NodeGraph.test.ts" Determining test suites to run... GLOBAL SETUP FAIL tests/nodes/NodeGraph.test.ts (5.69 s) NodeGraph ✓ finds correct node address (11 ms) ✓ unable to find node address (6 ms) ✓ adds a single node into a bucket (5 ms) ✕ adds multiple nodes into the same bucket (16 ms) ✓ adds a single node into different buckets (6 ms) ✓ deletes a single node (and removes bucket) (6 ms) ✕ deletes a single node (and retains remainder of bucket) (14 ms) ✓ enforces k-bucket size, removing least active node when a new node is discovered (83 ms) ✓ retrieves all buckets (in expected lexicographic order) (13 ms) ✓ refreshes buckets (2050 ms) ✓ finds a single closest node (3 ms) ✓ finds 3 closest nodes (8 ms) ✓ finds the 20 closest nodes (92 ms) ✓ updates node (6 ms) ● NodeGraph › adds multiple nodes into the same bucket expect(received).toEqual(expected) // deep equality Expected: {"address": {"ip": "6.6.6.6", "port": 6666}, "lastUpdated": Any} Received: undefined 222 | lastUpdated: expect.any(Date), 223 | }); > 224 | expect(bucket[newNode3Id]).toEqual({ | ^ 225 | address: { ip: '6.6.6.6', port: 6666 }, 226 | lastUpdated: expect.any(Date), 227 | }); at Object. (tests/nodes/NodeGraph.test.ts:224:34) ● NodeGraph › deletes a single node (and retains remainder of bucket) expect(received).toEqual(expected) // deep equality Expected: {"address": {"ip": "6.6.6.6", "port": 6666}, "lastUpdated": Any} Received: undefined 305 | lastUpdated: expect.any(Date), 306 | }); > 307 | expect(bucket[newNode3Id]).toEqual({ | ^ 308 | address: { ip: '6.6.6.6', port: 6666 }, 309 | lastUpdated: expect.any(Date), 310 | }); at Object. (tests/nodes/NodeGraph.test.ts:307:34) Test Suites: 1 failed, 1 total Tests: 2 failed, 12 passed, 14 total Snapshots: 0 total Time: 5.726 s, estimated 6 s Ran all test suites matching /tests\/nodes\/NodeGraph.test.ts/i. GLOBAL TEARDOWN npm ERR! Test failed. See above for more details. Determining test suites to run... GLOBAL SETUP FAIL tests/nodes/NodeGraph.test.ts (6.165 s) NodeGraph ✓ finds correct node address (12 ms) ✓ unable to find node address (6 ms) ✓ adds a single node into a bucket (5 ms) ✕ adds multiple nodes into the same bucket (16 ms) ✓ adds a single node into different buckets (8 ms) ✓ deletes a single node (and removes bucket) (6 ms) ✕ deletes a single node (and retains remainder of bucket) (13 ms) ✓ enforces k-bucket size, removing least active node when a new node is discovered (85 ms) ✕ retrieves all buckets (in expected lexicographic order) (14 ms) ✓ refreshes buckets (1996 ms) ✓ finds a single closest node (3 ms) ✓ finds 3 closest nodes (8 ms) ✓ finds the 20 closest nodes (93 ms) ✓ updates node (6 ms) ● NodeGraph › adds multiple nodes into the same bucket expect(received).toEqual(expected) // deep equality Expected: {"address": {"ip": "5.5.5.5", "port": 5555}, "lastUpdated": Any} Received: undefined 218 | lastUpdated: expect.any(Date), 219 | }); > 220 | expect(bucket[newNode2Id]).toEqual({ | ^ 221 | address: { ip: '5.5.5.5', port: 5555 }, 222 | lastUpdated: expect.any(Date), 223 | }); at Object. (tests/nodes/NodeGraph.test.ts:220:34) ● NodeGraph › deletes a single node (and retains remainder of bucket) expect(received).toEqual(expected) // deep equality Expected: {"address": {"ip": "5.5.5.5", "port": 5555}, "lastUpdated": Any} Received: undefined 301 | lastUpdated: expect.any(Date), 302 | }); > 303 | expect(bucket[newNode2Id]).toEqual({ | ^ 304 | address: { ip: '5.5.5.5', port: 5555 }, 305 | lastUpdated: expect.any(Date), 306 | }); at Object. (tests/nodes/NodeGraph.test.ts:303:34) ● NodeGraph › retrieves all buckets (in expected lexicographic order) expect(received).toBe(expected) // Object.is equality Expected: 4 Received: 5 412 | 413 | const buckets = await nodeGraph.getAllBuckets(); > 414 | expect(buckets.length).toBe(4); | ^ 415 | // Buckets should be returned in lexicographic ordering (using hex keys to 416 | // ensure the bucket indexes are in numberical order) 417 | expect(buckets).toEqual([ at Object. (tests/nodes/NodeGraph.test.ts:414:28) Test Suites: 1 failed, 1 total Tests: 3 failed, 11 passed, 14 total Snapshots: 0 total Time: 6.199 s Ran all test suites matching /tests\/nodes\/NodeGraph.test.ts/i. GLOBAL TEARDOWN npm ERR! Test failed. See above for more details. ```

I've found it useful to use the !# in the terminal to repeat the command (repeats everything prior). e.g. npm test -- tests/nodes/NodeGraph.test.ts; !# !# to run the tests 4 times.

Additional context

CMCDragonkai commented 3 years ago

Multiple suggestions to fix this:

Mock documentation is on jest, but because we are using TS, we need to ensure that TS doesn't complain about types. So you would want to mock the nodeid/rootkey that is being provided by the KeyManager.

But you should probably need to fix the node id operations, since char code increment is not correct anyway.

CMCDragonkai commented 3 years ago

Consider changing the timeouts in the code when doing these tests too. Jest also supports timeout mocking that can help too: https://jestjs.io/blog/2020/05/05/jest-26#new-fake-timers

joshuakarp commented 2 years ago

I've added a note that this would be blocked by #261. This is because the current NodeGraph tests are failing on account of the node ID being used in the tests. Depending on the representation to be used in #261, this may have consequences for how this issue should be resolved (however, this is probably unlikely - the tests themselves are already using the 32-byte representation of the node ID). Basically, if there's a choice in which issue should be completed first, #261 should be completed first.

joshuakarp commented 2 years ago

Regarding mocking from this earlier comment https://github.com/MatrixAI/js-polykey/issues/274#issuecomment-952583901, can look into the following for mocking only a single function: https://stackoverflow.com/questions/59312671/mock-only-one-function-from-module-but-leave-rest-with-original-functionality (originally brought up in https://github.com/MatrixAI/js-polykey/pull/283#issuecomment-981569311)

joshuakarp commented 2 years ago

There's some discussion of function mocking in #283:

joshuakarp commented 2 years ago

Seems that mocking with a static keypair has also created problems for the nodes tests, as per https://github.com/MatrixAI/js-polykey/pull/283#issuecomment-982374649. Off the top of my head, not too sure what could be causing this - I thought I was mutating node IDs directly, as opposed to creating new keypairs. But will investigate this when looking into this issue too.

joshuakarp commented 2 years ago

I should also add an exception thrown for when we attempt to calculate the bucket index with a target node ID that matches the current node ID. This currently (albeit, incorrectly) returns -1 for the bucket index, which causes some hard-to-trace problems when debugging (most recently here https://github.com/MatrixAI/js-polykey/pull/278#issuecomment-985212757).

Alternatively, find the calls to this function, and prevent this calculation from occurring at all.

CMCDragonkai commented 2 years ago

What was the resolution to this issue? @joshuakarp @tegefaulkes can you describe how #310 solved this?

tegefaulkes commented 2 years ago

The problems was caused by incrementNodeId creating a nodeId that fell outside of the expected bucket randomly. I removed it and updated generateNodeIdForBucketto take an alt number to make an alternative NodeId that will always fit within the expected bucket.