build-trust / ockam

Orchestrate end-to-end encryption, cryptographic identities, mutual authentication, and authorization policies between distributed applications – at massive scale.
https://ockam.io
Apache License 2.0

ockam_node testing methodology #2170

Open spacekookie opened 3 years ago

spacekookie commented 3 years ago

The issue of testing came up in #2007 because it touches on a lot of core components. In general ockam_node is mostly tested through example code, not via a robust framework of tests that stress the system to a point where concurrency bugs occur.

We need to change that for the future to ensure that we don't break user code with changes to the core of the runtime.

For now this issue is just a scratchpad for ideas. In the future we'll expand it with more concrete plans.

thomcc commented 3 years ago

Yeah, so 1000% this. I think this will become an issue for external use too, since if ockam is hard to test, code that uses it will likely become hard to test. It also could lead to a non-robust product, which would be... bad for obvious reasons.

I have a few ideas on how to make tests easier to write from the user side, which hopefully will make this a bit easier for us to do internally. That said, most of those would not be running in a fully realistic environment — they'd be within the same process probably. For now that might be fine for internal testing too, but long term we need some tooling that runs our code in multiple processes, using the real transports (even if just over loopback).

SanjoDeundiak commented 3 years ago

I think before stress-testing the node, we could start with very basic tests that don't involve random inputs, high loads, etc. Example: #2007 adds clusters, which introduce rules for the order in which workers are shut down. Let's add a test that spawns a regular worker and creates 2 clusters with a few workers each, then shuts down the node and checks that the workers were stopped in the expected order. That would:

  1. Verify that the new feature works
  2. Show the intended behaviour to those who are not familiar with the new feature (sometimes looking at a test is much simpler than reading the documentation)
  3. Simplify PR review, since it's easier to verify that something works by seeing a test pass in CI than by reading the code
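
To make the shape of such a test concrete, here is a rough sketch against a toy model rather than the real runtime. Everything in it is hypothetical: `ToyNode`, `spawn`, `spawn_in_cluster` and `shutdown` are not ockam_node APIs. The ordering rule it encodes (workers outside any cluster stop first, then clusters in reverse order of creation) is an assumption for illustration only and should be replaced with whatever #2007 actually specifies.

```rust
/// Toy stand-in for a node. NOT the ockam_node API; it only records
/// which workers exist so that the shutdown order can be asserted on.
#[derive(Default)]
struct ToyNode {
    /// Workers that belong to no cluster, in spawn order.
    free_workers: Vec<String>,
    /// Clusters in creation order, each with its workers in spawn order.
    clusters: Vec<(String, Vec<String>)>,
}

impl ToyNode {
    /// Register a worker that is not part of any cluster.
    fn spawn(&mut self, addr: &str) {
        self.free_workers.push(addr.to_string());
    }

    /// Register a worker in `cluster`, creating the cluster on first use.
    fn spawn_in_cluster(&mut self, cluster: &str, addr: &str) {
        if let Some(pos) = self.clusters.iter().position(|(name, _)| name == cluster) {
            self.clusters[pos].1.push(addr.to_string());
        } else {
            self.clusters.push((cluster.to_string(), vec![addr.to_string()]));
        }
    }

    /// Stop everything and return the addresses in the order they were stopped.
    /// ASSUMED rule: free workers first, then clusters in reverse creation order.
    fn shutdown(self) -> Vec<String> {
        let mut order = self.free_workers;
        for (_, workers) in self.clusters.into_iter().rev() {
            order.extend(workers);
        }
        order
    }
}

/// Builds the scenario from the comment above: one regular worker
/// plus two clusters with a couple of workers each.
fn shutdown_order_example() -> Vec<String> {
    let mut node = ToyNode::default();
    node.spawn("regular");
    node.spawn_in_cluster("cluster_a", "a1");
    node.spawn_in_cluster("cluster_a", "a2");
    node.spawn_in_cluster("cluster_b", "b1");
    node.spawn_in_cluster("cluster_b", "b2");
    node.shutdown()
}

#[test]
fn workers_stop_in_expected_order() {
    // The expected order follows from the assumed rule documented on `shutdown`.
    assert_eq!(shutdown_order_example(), ["regular", "b1", "b2", "a1", "a2"]);
}
```

In a real test the recorded order would presumably come from the workers themselves (for example, each worker pushing its address into a shared log when it is stopped), so the assertion exercises the runtime and not a model of it.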