MatrixAI / Polykey-CLI

Polykey CLI - Open Source Decentralized Secret Sharing System for Zero Trust Workflows
https://polykey.com
GNU General Public License v3.0

Integration Tests for `testnet.polykey.com` #71

Closed emmacasolin closed 8 months ago

emmacasolin commented 2 years ago

Specification

We need a suite of tests that covers interacting with a deployed agent, which we can do using testnet.polykey.io. These tests need to cover a variety of connection scenarios:

These tests should go into their own subdirectory tests/testnet and should not be run with the other tests. They should be disabled in our jest config and should only run when explicitly called (which will happen during the integration stage of our pipelines).
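
A minimal sketch of how the testnet suite could be kept out of the default run. The `PK_TEST_TESTNET` environment variable and the `buildJestConfig` helper are hypothetical names used only for illustration; `testMatch` and `testPathIgnorePatterns` are standard Jest options, and the `tests/testnet` path comes from the paragraph above.

```typescript
// Hypothetical sketch: exclude tests/testnet unless explicitly requested.
// `PK_TEST_TESTNET` is an assumed variable name, not an existing project setting.
type JestConfigFragment = {
  testMatch: string[];
  testPathIgnorePatterns: string[];
};

function buildJestConfig(
  env: Record<string, string | undefined>,
): JestConfigFragment {
  const runTestnet = env['PK_TEST_TESTNET'] === '1';
  return {
    // When enabled, only the testnet suite is matched.
    testMatch: runTestnet
      ? ['<rootDir>/tests/testnet/**/*.test.ts']
      : ['<rootDir>/tests/**/*.test.ts'],
    // By default, the testnet directory is ignored entirely.
    testPathIgnorePatterns: runTestnet
      ? ['/node_modules/']
      : ['/node_modules/', '<rootDir>/tests/testnet/'],
  };
}

// In a Jest config file this could be wired up as:
//   module.exports = buildJestConfig(process.env);
```

With this arrangement the integration stage of the pipeline would set the variable before invoking Jest, while the ordinary test script never picks up the testnet tests.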

Required tests:

Additional context

Tasks

Emergent bugs

CMCDragonkai commented 11 months ago

As per MatrixAI/Polykey#551, we now have a successful connection to a stable running testnet in testnet.polykey.com.

It also makes sense to have a test suite for integrating with testnet.polykey.com here in the Polykey repo, but one that runs separately from the CI's main pipeline, so that it doesn't block anything, because the testnet can be a bit flaky. However, it should become more stable as we go ahead.

We should start writing some simple tests that can be run separately, apart from the npm run test script. One way is grouping; another is by directory. If we go by directory, it would be important to move all of the unit tests into their own subdirectory.

amydevs commented 9 months ago

I've temporarily changed the docker integration tests to simply run the image with docker run. This is so that we can get a GitHub release with working binary executables; otherwise integration:docker will fail on some tests.

What needs to be done:

The problem at hand is that the tests bind the agent socket to the localhost interface. This needs to be changed to a wildcard address that supports IPv4 (::, 0.0.0.0, etc.).

The tests time out because the node ChildProcess is unable to kill the agent once the program that docker is running has already crashed. The process crashes in these tests because they attempt to connect to the testnet while bound to a localhost interface, so the agent cannot send packets to any globally routable IP address. Hence, when the globally routable IP address of a testnet node is specified, the send throws an EINVAL, indicating that a globally routable address is an invalid argument to the send command on a socket that is bound to a localhost interface.
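
The reasoning above can be sketched as a small pre-flight check. This is only an illustration of the rule described (a loopback-bound socket cannot send to a non-loopback address, while a wildcard-bound socket can); the function names are hypothetical, not part of Polykey, and only IPv4 dotted-quad targets are handled.

```typescript
// Sketch of the reasoning only. All names here are hypothetical.
function isLoopback(host: string): boolean {
  // 127.0.0.0/8 is the IPv4 loopback range; `localhost` normally resolves into it.
  return host === 'localhost' || /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(host);
}

function isWildcard(host: string): boolean {
  // `0.0.0.0` is the IPv4 wildcard; `::` is the IPv6 wildcard,
  // which also accepts IPv4 traffic on dual-stack sockets.
  return host === '0.0.0.0' || host === '::';
}

function canSendTo(bindHost: string, targetHost: string): boolean {
  if (isWildcard(bindHost)) {
    // The OS chooses a suitable source address for each destination.
    return true;
  }
  if (isLoopback(bindHost)) {
    // A loopback-bound socket can only reach loopback destinations;
    // anything else is the EINVAL case described above.
    return isLoopback(targetHost);
  }
  // Other specific interfaces are outside the scope of this sketch.
  return true;
}
```

A test harness could call `canSendTo(bindHost, seedNodeHost)` before starting the agent and fail fast with a clear message, instead of waiting for the agent to crash and the test to time out.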

Notes about container network behaviour:

[Diagram: Untitled-2023-10-23-0424 excalidraw(6)]

CMCDragonkai commented 9 months ago

Moving this to Polykey-CLI since integration tests of this sort can only be done as a "process".

Although lighter integration tests should still be in PK the library.

CMCDragonkai commented 9 months ago

@tegefaulkes this issue can be closed once:

  1. In the CI job for PK-CLI, we introduce polling calls to Polykey-Network-Status to ask whether the currently distributed/released image version has been deployed to testnet.polykey.com, and only then run the integration tests.
  2. This means, @amydevs, that you need to expose the currently deployed version on the API of PK-N-S.
  3. If the version isn't deployed in sufficient time, that would block the rest of the CI, and that's OK. The pipeline will time out and we would restart it afterwards.

This can then re-enable integration tests in our integration jobs after deployment. And those nodes should be connecting to the testnet and doing all the tests.

To do this, the docker integration tests need to bind to a wildcard address to avoid problems with connecting to the internet, and therefore to the testnet.

That means for now, this issue is blocked on @amydevs completing #599.

Also I've removed the 2 subissues relating to NAT testing, because those would need to be done in the PKI - not here.

CMCDragonkai commented 9 months ago

This epic is almost ready to be closed. @tegefaulkes focus on getting the integration tests cleaned up and working. And work with @amydevs to get the API call to testnet.polykey.com/api working, so that we can know what the current version is. Work out a spec for what the API should return, and how you would know. Also work out the timeout: sufficient time for deployment.

CMCDragonkai commented 9 months ago

To clarify, all tests prior to the integration tests should be configured to not connect to any network at all, unless they are simulating a local network within the tests.

tegefaulkes commented 9 months ago

We're going to streamline how the integration tests work. This is going to be done with the following changes.

  1. All the standard tests will not attempt connections to any network. They should explicitly be started with no seed nodes.
  2. The integration tests will be separated from the standard tests.
     a. Remove usage of the testif utility from the standard tests.
     b. Integration tests will be in a separate folder structure from the standard tests. Integration tests will focus on connecting to the testnet.
  3. Integration tests need to wait for the testnet to be updated. To this end, the following changes are to be made.
     a. The CLI CI job for triggering the testnet seed nodes using the new image will be removed. This will be handled by the testnet infrastructure.
     b. A job will be created to wait for the seed nodes to switch to the new version. This will be done by polling a Polykey dashboard endpoint that will either report when all seed nodes have been updated, or list the versions of all the seed nodes. This will need to be specced out.
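
The waiting job in step 3 could look roughly like the sketch below. The endpoint has not been specified yet, so the `SeedNodeStatus` shape, the function names, and the injected `fetchStatus` callback are all assumptions made for illustration; the sketch takes the "list the versions of all the seed nodes" variant and polls until every node reports the expected version or a deadline passes.

```typescript
// Hypothetical response shape: the real endpoint and its fields are not yet specced.
type SeedNodeStatus = { nodeId: string; version: string };

// True only when at least one seed node is listed and all report the expected version.
function allSeedNodesUpdated(
  nodes: SeedNodeStatus[],
  expected: string,
): boolean {
  return nodes.length > 0 && nodes.every((n) => n.version === expected);
}

// Polls `fetchStatus` until all seed nodes are updated or `timeoutMs` elapses.
// `fetchStatus` is injected so the caller decides how the endpoint is queried.
async function waitForTestnet(
  fetchStatus: () => Promise<SeedNodeStatus[]>,
  expected: string,
  timeoutMs: number,
  intervalMs: number,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    try {
      if (allSeedNodesUpdated(await fetchStatus(), expected)) return true;
    } catch {
      // A failed request is treated as "not updated yet" and retried.
    }
    // Give up if the next attempt would start after the deadline.
    if (Date.now() + intervalMs > deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A CI job would exit non-zero when `waitForTestnet` resolves to `false`, so the integration tests are skipped rather than run against an outdated testnet.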

I'm going to make a new issue to track this work and add it to this epic.

CMCDragonkai commented 9 months ago

There are some ideas for tests coming from the OP spec:

  1. Functionally disconnecting from the testnet seed nodes and reconnecting to it.
  2. Using tc or firewall rules to break the connection to a particular node, then seeing how PK reacts to that, and re-enabling the connection a few seconds later.

CMCDragonkai commented 9 months ago

Also our current simulated NAT tests have been disabled for some time:

»» ~/Projects/Polykey-CLI/tests/nat
 ♖ tree .                                                                                                 (staging) pts/7 9:53:42
.
├── DMZ.test.ts
├── endpointDependentNAT.test.ts
├── endpointIndependentNAT.test.ts
└── utils.ts

1 directory, 4 files

These tests can be adapted to run on Polykey infrastructure to test it at scale. It might be more "maintainable" if we do it via AWS rather than simulating it locally, which has a lot of platform constraints.

tegefaulkes commented 8 months ago

This is done now except for 1 minor change that still needs to be done. I'll be creating an issue for that as I can't deal with it now.