Closed emmacasolin closed 8 months ago
As per MatrixAI/Polykey#551, we now have a successful connection to a stable running testnet at testnet.polykey.com.
It also makes sense to have a test suite for integrating with testnet.polykey.com here in the Polykey repo, but one that runs separately from the CI's main pipeline so that it doesn't block anything, because the testnet can be a bit flaky. However, as we go ahead it should become more stable.
We should start writing some simple tests that can be run separately, and that are separate from the npm run test script. One way is grouping; another is just by directory. If by directory, it would be important to move all of the unit tests into a subdirectory.
I've changed the docker integration tests temporarily to simply run the image with docker run. This is so that we can get a GitHub release with the working binary executables, otherwise integration:docker will fail on some tests.
What needs to be done:
The problem at hand is that the tests are binding the agent socket to the localhost interface. This will need to be changed to a wildcard interface that supports IPv4 (::, 0.0.0.0, etc.). The tests are timing out because the Node ChildProcess is unable to kill the agent once the program that docker is running has already crashed. The process is crashing in these tests because they attempt to connect to the testnet while bound to a localhost interface, so the agent cannot send packets to any globally routable IP address. Hence, when given the globally routable IP address of a testnet node, the agent throws an EINVAL, indicating that a globally routable address is an invalid argument to the send command on a socket that is bound to a localhost interface.
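The bind-address problem described above could be caught before an agent is even started. The following is a minimal sketch, not Polykey code: `classifyBindHost` and `canReachGlobalAddress` are hypothetical helper names, and the check only covers the literal hosts mentioned in this thread (localhost, 127.x.x.x, ::1, 0.0.0.0, ::).

```typescript
// Hypothetical helpers (not part of Polykey): classify the host an agent
// socket would bind to, so a test can fail fast instead of crashing later
// with EINVAL when it tries to reach a testnet node.
type BindScope = 'loopback' | 'wildcard' | 'specific';

function classifyBindHost(host: string): BindScope {
  const h = host.trim().toLowerCase();
  // Wildcard addresses: all IPv4 interfaces, or all interfaces via IPv6.
  if (h === '0.0.0.0' || h === '::') return 'wildcard';
  // Loopback: the name, the IPv6 loopback, or anything in 127.0.0.0/8.
  if (h === 'localhost' || h === '::1' || h.indexOf('127.') === 0) {
    return 'loopback';
  }
  // Anything else is treated as a specific, non-loopback interface.
  return 'specific';
}

// A socket bound to loopback cannot send to a globally routable address.
function canReachGlobalAddress(bindHost: string): boolean {
  return classifyBindHost(bindHost) !== 'loopback';
}
```

A testnet test could call `canReachGlobalAddress` on its configured bind host during setup and skip or fail with a clear message when it returns false.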
Notes about container network behaviour:
Moving this to Polykey-CLI since integration tests of this sort can only be done as a "process".
Although lighter integration tests should still be in PK the library.
@tegefaulkes this issue can be closed once:
testnet.polykey.com to then run the integration tests.
This can then re-enable integration tests in our integration jobs after deployment. And those nodes should be connecting to the testnet and doing all the tests.
To do this, the docker integration tests need to bind to a wildcard address to avoid problems with connecting to the internet (that is, the testnet).
That means for now, this issue is blocked on @amydevs completing #599.
Also I've removed the 2 subissues relating to NAT testing, because those would need to be done in the PKI - not here.
This epic is almost ready to be closed. @tegefaulkes focus on getting the integration tests cleaned up and working. And work with @amydevs to get the API call to testnet.polykey.com/api working so that we can tell what the current version is. Work out a spec for what the API should return, and how you would know. Also work out a timeout that allows sufficient time for deployment.
To clarify, all tests prior to integration tests should be configured to not connect to any network at all, unless they are simulating a local network within the tests.
We're going to streamline how the integration tests work. This is going to be done with the following changes.
testif utility from the standard tests.
b. Integration tests will be in a separate folder structure from the standard tests.
Integration tests will focus on connecting to the testnet.
testnet to be updated. To this end the following changes are to be made.
a. The CLI CI job for triggering the testnet seed nodes using the new image will be removed. This will be handled by the testnet infrastructure.
b. A job will be created to wait for the seed nodes to switch to the new version. This will be done by polling a polykey dashboard endpoint that will either return when all seed nodes have been updated, or list the versions of all the seed nodes. This will need to be specced out.
I'm going to make a new issue to track this work and add it to this epic.
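Since the endpoint is not specced yet, the wait job can only be sketched under assumptions. The example below assumes the "list the versions of all the seed nodes" variant and invents the response shape (`SeedNodeStatus` with `nodeId` and `version`) and the function name `decidePoll`; none of these come from the testnet API. It shows only the decision made after each poll, including the deployment timeout.

```typescript
// Assumed response shape: one entry per seed node. The field names are
// hypothetical, since the real endpoint has not been specced yet.
interface SeedNodeStatus {
  nodeId: string;
  version: string;
}

type PollDecision = 'updated' | 'retry' | 'timeout';

// Decide what the wait job should do after one poll of the endpoint.
function decidePoll(
  nodes: SeedNodeStatus[],
  expectedVersion: string,
  elapsedMs: number,
  timeoutMs: number,
): PollDecision {
  // An empty list is treated as "not updated yet" rather than success.
  const allUpdated =
    nodes.length > 0 && nodes.every((n) => n.version === expectedVersion);
  if (allUpdated) return 'updated';
  // Give up once the time allowed for deployment has been used up.
  return elapsedMs >= timeoutMs ? 'timeout' : 'retry';
}
```

A CI job would fetch the endpoint, pass the parsed list to `decidePoll`, sleep on `'retry'`, start the integration tests on `'updated'`, and fail the job on `'timeout'`.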
There are some ideas for tests coming from the OP spec:
tc or firewall rules to break the connection to a particular node, and then seeing how PK reacts to that, and also re-enabling a few seconds later.
Also our current simulated NAT tests have been disabled for some time:
»» ~/Projects/Polykey-CLI/tests/nat
♖ tree . (staging) pts/7 9:53:42
.
├── DMZ.test.ts
├── endpointDependentNAT.test.ts
├── endpointIndependentNAT.test.ts
└── utils.ts
1 directory, 4 files
These tests can be adapted to a Polykey Infrastructure to test it at scale. It might be more "maintainable" if we do it via AWS rather than simulating it locally, which places a lot of constraints on the platform.
This is done now except for 1 minor change that still needs to be done. I'll be creating an issue for that as I can't deal with it now.
Specification
We need a suite of tests to cover interacting with a deployed agent, which we can do using testnet.polykey.io. These tests need to cover various different connection scenarios:
These tests should go into their own subdirectory tests/testnet and should not be run with the other tests. They should be disabled in our jest config and should only run when explicitly called (which will happen during the integration stage of our pipelines).
Required tests:
tests/testnet/testnetConnection.test.ts
tests/testnet/testnetPing.test.ts
tests/testnet/testnetNAT.test.ts
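One way to keep tests/testnet out of the default run is to switch Jest's path options on an explicit flag. This is a sketch rather than the repository's actual config: `buildJestPaths` and its `runTestnet` flag are hypothetical, while `testMatch` and `testPathIgnorePatterns` are standard Jest options.

```typescript
// Minimal structural type covering only the two Jest options used here.
interface JestPathConfig {
  testMatch: string[];
  testPathIgnorePatterns: string[];
}

// Hypothetical helper: `runTestnet` would be set only by the integration
// stage of the pipeline (how it is set is left open here).
function buildJestPaths(runTestnet: boolean): JestPathConfig {
  if (runTestnet) {
    // Explicit run: only the testnet suite is matched.
    return {
      testMatch: ['<rootDir>/tests/testnet/**/*.test.ts'],
      testPathIgnorePatterns: ['/node_modules/'],
    };
  }
  // Default run: everything under tests/ except the testnet suite.
  return {
    testMatch: ['<rootDir>/tests/**/*.test.ts'],
    testPathIgnorePatterns: ['/node_modules/', '<rootDir>/tests/testnet/'],
  };
}
```

The result can be spread into the exported Jest config object, so a plain npm run test never touches the testnet while the integration job opts in explicitly.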
Additional context
testnet.polykey.io
Tasks
testnet.polykey.io
--log='/regex/'.~ - not relevant to integration testing, the js-logger does support REGEX filtering, but the PK CLI currently doesn't have this option.
agent status command needs to display useful information like the polykey version and other useful statistics like active connections, number of node graph entries etc etc.~
--client-host needs to support host names.~ - this is pending a change to being able to use PolykeyClient to connect to a host name - which would require using the DNS-SD SRV records. This still needs to be specced out how this would work because in some cases you want to connect to a SINGLE Node, in other cases you are "discovering" a node to connect to, but it's not relevant to this epic.
src/config.ts - from MatrixAI/Polykey#488
Emergent bugs
ConnectionReverse.start() between Starting Connection Reverse and Started Connection Reverse. Start by testing locally.