MatrixAI / Polykey

Polykey Core Library
https://polykey.com
GNU General Public License v3.0

HTTP status page for Polykey Agent #412

Open CMCDragonkai opened 2 years ago

CMCDragonkai commented 2 years ago

Specification

It would be nice to have an HTTP(S) status page for the Polykey Agent. The basic idea is that it would be possible to use curl, http or a browser to hit the PK agent and acquire basic status information.

For this status page, it is a public page, so no authentication is necessary to access this status page. It would be similar to calling polykey agent status. But it will only show "public" information, and nothing private.
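One way to keep private data off the public page is an explicit allow-list. The following is a minimal sketch only: the `AgentStatus` shape and its field names are hypothetical and are not taken from the Polykey codebase.

```typescript
// Hypothetical shape of the full agent status; field names are illustrative only.
interface AgentStatus {
  nodeId: string;
  version: string;
  pid: number; // assumed private for this sketch
  nodePath: string; // assumed private for this sketch
}

// The subset assumed safe to show on an unauthenticated page.
type PublicAgentStatus = Pick<AgentStatus, 'nodeId' | 'version'>;

// Build the public view by copying an explicit allow-list of fields,
// so any field added to AgentStatus later is excluded by default.
function toPublicStatus(status: AgentStatus): PublicAgentStatus {
  return { nodeId: status.nodeId, version: status.version };
}
```

Copying named fields, rather than deleting private ones, means a forgotten field stays hidden instead of leaking.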

This status page can be designed similar to other public status pages like for example https://www.githubstatus.com/ or https://status.gitlab.com/

As for the port it uses, we may want to reuse the PK_CLIENT_PORT. But by default HTTP uses 80 and HTTPS uses 443. Therefore PK_CLIENT_PORT and --client-port must be capable of taking multiple ports. Like PK_CLIENT_PORT=80,443,1315, and then PK will bind to all of those ports.
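As a rough sketch of what accepting multiple ports could look like, the helper below parses a comma-separated value such as `80,443,1315`. The function name `parsePorts` is hypothetical; the actual option parsing in Polykey may differ.

```typescript
// Hypothetical helper: parse a comma-separated port list such as "80,443,1315".
function parsePorts(value: string): number[] {
  const ports: number[] = [];
  for (const part of value.split(',')) {
    const trimmed = part.trim();
    // Accept plain decimal digits only (no signs, decimals or empty entries).
    if (!/^\d+$/.test(trimmed)) {
      throw new RangeError(`Invalid port: "${part}"`);
    }
    const port = Number(trimmed);
    // Valid TCP ports are 0-65535; 0 asks the OS to pick a free port.
    if (port > 65535) {
      throw new RangeError(`Port out of range: ${port}`);
    }
    // Skip duplicates while preserving the order given by the user.
    if (ports.indexOf(port) === -1) {
      ports.push(port);
    }
  }
  return ports;
}
```

The agent would then attempt to bind to each returned port in turn.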

It will be important that the system can differentiate HTTP from HTTPS traffic, however, and serve accordingly. If the client initiates with HTTP, we respond with HTTP; if the client initiates with HTTPS, we respond with HTTPS.

If we move to GraphQL or JSON-RPC, we also need to differentiate status-page HTTP(S) requests from regular API requests.

So a protocol demuxer will need to be put in place.
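A demuxer of this kind typically peeks at the first bytes of a new connection. The sketch below is an illustration, not a proposed implementation: it relies on the fact that a TLS connection opens with a handshake record (content type `0x16`, major version `0x03`), while a plaintext HTTP/1.x request opens with an ASCII method token. The names `sniffProtocol` and `Protocol` are made up for this example, and it does not address separating status-page requests from API requests that also arrive over HTTP.

```typescript
type Protocol = 'tls' | 'http' | 'unknown';

// Method tokens that can open a plaintext HTTP/1.x request line.
const HTTP_METHODS = [
  'GET ', 'HEAD ', 'POST ', 'PUT ', 'DELETE ',
  'OPTIONS ', 'PATCH ', 'CONNECT ', 'TRACE ',
];

// Classify a connection from the first bytes the client sends.
function sniffProtocol(firstBytes: Uint8Array): Protocol {
  // TLS handshake record: content type 0x16 followed by major version 0x03.
  if (firstBytes.length >= 2 && firstBytes[0] === 0x16 && firstBytes[1] === 0x03) {
    return 'tls';
  }
  // Decode up to 8 leading bytes as ASCII (the longest method token is 8 chars).
  let prefix = '';
  for (let i = 0; i < Math.min(firstBytes.length, 8); i++) {
    prefix += String.fromCharCode(firstBytes[i]);
  }
  for (const method of HTTP_METHODS) {
    if (prefix.indexOf(method) === 0) {
      return 'http';
    }
  }
  return 'unknown';
}
```

A real demuxer would also need to buffer until enough bytes have arrived, then replay the peeked bytes into whichever handler is chosen.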

Alternatively the status page is just on an entirely different port which makes it simpler, but increases configuration overhead. That means an additional configuration for status page for both HTTP and HTTPs.

With HTTPs, it will offer its own root cert chain as the certificate data. The cert doesn't need to be signed by a third party certificate authority #154.

As for https://testnet.polykey.io, we would want to show the status of all the nodes in the testnet. One way to do this is to add a server somewhere that answers the request but gathers information about the entire cluster #403. Alternatively, each node in the testnet cluster can also show status information about the other nodes in the cluster, DHT style. That would mean we could route this to the NLBs and avoid having to run another piece of infrastructure.

Additional context

Tasks

  1. ...
  2. ...
  3. ...
CMCDragonkai commented 1 year ago

With the change to JSON RPC, this should break our dependency on the underlying transport protocol. So it would be possible to support an HTTP router for specific RPC handlers. However, because the response would be expected to be HTML, we would also need to be capable of using a format other than JSON. But this should be similar to having to deal with protobuf, etc. A single blessed endpoint can be used to provide an HTTP status page.

tegefaulkes commented 1 year ago

Serving the content should be pretty simple. We can generate the HTML using template strings and a template. I'm not sure it's good to serve it over the same socket as the RPC system. Ideally we'd just have a separate TCP socket serving the HTTP content.
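For illustration, generating the page with a template string could look roughly like this. The `PublicStatus` fields and the function names are hypothetical; the only firm point is that any interpolated value should be HTML-escaped.

```typescript
// Hypothetical public status fields; names are illustrative only.
interface PublicStatus {
  nodeId: string;
  version: string;
  uptimeSeconds: number;
}

// Escape characters that are significant in HTML text and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Render the status page as a single HTML string using a template literal.
function renderStatusPage(status: PublicStatus): string {
  return `<!DOCTYPE html>
<html>
  <head><title>Polykey Agent Status</title></head>
  <body>
    <h1>Polykey Agent Status</h1>
    <dl>
      <dt>Node ID</dt><dd>${escapeHtml(status.nodeId)}</dd>
      <dt>Version</dt><dd>${escapeHtml(status.version)}</dd>
      <dt>Uptime (seconds)</dt><dd>${status.uptimeSeconds}</dd>
    </dl>
  </body>
</html>`;
}
```

The resulting string can be written as the response body by whichever HTTP server ends up hosting the page.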

CMCDragonkai commented 1 year ago

The RPC system doesn't have knowledge about sockets at all. Starting an RPC server would depend on some event handler (external to the RPC system) to provide ReadableWritablePair objects. So the RPC system wouldn't care where these objects come from.

TCP sockets/websockets/QUIC sockets are all things that could be handing over a ReadableWritablePair to the RPC system.

CMCDragonkai commented 1 year ago

It would be nice to show the status of the testnet and mainnet, specifically whether they are online.

In the future, we could actually have something that gathers status from all the nodes in testnet/mainnet and shows that as well. I'm not sure if this will be shown on the status page of a single agent, or if it's a custom service that we deploy to Cloudflare.

Like a testnet/mainnet status.

As for the agent's own status page, that's something that can be done with the uWebSockets library because it serves HTTP as well. This appears to be on the same port that uws is binding to, which makes it quite easy, since we reuse the same port.

tegefaulkes commented 1 year ago

As part of #540, the WebSocketServer is running a Node https server. We should use that to serve an HTTP page.

CMCDragonkai commented 1 year ago

During some development on js-quic, we discovered that, as of now, the quiche system, which uses BoringSSL, doesn't yet implement support for multiple X.509 certificates. Usually one can configure multiple X.509 certificates signed with different key algorithms, and the server and client can present one or the other depending on negotiation. The work was done here: https://github.com/MatrixAI/js-quic/issues/17.

The main use for this is to preserve the root key as Ed25519, and then deterministically generate other subkeys from it. That way we could generate another X.509 cert with a different key algorithm, such as one supported by browsers. https://security.stackexchange.com/questions/269725/what-is-the-current-april-2023-browser-support-for-ed25519-certificate-signatu

Node's own TLS does in fact support this.
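As a hedged sketch of how this might be wired up: Node's TLS options accept arrays for `key` and `cert`, with one certificate chain per private key. The helper below only assembles such an options object; `KeyCertPair` and `buildMultiCertOptions` are hypothetical names, and generating the second (browser-compatible) key and certificate is out of scope here.

```typescript
// One private key plus its certificate chain, both PEM-encoded.
interface KeyCertPair {
  keyPem: string;
  certChainPem: string;
}

// Build options in the shape accepted by Node's tls.createSecureContext and
// https.createServer, where `key` and `cert` may each be an array holding
// one entry per key algorithm.
function buildMultiCertOptions(pairs: KeyCertPair[]): { key: string[]; cert: string[] } {
  if (pairs.length === 0) {
    throw new Error('At least one key/certificate pair is required');
  }
  return {
    key: pairs.map((pair) => pair.keyPem),
    cert: pairs.map((pair) => pair.certChainPem),
  };
}

// Usage (not executed here, and assuming both pairs already exist):
//   https.createServer(buildMultiCertOptions([ed25519Pair, ecdsaPair]), handler);
```

Which certificate is actually presented is then decided during the TLS handshake, based on what the client advertises.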

And now that our client service websocket system js-ws actually uses a regular HTTPS server (not http2, because no ws server implementation is available on http2), this is possible. We could serve up an HTTP status page for each agent too!

And this enables #166 to have an HTTP-based RESTful API as an alternative to the WS system, if necessary, for clients that don't understand WS or that require a simpler integration (possibly allowing us to create a vault-compatibility mode or other API compatibility modes as suggested by some users).

CMCDragonkai commented 9 months ago

https://testnet.polykey.com and https://mainnet.polykey.com provide a global status page for the entire network, while a status page here provides individual status for a single Polykey node.

I was also debating whether adding a TUI would be cool to just give a quick terminal "dashboard-like" overview of the current node - but an HTTP status page is likely to be more useful anyway. Limited development resources of course.

CMCDragonkai commented 2 months ago

Apparently it's possible to get .local DNS just zero conf. I like this cause then Polykey agents can all just register their https://.local DNS so that it can be accessed.


That would be pretty cool, it even works across the entire local network. I wonder if this requires using the mdns stack.

CMCDragonkai commented 2 months ago

So home assistant achieves this through the MDNS stack. We can do something like this.

For local area networks, every PK agent should advertise:

<nodeId>.polykey.local

That points back to the host's IP address. The HTTP status page would need to select a random port by default because of port conflicts; of course, the port could be an additional option on agent start.

This is because, unlike Home Assistant, we're software, not a hardware platform. Although if we sold a PK hardware device, it would be possible to do it the same way they did.

That would mean people would be able to access the local HTTP status page with <nodeId>.polykey.local:<http-status-port>.

In fact, by combining this with #166 that is also introducing an HTTP-based API bridging the js-rpc, that would re-use the same port for the general HTTP API. The / page would just be the HTML status page.
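To make the addressing concrete, here is a small sketch that builds the advertised hostname and status URL for a node, and decides whether a request path goes to the HTML status page or to the HTTP API. All function names are hypothetical. The 63-character check reflects the DNS limit on a single label; whether encoded node IDs always fit within it is an assumption to verify.

```typescript
// A single DNS label may be at most 63 characters long.
const MAX_DNS_LABEL_LENGTH = 63;

// Hostname a PK agent would advertise on the local network.
function statusPageHostname(nodeId: string): string {
  if (nodeId.length === 0 || nodeId.length > MAX_DNS_LABEL_LENGTH) {
    throw new RangeError('nodeId must fit in a single DNS label (1-63 characters)');
  }
  return `${nodeId}.polykey.local`;
}

// Full URL of the status page for a node on a given port.
function statusPageUrl(nodeId: string, port: number): string {
  return `http://${statusPageHostname(nodeId)}:${port}/`;
}

type Route = 'status-page' | 'api';

// The root path serves the HTML status page; everything else is the HTTP API.
function routeFor(path: string): Route {
  return path === '/' ? 'status-page' : 'api';
}
```

For example, `statusPageUrl('abc', 34589)` yields `http://abc.polykey.local:34589/`.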

An HTML based status page would allow us to more easily showcase the current situation of the PK node, far easier than having the CLI based thing, and can even be animated.

So how does this work? Well, in relation to our MDNS stack, js-mdns just has to advertise this domain. This is what the Home Assistant device advertises when using avahi-browse -a -r:

= enp7s0 IPv6 homeassistant [ca46c64a4c0645a0bc2886bdd941e81f] _workstation._tcp    local
   hostname = [homeassistant.local]
   address = [fe80::a8b6:a9f3:d75d:73ff]
   port = [0]
   txt = []
= enp7s0 IPv4 homeassistant [ca46c64a4c0645a0bc2886bdd941e81f] _workstation._tcp    local
   hostname = [homeassistant.local]
   address = [192.168.178.73]
   port = [0]
   txt = []
= enp7s0 IPv4 Home                                          _home-assistant._tcp local
   hostname = [93b6f43baed0455398ee6ad3ed315b07.local]
   address = [192.168.178.73]
   port = [8123]
   txt = ["requires_api_password=True" "base_url=http://192.168.178.73:8123" "internal_url=http://192.168.178.73:8123" "external_url=" "version=2024.8.2" "uuid=93b6f43baed0455398ee6ad3ed315b07" "location_name=Home"]
= enp7s0 IPv6 Home                                          _home-assistant._tcp local
   hostname = [93b6f43baed0455398ee6ad3ed315b07.local]
   address = [192.168.178.73]
   port = [8123]
   txt = ["requires_api_password=True" "base_url=http://192.168.178.73:8123" "internal_url=http://192.168.178.73:8123" "external_url=" "version=2024.8.2" "uuid=93b6f43baed0455398ee6ad3ed315b07" "location_name=Home"]

Without the -r option, it doesn't appear to tell you enough information. But the point is, that you can in fact see the 2 hostnames being advertised on both ipv4 and ipv6: homeassistant.local and 93b6f43baed0455398ee6ad3ed315b07.local.

I also believe this is using the standard default MDNS group address, whereas PK is actually using a custom group address to avoid conflict with the OS-native MDNS stack. This is related to https://github.com/MatrixAI/js-mdns/issues/8, and it is why PK doesn't show up there. So to do this we need to solve https://github.com/MatrixAI/js-mdns/issues/8 and migrate to using the OS-native stack so we can use the same group address for local discovery.

Finally while using <nodeId>.polykey.local can allow us an easy way to navigate to any given node on the local network, ideally it would also be nice to advertise polykey.local but only to the same-host network.

This is achieved by restricting the advertisement of polykey.local to the lo interface. Thus for an individual host, using polykey.local would always point to 127.0.0.1. Assuming one runs multiple PK agents in one host, that would mean you can access their http status pages using polykey.local:34589 and varying ports. The MDNS DNS-SD can also advertise the precise port services too.

Finally, OSes having a proper "local service browser" would be useful as a GUI tool. avahi-discover isn't in Nixpkgs yet, so I can't see anything except by using the terminal tool. But we can tell people the address when we start the agent too, via the agent status/start output.

CMCDragonkai commented 2 months ago

This also has some relationship with the UX of using PK CLI commands which currently relies on --client-host and --client-port OR --node-path to connect to the appropriate agent. So we have 2 explicit ways of connecting to the relevant agent. Of course --node-path is defaulted to the host platform home location too.

@amydevs we had previously talked about MDNS based discovery ways of connecting to an agent running on the local network. The above discussion opens up another way that might be more convenient.

Basically one might be able to instead pass an MDNS based service "name", and then look up the host and port to connect to. Like --client-mdns-domain as an example, and then do that. With the ability to use polykey.local which would be a loopback based mdns domain too.

CMCDragonkai commented 2 months ago

The result might be something like...

Or to make it even more automatic, integrate the MDNS stack to the PK CLI, so that it is possible to do:

To enable https based connections, we also need to solve https://github.com/MatrixAI/Polykey/issues/526.

Well, the problem is that browsers don't support Ed25519 certificates. So supporting multiple algorithms means offering multiple certificates, including one that browsers, or HTTPS clients generally, can handle. Presenting a different certificate depending on which algorithms the client advertises is possible.

tegefaulkes commented 2 months ago

When starting up a node maybe we can give it an alias. So a node could be found at nodeid.polykey.local, and nodeAlias.polykey.local. We could also look into using this as a shorthand for specifying a node you want to use when doing CLI commands.

CMCDragonkai commented 2 months ago

The NodeId provides a distinguishing address. Aliases would need to come later, if they are even useful, because they are not sufficiently generalised. polykey.local could be directed to any given local agent.

CMCDragonkai commented 2 months ago

A shorthand would be like short git commits relative to the full hash.
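A minimal sketch of that idea, assuming the CLI has some list of known node IDs to match against: a prefix resolves only when it identifies exactly one node, the same way an abbreviated commit hash must be unambiguous. The names `resolveNodeIdPrefix` and `PrefixResult` are hypothetical.

```typescript
type PrefixResult =
  | { kind: 'found'; nodeId: string }
  | { kind: 'ambiguous'; candidates: string[] }
  | { kind: 'none' };

// Resolve an abbreviated node ID against the set of known node IDs.
function resolveNodeIdPrefix(prefix: string, knownNodeIds: string[]): PrefixResult {
  // Collect every known ID that starts with the given prefix.
  const candidates = knownNodeIds.filter((id) => id.indexOf(prefix) === 0);
  if (candidates.length === 1) {
    return { kind: 'found', nodeId: candidates[0] };
  }
  if (candidates.length > 1) {
    // Like an ambiguous short commit hash: ask the user for more characters.
    return { kind: 'ambiguous', candidates };
  }
  return { kind: 'none' };
}
```

Returning the ambiguous candidates lets the CLI list them, so the user can see how many more characters are needed.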