AMWA-TV / bcp-003-03

AMWA BCP-003-03 Certificate Provisioning in NMOS Systems
https://specs.amwa.tv/bcp-003-03
Apache License 2.0

Reload Configuration Endpoint #5

Open JamesGibo opened 4 years ago

JamesGibo commented 4 years ago

Consolidating the discussion about the reasons for a reload configuration endpoint, and its API design, from (https://basecamp.com/1791706/projects/15357148/messages/92126983) so that it can be discussed on the call.

The primary reason for this endpoint at the moment is to trigger the renewal of a device's TLS certificate at a defined point in time when the device is not in use, but this API could also be used to trigger software updates or other configuration changes, or to support monitoring data export (e.g. Prometheus).

Issue

  1. Some devices may not be able to replace the existing certificate without affecting the device's primary operation (e.g. having to restart the application to install the certificate, leading to loss of control and video).
    • All possible steps should be taken to avoid this, such as being able to reload just the NMOS module (e.g. on SIGHUP). Apache, for example, can load a new configuration without a full restart using its ‘graceful’ restart (https://httpd.apache.org/docs/2.4/stopping.html).
  2. Some devices may be unable to generate a new key pair (RSA or ECDSA) without affecting the device's primary operation.
    • All possible steps should be taken to avoid this, such as using key algorithms that are less resource intensive and appropriate hardware acceleration.
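
As a sketch of the reload-without-restart approach, the Python below keeps the TLS context used for new connections behind a reload() method that can be triggered by SIGHUP. It assumes a server built on Python's ssl module, where already-established connections keep the context they were created with; ReloadableTLSContext and its context_factory parameter are hypothetical names, not part of any NMOS API. If loading the renewed certificate fails, the original context stays in service.

```python
import signal
import ssl


def _default_context_factory(certfile, keyfile):
    """Build a fresh server-side TLS context from the files on disk."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile, keyfile)
    return ctx


class ReloadableTLSContext:
    """Holds the TLS context used for *new* connections and swaps it on reload.

    Connections that are already established keep a reference to the context
    they were created with, so a reload does not tear them down.
    """

    def __init__(self, certfile, keyfile, context_factory=_default_context_factory):
        self._certfile = certfile
        self._keyfile = keyfile
        self._factory = context_factory
        # Fail loudly at start-up if the initial certificate cannot be loaded.
        self._context = self._factory(certfile, keyfile)

    @property
    def context(self):
        return self._context

    def reload(self):
        """Try to load the renewed certificate; keep the old one on failure."""
        try:
            new_context = self._factory(self._certfile, self._keyfile)
        except OSError:  # ssl.SSLError is a subclass of OSError
            return False
        # A single reference swap: readers see either the old or the new
        # context, never a partially built one.
        self._context = new_context
        return True

    def install_sighup_handler(self):
        """Reload on SIGHUP (POSIX only; must be called from the main thread)."""
        signal.signal(signal.SIGHUP, lambda signum, frame: self.reload())
```

A server would wrap each accepted socket with tls.context.wrap_socket(sock, server_side=True), so a reload only affects connections accepted after it.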

Ultimately, if a certificate expires, the device should perform the renewal regardless of whether this will affect the primary operation of the device.
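
A minimal sketch of that policy, assuming renewal is normally deferred to a hypothetical quiet window supplied by the operator: renew inside the window, or immediately once the certificate has expired. The function name and window parameters are illustrative only.

```python
from datetime import datetime


def should_renew_now(now: datetime, not_after: datetime,
                     window_start: datetime, window_end: datetime) -> bool:
    """Decide whether to start certificate renewal at time `now`.

    Renewal is normally deferred to the quiet period [window_start, window_end)
    when the device is not in use. Once the certificate has expired
    (now >= not_after) the device renews regardless of the window, accepting
    that its primary operation may be affected.
    """
    if now >= not_after:
        return True
    return window_start <= now < window_end
```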

Practical Examples

Proposal

The proposal is to add a new endpoint, /reload-config, to the NMOS specs.

Calling the /reload-config endpoint will cause the device to trigger the renewal of all its certificates; during this time the primary operation of the device may be affected.

The /reload-config endpoint must require authentication, as calling it disrupts the operation of the device.
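
The following Python sketch shows what the server side of /reload-config might look like using only the standard library. Assumptions to note: the API base path is still <TBC>, so the handler matches only the final path segment; the bearer-token check against a fixed EXPECTED_TOKEN is a stand-in for a real authorization scheme (for example the token-based approach of AMWA BCP-003-02); and start_certificate_renewal is a hypothetical placeholder for the device-specific renewal. A real device would serve this over HTTPS. The 202 and 403 responses follow the draft response list in this proposal.

```python
import hmac
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical static token standing in for a real authorization scheme.
EXPECTED_TOKEN = "example-token"


def start_certificate_renewal():
    """Hypothetical placeholder for the device-specific renewal of all certificates."""
    pass


def is_authorized(auth_header):
    """Return True if the Authorization header carries the expected bearer token."""
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    presented = auth_header[len("Bearer "):]
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(presented.encode(), EXPECTED_TOKEN.encode())


class ReloadConfigHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The API base path is still TBC, so match only the final segment.
        if not self.path.rstrip("/").endswith("/reload-config"):
            self.send_error(404)
            return
        if not is_authorized(self.headers.get("Authorization")):
            self.send_error(403, "client lacks the required access rights")
            return
        # Renewal may disrupt the device, so run it in the background and
        # reply 202 Accepted at once: received, but not yet acted upon.
        threading.Thread(target=start_certificate_renewal, daemon=True).start()
        self.send_response(202)
        self.end_headers()
```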

Prometheus has a similar feature for reloading its configuration via an API: sending an HTTP POST request to the /-/reload endpoint (https://prometheus.io/docs/prometheus/latest/configuration/configuration/).

If the certificate renewal operation is unsuccessful, the device should carry on using the original certificate if it is still valid, and the operation should be retried at an appropriate time.
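
One way to implement "retry at an appropriate time" is exponential backoff, sketched below. The backoff schedule (60 s, doubling, five attempts) is an assumption rather than part of the proposal, and renew is a hypothetical callable returning True on success; the injectable sleep keeps the sketch testable.

```python
import time


def renew_with_retry(renew, max_attempts=5, base_delay=60.0, sleep=time.sleep):
    """Attempt renewal, backing off between failures.

    Between attempts the device keeps serving with its original certificate
    (if still valid). Delays double after each failure: 60 s, 120 s, 240 s, ...
    Returns True if renewal eventually succeeded.
    """
    delay = base_delay
    for attempt in range(max_attempts):
        if renew():
            return True
        # No point waiting after the final attempt.
        if attempt < max_attempts - 1:
            sleep(delay)
            delay *= 2
    return False
```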

An automatic or manual check should be performed after issuing the command to confirm that the certificate has been renewed.
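
For an automatic check, one option is to compare the SHA-256 fingerprint of the certificate the device presents before and after the command, as sketched below with Python's ssl and hashlib modules. Treating any fingerprint change as a successful renewal is a simplifying assumption; a fuller check would also confirm the new certificate's validity period and chain.

```python
import hashlib
import ssl


def certificate_fingerprint(pem_cert):
    """SHA-256 fingerprint of a PEM certificate, as lowercase hex."""
    return hashlib.sha256(ssl.PEM_cert_to_DER_cert(pem_cert)).hexdigest()


def fetch_certificate(host, port=443):
    """Fetch the certificate the device currently presents.

    Without ca_certs the certificate is retrieved without validation, which
    is acceptable here because it is only fingerprinted.
    """
    return ssl.get_server_certificate((host, port))


def certificate_was_renewed(old_pem, new_pem):
    """A renewed certificate has a different fingerprint from the old one."""
    return certificate_fingerprint(old_pem) != certificate_fingerprint(new_pem)
```

In use, a client would call fetch_certificate before and after POSTing to /reload-config and compare the two results with certificate_was_renewed.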

HTTP POST https://<hostname>/x-nmos/<TBC>/<version>/reload-config

Responses:
HTTP 202: Accepted. The request has been received but not yet acted upon.
HTTP 403: Forbidden. The client does not have the required access rights to perform this action.
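
A client for the draft endpoint could look like the sketch below. base_url stands in for https://<hostname>/x-nmos/<TBC>/<version>, which is not yet defined, and the bearer-token Authorization header is an assumption about how authentication would be presented.

```python
import urllib.error
import urllib.request


def build_reload_request(base_url, token):
    """Build the POST request; base_url stands in for the TBC API path."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/reload-config",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def interpret_status(status):
    """Map the response codes listed in the proposal to outcomes."""
    if status == 202:
        return "accepted"   # received, not yet acted upon
    if status == 403:
        return "forbidden"  # client lacks the required access rights
    return "unexpected"


def trigger_reload(base_url, token, timeout=10):
    request = build_reload_request(base_url, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return interpret_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; the status code is still available.
        return interpret_status(err.code)
```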

Outstanding Questions:

If it is decided that the endpoint is required, further design decisions will be needed.

On the next call I would like to reach a decision on whether this API is required. Could anyone with practical examples of why it is needed please get in contact, and I can add them anonymously to the list of practical examples.

lo-simon commented 4 years ago

I would like to clarify: if we decide that nodes will have the /reload-config endpoint, how would a client distinguish between executing a reload and reading the reload status using only a POST request?

Is this the usage you have in mind?
  • reload config: HTTP POST with a configuration file JSON body
  • read reload status: HTTP POST with no JSON body

andrewbonney commented 4 years ago

If we wish to have a 'read' mechanism I'd suggest we just implement a GET in addition.

jonathan-r-thorpe commented 4 years ago

I think the confusion here is how to check that the certificate has been renewed. Simon's interpretation of the proposal was that the same endpoint would be used both to initiate renewal AND to check whether the renewal was successful. In that case, should there be a POST and a GET on the same endpoint, for renewal and status respectively, or a separate endpoint for status?
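
If a GET is added alongside the POST (on the same endpoint or a separate status endpoint), the device needs some renewal state to report. The sketch below is one hypothetical shape for that state and its JSON body; the state names and fields are illustrative and not defined by any NMOS specification.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class RenewalState(str, Enum):
    IDLE = "idle"
    IN_PROGRESS = "in_progress"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class RenewalStatus:
    state: RenewalState = RenewalState.IDLE
    last_attempt: Optional[str] = None  # ISO 8601 timestamp of the last POST
    error: Optional[str] = None         # reason for the last failure, if any

    def start(self, when):
        """Called when a POST triggers renewal."""
        self.state = RenewalState.IN_PROGRESS
        self.last_attempt = when
        self.error = None

    def finish(self, ok, error=None):
        """Called when the renewal process completes."""
        self.state = RenewalState.SUCCEEDED if ok else RenewalState.FAILED
        self.error = None if ok else error

    def to_json(self):
        """Body that a GET for renewal status could return."""
        body = asdict(self)
        body["state"] = self.state.value
        return json.dumps(body)
```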

cristian-recoseanu commented 4 years ago

Hi Everyone!

I think the approach we use will need to accommodate multiple scenarios (systems which have a control system and systems which don't).

One such approach would be for each node to host a /scheduled_cert_renewal endpoint that returns the time at which it intends to perform the renewal.

If a client/control system POSTs a new timestamp, the node will schedule the renewal for that time. If no client/control system is present, the node will perform its renewal at the initially proposed time.
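
A sketch of the state behind the suggested /scheduled_cert_renewal endpoint, assuming ISO 8601 timestamps with an explicit UTC offset. Rejecting requested times at or after the certificate's expiry is an added assumption, in line with the point that an expiring certificate must be renewed regardless; the class and method names are hypothetical.

```python
from datetime import datetime


class ScheduledCertRenewal:
    """State behind a hypothetical /scheduled_cert_renewal endpoint.

    The node proposes a renewal time; a control system may POST a new time.
    Without a control system, the node simply renews at its proposed time.
    """

    def __init__(self, proposed_time: datetime, not_after: datetime):
        self.scheduled_time = proposed_time
        self.not_after = not_after  # expiry of the current certificate

    def get(self) -> str:
        """GET: report the scheduled renewal time as an ISO 8601 string."""
        return self.scheduled_time.isoformat()

    def post(self, requested: str) -> bool:
        """POST: reschedule, rejecting naive times and times not before expiry."""
        when = datetime.fromisoformat(requested)
        # Check tzinfo first: comparing naive and aware datetimes raises TypeError.
        if when.tzinfo is None or when >= self.not_after:
            return False
        self.scheduled_time = when
        return True

    def is_due(self, now: datetime) -> bool:
        """True once the scheduled renewal time has been reached."""
        return now >= self.scheduled_time
```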

peterbrightwell commented 3 years ago

Discussion on today's call: Does a WebSocket connection remain open through a renewal? Does an HTTP sequence? What are the real costs involved? OpenSSL-based systems seem to be OK with changing a certificate, but the bare-metal libraries typical of low-power devices less so; these typically need a few seconds to restart, and such devices may not be powerful enough to generate a new key pair while streaming. A possible approach was discussed on the call, using either a new API or an additional parameter on the System API. We will consider longer-term solutions (probably for v1.1) and meanwhile can provide sensible advice on updating. Action: @JamesGibo.

peterbrightwell commented 3 years ago

Discussed on call. The advice is already in progress and any possible longer-term solution needn't hold up v1.0.

JamesGibo commented 3 years ago

This issue should be added to the v1.1 milestone, as it is not going to be included in v1.0, and mitigations for low-power devices have been added to the BCP in #11 to prevent performance issues.