3box / keramik

A k8s operator for simulating Ceramic networks

Support Testnet/Devnet/Mainnet Deployment #74

Open qbig opened 1 year ago

qbig commented 1 year ago

Proposed Changes:

  1. Separate the network operator from the simulation operator so that the network can be started independently. The simulation operator is intended for local testing and is not needed when a Ceramic network is deployed independently.

  2. Support dependency configuration for different network types (local/testnet/mainnet):

    1. Replace Ceramic's SQLite database with PostgreSQL, and support a configurable DB username, name, and password.
    2. Support turning off Ganache and CAS.
    3. Support a configurable namespace for the network.
  3. Add a k8s ingress for Ceramic to support public access on mainstream public clouds (AWS/GCP), since we need to expose the Ceramic and ComposeDB APIs.

  4. Migrate Jaeger, Prometheus, and OpenTelemetry from the simulation operator to the network operator to create a universal monitoring and observability solution.

    1. Expose js-ceramic's metric data to Prometheus.
    2. Provision isolated metric and tracing data for each independent network service for each
  5. Support log streaming for Ceramic and IPFS nodes.

Clarifying Questions:

  1. Do these changes make sense?
  2. Should these be merged into the current project or forked into a separate repo?
qbig commented 1 year ago

@oed @nathanielc

nathanielc commented 1 year ago

@qbig Thanks for this issue. Generally speaking, these changes should be merged into this repo. One or two changes might be better served outside of keramik; in those cases, we should design keramik to explicitly enable external configuration of those parts.

Comments on each topic.

> 1. Separate the network operator from the simulation operator so that the network can be started independently. The simulation operator is intended for local testing and is not needed when a Ceramic network is deployed independently.

Did the simulation logic cause issues? It's possible for the operator to serve both network and simulation resources. If a simulation resource is never created, nothing ever happens. Simulations are not required to define network resources. I don't understand what problem disabling the simulation operator solves. Can you expand on this a bit?

> 2 i. Replace Ceramic's SQLite database with PostgreSQL, and support a configurable DB username, name, and password.

+1, this would be a very positive change. I imagine this will work by adding new properties to the CeramicSpec and providing the information about how to start up postgres and connect to it.
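A minimal Rust sketch of what such CeramicSpec properties might look like. The `PostgresConfig` struct, its field names, and its defaults are hypothetical (they are not part of keramik's current CeramicSpec); the sketch only shows collecting the DB name, username, and password in one place and turning them into a standard `postgres://user:password@host:port/dbname` connection URI. How that URI would actually be handed to js-ceramic is left open.

```rust
/// Hypothetical Postgres settings that could be added to CeramicSpec.
/// Field names and defaults are illustrative assumptions, not keramik's API.
#[derive(Debug, Clone, PartialEq)]
pub struct PostgresConfig {
    pub host: String,
    pub port: u16,
    pub db_name: String,
    pub user: String,
    pub password: String,
}

impl Default for PostgresConfig {
    fn default() -> Self {
        Self {
            // Assumption: Postgres runs next to js-ceramic (e.g. as a sidecar).
            host: "localhost".to_string(),
            port: 5432, // PostgreSQL's default port
            db_name: "ceramic".to_string(),
            user: "ceramic".to_string(),
            password: "ceramic".to_string(),
        }
    }
}

/// Percent-encodes everything except RFC 3986 unreserved characters,
/// so credentials containing '@', ':' or '/' cannot corrupt the URI.
fn percent_encode(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for b in s.bytes() {
        if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~') {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}

impl PostgresConfig {
    /// Builds a connection URI of the form postgres://user:password@host:port/db_name.
    pub fn connection_uri(&self) -> String {
        format!(
            "postgres://{}:{}@{}:{}/{}",
            percent_encode(&self.user),
            percent_encode(&self.password),
            self.host,
            self.port,
            percent_encode(&self.db_name)
        )
    }
}
```

For example, `PostgresConfig::default().connection_uri()` returns `postgres://ceramic:ceramic@localhost:5432/ceramic`; percent-encoding the credentials keeps a password such as `p@ss` from breaking the URI.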

> 2 ii. Support turning off Ganache and CAS.

This should already be possible following these steps: https://3box.github.io/keramik/advanced_configuration.html

> 2 iii. Support a configurable namespace for the network.

This is a simple change. I'm curious: what is the use case for controlling the namespace name?

> 3 Add a k8s ingress for Ceramic to support public access on mainstream public clouds (AWS/GCP), since we need to expose the Ceramic and ComposeDB APIs.

Is this something that can be done outside of keramik? In other words, can ingress be configured independently, or does keramik need to know how ingress is configured? I imagine there are many varying requirements on how users might want to expose their ceramic nodes. If we can design keramik so that it doesn't need to manage ingress itself, we leave those important decisions to the users. Would it work for keramik to add the Service names to its status output, so they can be used as inputs to the ingress resources? (I am not a k8s master, so there might be a better way to do this.) @3benbox Do you have any ideas on how we should manage ingress to Ceramic nodes within keramik networks?

> 4 Migrate Jaeger, Prometheus, and OpenTelemetry from the simulation operator to the network operator to create a universal monitoring and observability solution.

Agreed, this is another good change to keramik. I'll create a separate issue to track this work, as there are some complex requirements on the telemetry system.

> 5 Support log streaming for Ceramic and IPFS nodes.

Agreed. What needs to be done in Keramik to support log streaming?

dbcfd commented 1 year ago

> Is this something that can be done outside of keramik? In other words, can ingress be configured independently, or does keramik need to know how ingress is configured? I imagine there are many varying requirements on how users might want to expose their ceramic nodes.

This is actually fairly complex to do properly and should be managed outside of keramik. It is best handled by Terraform or another cloud provisioning tool. Keramik doesn't need to know about ingress at all; it only needs to expose ports properly so that an ingress can be connected.

qbig commented 1 year ago

> Is this something that can be done outside of keramik? In other words, can ingress be configured independently, or does keramik need to know how ingress is configured? I imagine there are many varying requirements on how users might want to expose their ceramic nodes.
>
> This is actually fairly complex to do properly and should be managed outside of keramik. It is best handled by Terraform or another cloud provisioning tool. Keramik doesn't need to know about ingress at all; it only needs to expose ports properly so that an ingress can be connected.

I think exposing ports (so that an ingress can be connected) should suffice at the Keramik level.

qbig commented 1 year ago

> @qbig Thanks for this issue. Generally speaking, these changes should be merged into this repo. One or two changes might be better served outside of keramik; in those cases, we should design keramik to explicitly enable external configuration of those parts.

Exactly! The point of this issue is to figure out which parts could be done via PRs here and which are best built outside of Keramik.

qbig commented 1 year ago

> This is a simple change. I'm curious: what is the use case for controlling the namespace name?

We intend to run each user's Ceramic nodes in a separate namespace so that their resources are isolated.

qbig commented 1 year ago

> 5 Support log streaming for Ceramic and IPFS nodes.
>
> Agreed. What needs to be done in Keramik to support log streaming?

Hmm, I am assuming each Ceramic node currently writes logs to files? Similar to the SQLite-to-Postgres config change, is it possible to configure how logs are stored for each node, or is there a default setting?

hrbustor commented 1 year ago

> 1. Separate the network operator from the simulation operator so that the network can be started independently. The simulation operator is intended for local testing and is not needed when a Ceramic network is deployed independently.
>
> Did the simulation logic cause issues? It's possible for the operator to serve both network and simulation resources. If a simulation resource is never created, nothing ever happens. Simulations are not required to define network resources. I don't understand what problem disabling the simulation operator solves. Can you expand on this a bit?

Keramik seems to be primarily focused on the network; currently, the simulation service appears to be used only for testing. Therefore, from an architectural perspective, it would be best to support independent startup of the network.

hrbustor commented 1 year ago

> 5 Support log streaming for Ceramic and IPFS nodes.
>
> Agreed. What needs to be done in Keramik to support log streaming?
>
> Hmm, I am assuming each Ceramic node currently writes logs to files? Similar to the SQLite-to-Postgres config change, is it possible to configure how logs are stored for each node, or is there a default setting?

It should be easy to export logs for corresponding services (js-ceramic/ipfs/postgresdb), for example: https://github.com/kube-rs/kube/blob/main/examples/log_stream.rs.
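One detail relevant to the question above: the Kubernetes pod log API (which the linked kube-rs example uses) only returns what a container writes to stdout/stderr, so logs written to files inside the container would not appear there. Once lines are being streamed, a consumer usually needs to tell the services apart. The stdlib-only Rust sketch below is a hypothetical helper, not part of keramik or kube-rs: it reads lines from any `BufRead` source and prefixes each with its service name. Fetching the stream from the cluster is what the kube-rs example covers and is not reproduced here.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: reads every line from `reader` and prefixes it with
/// `[service]`, so js-ceramic, IPFS and Postgres logs can be interleaved in a
/// single output and still be told apart.
pub fn tag_lines<R: BufRead>(service: &str, reader: R) -> io::Result<Vec<String>> {
    reader
        .lines()
        // Propagate I/O errors; otherwise prepend the service tag.
        .map(|line| line.map(|l| format!("[{service}] {l}")))
        .collect()
}
```

For example, tagging `std::io::Cursor::new("started\nready\n")` as `ipfs` yields `["[ipfs] started", "[ipfs] ready"]`.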

nathanielc commented 1 year ago

> I think exposing ports (so that an ingress can be connected) should suffice at the Keramik level.

This is already done: Keramik creates a Service for the Ceramic endpoints. The Services are named ceramic-N, where N is a number. We could add a label to those Services to make them easier to find.
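A minimal Rust sketch of the label idea, assuming the Service indices start at 0 (`ceramic-0`, `ceramic-1`, ...). The helper functions are hypothetical; the label keys are the Kubernetes recommended `app.kubernetes.io/component` and `app.kubernetes.io/part-of` labels, and the chosen values (`ceramic`, `keramik`) are assumptions. An external tool that manages ingress could then select the Services by label instead of by name.

```rust
use std::collections::BTreeMap;

/// Names of the per-node Ceramic Services, assuming zero-based indices.
pub fn ceramic_service_names(replicas: usize) -> Vec<String> {
    (0..replicas).map(|i| format!("ceramic-{i}")).collect()
}

/// Hypothetical labels keramik could attach to every ceramic-N Service,
/// using the Kubernetes recommended `app.kubernetes.io/*` keys.
pub fn ceramic_service_labels() -> BTreeMap<String, String> {
    BTreeMap::from([
        ("app.kubernetes.io/component".to_string(), "ceramic".to_string()),
        ("app.kubernetes.io/part-of".to_string(), "keramik".to_string()),
    ])
}

/// Renders labels as an equality-based selector string ("k1=v1,k2=v2"),
/// the format accepted by `kubectl get services -l`.
pub fn label_selector(labels: &BTreeMap<String, String>) -> String {
    labels
        .iter()
        .map(|(k, v)| format!("{k}={v}"))
        .collect::<Vec<_>>()
        .join(",")
}
```

With these labels in place, `label_selector(&ceramic_service_labels())` yields `app.kubernetes.io/component=ceramic,app.kubernetes.io/part-of=keramik`, which could be passed to `kubectl get services -l`.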

> We intend to run each user's Ceramic nodes in a separate namespace so that their resources are isolated.

Makes sense, adding the ability to configure the namespace directly should be straightforward.
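A small Rust sketch of deriving and validating a per-user namespace name. Kubernetes requires namespace names to be RFC 1123 labels: at most 63 characters, only lowercase alphanumerics or `-`, starting and ending with an alphanumeric character. The `keramik` prefix and the `user_namespace` helper are hypothetical illustrations, not an existing keramik option.

```rust
/// Returns true if `s` is a valid RFC 1123 label, the format Kubernetes
/// requires for namespace names.
pub fn is_rfc1123_label(s: &str) -> bool {
    let bytes = s.as_bytes();
    let allowed = |b: &u8| b.is_ascii_lowercase() || b.is_ascii_digit() || *b == b'-';
    !bytes.is_empty()
        && bytes.len() <= 63
        && bytes.iter().all(allowed)
        // Only '-' is non-alphanumeric, so checking both ends enforces
        // "starts and ends with an alphanumeric character".
        && bytes[0] != b'-'
        && bytes[bytes.len() - 1] != b'-'
}

/// Hypothetical helper: builds "<prefix>-<user>" in lowercase and rejects
/// names Kubernetes would not accept as a namespace.
pub fn user_namespace(prefix: &str, user: &str) -> Result<String, String> {
    let name = format!("{prefix}-{user}").to_lowercase();
    if is_rfc1123_label(&name) {
        Ok(name)
    } else {
        Err(format!("invalid namespace name: {name:?}"))
    }
}
```

For example, `user_namespace("keramik", "Alice")` yields `Ok("keramik-alice")`, while a user ID containing `_` is rejected instead of producing a namespace the API server would refuse.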

@hrbustor Thanks for the details

> Therefore, from an architectural perspective, it would be best to support independent startup of the network.

This is possible with the current design.

> It should be easy to export logs for corresponding services (js-ceramic/ipfs/postgresdb), for example: https://github.com/kube-rs/kube/blob/main/examples/log_stream.rs.

Thanks for the pointer, that looks good. I think we should make this change as part of the work to get OpenTelemetry on the network side too. That way we can solve all the observability issues together.