cartesi / rollups-node

Reference implementation of the Cartesi Rollups Node
Apache License 2.0

Run node on host or single container #26

Closed: endersonmaia closed this issue 1 year ago

endersonmaia commented 1 year ago

πŸ“š Context

The off-chain services that compose the cartesi-node solution are currently released as one container image per service, and if you deploy each of these services as a separate container (Docker/Kubernetes), everything works just fine.

But when you need to run this directly on the host or inside a single container, there is no release available for that.

Why is this problem relevant?

Depending on the environment where you need to deploy a cartesi-node, you may face restrictions on how to run multiple services and containers, and on how to make them communicate.

Although containers are standard, we still need to support those who don't use containers.

βœ”οΈ Solution

We could have a container image release with all the services together.

We could have binary releases without the container, so anyone can deploy this on a "plain old server": a VM on a VPS, or bare metal.

πŸ“ˆ Subtasks

gligneul commented 1 year ago

@endersonmaia I have a few questions regarding the subtasks.

environment variables should be normalized so that they don't conflict between services

Is it better to add a unique prefix to each service? Or do we just need to make sure that the same variable has the same name across all services?

health-checks should be optional

I don't know if I agree with this. Couldn't you just assign port 0 to the health check so the system picks a random port for it?

logs should be prefixed with the services

Do you have a guideline for the log format?

binary releases at least for linux/amd64;

Is this necessary for the task? Wouldn't it be sufficient to release a docker image with all service binaries?

endersonmaia commented 1 year ago

Is it better to add a unique prefix to each service? Or do we just need to make sure that the same variable has the same name across all services?

Good examples are SESSION_ID and REDIS_ENDPOINT, I think they could be the same for every service.

I don't know if I agree with this. Couldn't you just assign port 0 for the health check so the system assigns a random port to it?

I need to know the port to configure this from the outside (a Kubernetes manifest, or docker-compose).

But maybe this falls into the same problem as the prefixes: we need an INDEXER_HC_PORT and a DISPATCHER_HC_PORT to avoid conflicts.
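For illustration, the prefixed variables could look like this in a launcher script. The names and default port numbers below are hypothetical, not the node's actual configuration:

```shell
# Hypothetical per-service health-check ports, prefixed so they don't
# conflict when every service runs on the same host or in one container.
# Each can still be overridden from a Kubernetes manifest/docker-compose.
INDEXER_HC_PORT="${INDEXER_HC_PORT:-8081}"
DISPATCHER_HC_PORT="${DISPATCHER_HC_PORT:-8082}"

echo "indexer health check listening on :${INDEXER_HC_PORT}"
echo "dispatcher health check listening on :${DISPATCHER_HC_PORT}"
```

The fixed defaults keep the ports known to the outside, while the prefix keeps them from colliding with each other.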

Is this necessary for the task? Wouldn't it be sufficient to release a docker image with all service binaries?

This could be another issue, but it's great to be able to download binary releases directly from the GitHub Release page, if you need to run this yourself, without the container stuff.

gligneul commented 1 year ago

This could be another issue, but it's great to be able to download binary releases directly from the GitHub Release page, if you need to run this yourself, without the container stuff.

The services are not meant to be used on their own, since you need a lot of configuration, so I think this is a very particular use case. We should probably create a separate issue to discuss that.

endersonmaia commented 1 year ago

Another example: TX_PROVIDER_HTTP_ENDPOINT (dispatcher) and BH_HTTP_ENDPOINT (state-server) can have the same value.

omidasadpour commented 1 year ago

@endersonmaia Is it a good idea to have all of the services in just one docker image? We will face several challenges, like:

1 - How should they work together?

2 - What if one of the services crashes? Should we restart the whole container, or can we restart just that specific service? Maybe we need a process management tool like Supervisord.

3 - What if one of our services forks into multiple processes? (For example, the Apache web server starts multiple worker processes.)

4 - What about our services' dependencies, like Postgres and Redis? Should we deploy them inside the same container, or in different containers? (If we deploy them in different containers, then the user has to manage the Docker networking again.)

endersonmaia commented 1 year ago

@endersonmaia Is it a good idea to have all of the services in just one docker image? We will face several challenges, like:

There's nothing bad about it. :)

1 - How should they work together ?

They should work the same.

2 - What if one of the services crashes? Should we restart the whole container, or can we restart just that specific service? Maybe we need a process management tool like Supervisord.

Each service should be resilient enough not to depend on external supervision/orchestration. It's each service's responsibility to retry failing connections with some retry/backoff/timeout logic until it finally fails and exits, with good logs explaining the reason for the failure. The supervisor/orchestrator should also have its own retry/backoff/timeout configuration.
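The retry/backoff/timeout behavior described here can be sketched in shell. This is a hedged, minimal illustration; the real services implement this internally, in Rust:

```shell
# Minimal sketch of retry with exponential backoff: try a command a
# bounded number of times, then give up with a clear failure log.
retry() {
    cmd=$1; max=$2; delay=1; attempt=1
    while ! $cmd; do
        if [ "$attempt" -ge "$max" ]; then
            # final failure: explain the reason and exit non-zero
            echo "giving up after $attempt attempts: $cmd" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))      # exponential backoff
        attempt=$((attempt + 1))
    done
}

retry true 2                      # succeeds on the first try
retry false 2 || echo "service exited with a clear failure log"
```

The supervisor then only has to decide whether to restart the whole process after it exits, with its own independent restart policy.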

3 - What if one of our services forks into multiple processes? (For example, the Apache web server starts multiple worker processes.)

Our services already do that; there's no issue here.

4 - What about our services' dependencies, like Postgres and Redis? Should we deploy them inside the same container, or in different containers? (If we deploy them in different containers, then the user has to manage the Docker networking again.)

These dependencies can be managed by the supervisor/scheduler in use.

I'm experimenting with s6-overlay for the single-container approach; someone could try systemd if they need to and solve this there.
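As a rough sketch of what the s6-overlay (v3) approach looks like: each supervised service is a small directory under /etc/s6-overlay/s6-rc.d inside the image. The service name and binary path below are placeholders, not the actual rollups-node layout:

```shell
# Hypothetical s6-overlay v3 service definition, staged under rootfs/
# before being COPY'd into the image; "dispatcher" is a placeholder.
svc=rootfs/etc/s6-overlay/s6-rc.d/dispatcher
mkdir -p "$svc"
echo longrun > "$svc/type"        # a long-running, supervised process
cat > "$svc/run" <<'EOF'
#!/command/with-contenv sh
exec /usr/local/bin/dispatcher
EOF
chmod +x "$svc/run"

# register the service in the "user" bundle so s6-rc starts it at boot
mkdir -p rootfs/etc/s6-overlay/s6-rc.d/user/contents.d
: > rootfs/etc/s6-overlay/s6-rc.d/user/contents.d/dispatcher
```

s6 then supervises and restarts each service independently inside the single container, which addresses the crash-recovery question above without restarting the whole container.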


We're not going to stop releasing container images for each service like we do now; we're only going to offer other options, a single container being one of them.

endersonmaia commented 1 year ago

@gligneul it just occurred to me that all services will see the RUST_LOG configuration, but what if I want to control the log level of just a single service via environment variables?

One suggestion would be to have RUST_LOG by default, which could be overridden by $<service>_RUST_LOG or $<service>_LOG_LEVEL.

So, if I want to define the log level globally, I could use RUST_LOG, LOG_LEVEL, or CARTESI_NODE_LOG_LEVEL.

In case I want to configure a specific service, I could use DISPATCHER_LOG_LEVEL or CARTESI_NODE_DISPATCHER_LOG_LEVEL.
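The suggested precedence can be sketched as a simple fallback chain. The variable names are the ones proposed in this comment, not anything the node currently implements:

```shell
# Hypothetical precedence: a service-specific variable wins over the
# global LOG_LEVEL, which itself defaults to "info" when unset.
LOG_LEVEL="${LOG_LEVEL:-info}"
DISPATCHER_LOG_LEVEL="${DISPATCHER_LOG_LEVEL:-$LOG_LEVEL}"
INDEXER_LOG_LEVEL="${INDEXER_LOG_LEVEL:-$LOG_LEVEL}"

echo "dispatcher=$DISPATCHER_LOG_LEVEL indexer=$INDEXER_LOG_LEVEL"
```

With no variables set, every service logs at "info"; setting LOG_LEVEL changes all of them, and setting DISPATCHER_LOG_LEVEL changes only the dispatcher.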

This comment could be transformed into an issue if you want.

gligneul commented 1 year ago

@endersonmaia RUST_LOG is a variable from the Rust logging ecosystem; I'm not sure if we can change it, and even if we can, I'm not sure if we should.

You can already set the log level for specific services by specifying the corresponding Rust module. For instance: RUST_LOG="dispatcher=trace,advance_runner=trace", and so on.

endersonmaia commented 1 year ago

Yeah, that's why I suggest exposing LOG_LEVEL instead of RUST_LOG, and dealing with this internally.

Nice that I can specify the service in RUST_LOG; I didn't know that. Even so, imagine that I want to define different log levels for different services: putting all of this in a single RUST_LOG doesn't feel great.

gligneul commented 1 year ago

I agree that info should be the default. That should be easy to configure.

imagine that I want to define different log levels for different services: putting all of this in a single RUST_LOG doesn't feel great.

That is not much different from specifying multiple variables. You can still set the default level in RUST_LOG: RUST_LOG="...,info".

endersonmaia commented 1 year ago

That is not much different from specifying multiple variables.

I disagree.

I prefer the explicit:

DISPATCHER_LOG_LEVEL="trace"
ADVANCE_RUNNER_LOG_LEVEL="trace"
INDEXER_LOG_LEVEL="info"
GRAPHQL_SERVER_LOG_LEVEL="warn"

to reading this:

RUST_LOG="dispatcher=trace,advance_runner=trace,indexer=info,graphql-server=warn"

Maybe it's a matter of taste, IDK.
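For what it's worth, the two styles aren't mutually exclusive: a hypothetical launcher script could accept the explicit per-service variables and assemble the single RUST_LOG directive string from them internally. A sketch, with illustrative values:

```shell
# Hypothetical launcher logic: read the explicit per-service variables
# and build the single RUST_LOG string that the Rust services expect.
DISPATCHER_LOG_LEVEL="trace"
ADVANCE_RUNNER_LOG_LEVEL="trace"
DEFAULT_LOG_LEVEL="info"      # trailing bare level = default for the rest

RUST_LOG="dispatcher=${DISPATCHER_LOG_LEVEL},advance_runner=${ADVANCE_RUNNER_LOG_LEVEL},${DEFAULT_LOG_LEVEL}"
export RUST_LOG
echo "$RUST_LOG"
```

That keeps the explicit variables at the configuration surface while still letting the underlying Rust logging work out of the box.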

tuler commented 1 year ago

It depends on who the user is. If they are a hardcore user, an infrastructure manager, a cloud provider, etc., I think it's OK to use RUST_LOG.

For the application developer it will be something very simple like

sunodo run
sunodo run --verbose

and we will decide what to do.

gligneul commented 1 year ago

Maybe it's a matter of taste, IDK.

Yes, it looks better, but we would have to implement this logic by hand. RUST_LOG already works out of the box and provides the functionality we need, even though it looks a bit ugly.

gligneul commented 1 year ago

We merged the health check improvement to allow the configuration of multiple services. @endersonmaia, is there anything else that you need to be prioritized on our side?

endersonmaia commented 1 year ago

@gligneul nothing that I can think of right now.

I'll test these new health-check options at sunodo/rollups-node.

omidasadpour commented 1 year ago

We merged the health check improvement to allow the configuration of multiple services. @endersonmaia, is there anything else that you need to be prioritized on our side?

@gligneul We could have a graceful shutdown (preStop hook) config too.

This can help us manage the service lifecycle better than we do now. /cc @endersonmaia
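As a sketch of the service side of this: a process can trap SIGTERM (which is what a container stop or Kubernetes preStop sequence ultimately delivers) and drain in-flight work before exiting. This is a minimal illustration, not the node's implementation:

```shell
# Hypothetical graceful-shutdown handler: on SIGTERM, drain work and
# exit cleanly instead of dying mid-request.
CLEANED=0
cleanup() {
    echo "SIGTERM received: draining in-flight work before exit"
    CLEANED=1
}
trap cleanup TERM

kill -TERM $$     # simulate the platform stopping the service
echo "cleanup ran: $CLEANED"
```

On the Kubernetes side, the preStop hook just needs enough terminationGracePeriodSeconds for this draining to finish before the kill signal follows.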

gligneul commented 1 year ago

@gligneul nothing that I can think of right now. I'll test these new health-check options at sunodo/rollups-node.

Ok, thanks!

@gligneul We could have a graceful shutdown (preStop hook) config too.

Graceful shutdown is important. I will create an issue for it.

torives commented 1 year ago

This issue has spawned a lot of interesting discussions and issues, but it has become quite confusing. I'll be closing it now, but I've created #80 to tackle its original proposal.