apache / pulsar-manager

Apache Pulsar Manager
https://pulsar.apache.org/
Apache License 2.0

HerdDB support in docker image `apachepulsar/pulsar-manager` #289

Open Gurpartap opened 4 years ago

Gurpartap commented 4 years ago

Using HerdDB should be the default and recommended method with the docker image as well, especially because we all already run ZooKeeper and BookKeeper. Introducing PostgreSQL to this mix is worrisome.

Currently, the apachepulsar/pulsar-manager:v0.1.0 image forces a PostgreSQL setup. Is there a workaround for using HerdDB with it instead?

These env vars had no effect:

DRIVER_CLASS_NAME = "herddb.jdbc.Driver"
URL               = "jdbc:herddb:zookeeper:localhost:2181/herddb"
USERNAME          = "pulsar"
PASSWORD          = "pulsar"

eolivelli commented 4 years ago

Do you want to replicate the HerdDB database or do you want to store data on a single volume?

The simplest way is to run it in embedded mode without Zookeeper/Bookkeeper.

You can easily use it in cluster+embedded mode, but I don't know how to do that with the Pulsar Manager docker image.

I will check tomorrow and give you an example of the JDBC URL.

eolivelli commented 4 years ago

The URL should be something like this one: https://github.com/apache/pulsar-manager/blob/86b74f53bdee9b698a58e21c2a2de003927e1300/src/main/resources/application.properties#L42

But with a ZooKeeper reference.

When you start the service the first time it will create the first table space.

I think it is better that we provide out of the box support for this configuration in the next release of Pulsar Manager.

I will take a look

@aluccaroni you may be interested in providing an example

sijie commented 4 years ago

@eolivelli can you help with this issue?

eolivelli commented 4 years ago

Yes, as I said above.

eolivelli commented 4 years ago

I confirm that using this spring.datasource.url in application.properties allows you to start HerdDB embedded in cluster mode:

spring.datasource.url=jdbc:herddb:zookeeper:localhost:2181/herd?server.start=true&server.base.dir=dbdata

My problem now is telling the docker image to use this configuration. Let me try to play with it; I am not using docker in production, we start the service on a VM.

By the way, we can fix this in the upcoming release.
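As a rough illustration of how the URL above is put together, here is a minimal Python sketch. The helper names (`herddb_cluster_url`, `datasource_property`, `split_url`) are hypothetical and not part of Pulsar Manager or HerdDB; the URL shape and the `server.start` / `server.base.dir` options are taken only from the example in this comment, and option values are joined without any escaping.

```python
def herddb_cluster_url(zk_address, zk_path, options=None):
    """Build a jdbc:herddb:zookeeper URL shaped like the one quoted in this thread.

    zk_address is the ZooKeeper host:port, zk_path the path after it (e.g. "/herd").
    options is a dict of driver options; values are NOT escaped (assumption:
    plain values such as "true" or a relative directory name).
    """
    base = f"jdbc:herddb:zookeeper:{zk_address}{zk_path}"
    if not options:
        return base
    query = "&".join(f"{key}={value}" for key, value in options.items())
    return f"{base}?{query}"


def datasource_property(url):
    """Render the line that would go into application.properties."""
    return f"spring.datasource.url={url}"


def split_url(url):
    """Split a URL built above back into its base part and its options."""
    base, _, query = url.partition("?")
    options = dict(pair.split("=", 1) for pair in query.split("&") if pair)
    return base, options


url = herddb_cluster_url(
    "localhost:2181",
    "/herd",
    {"server.start": "true", "server.base.dir": "dbdata"},
)
print(datasource_property(url))
```

Pointing the helper at a different ZooKeeper address only changes the first argument; whether the docker image picks up the resulting property is a separate question discussed in the rest of this thread.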

eolivelli commented 4 years ago

It looks like with the v0.1.0 image it is not possible to avoid starting the PostgreSQL DB: https://github.com/apache/pulsar-manager/blob/v0.1.0/docker/startup.sh

eolivelli commented 4 years ago

I cannot build the docker image on current master.

@tuteng @sijie what is the correct procedure to build the Docker image?

@tuteng @sijie can you please suggest where to put a new Dockerfile for the new image? The work is simply about dropping PostgreSQL and configuring the correct JDBC URL.

Gurpartap commented 4 years ago

> Do you want to replicate the HerdDB database or do you want to store data on a single volume?

Ideally, I want the pulsar-manager deployment to be stateless.

We should be able to run it anywhere in the cluster while linking it to a replicated database (since we already have zookeeper+bookkeeper for pulsar, using replicated herddb makes a lot of sense).

It would also be beneficial if pulsar-manager image included herddb (= one less component to deploy).

eolivelli commented 4 years ago

Currently Pulsar Manager bundles all of the components of HerdDB, so you can start it with full features.

As said before, you can start PM and the HerdDB service if you use the standard binaries without docker.

The provided Dockerfile always starts PostgreSQL. I am trying to work on such Dockerfiles, but I have a local problem with docker/podman.

I can share my code if you want to try to work on it.

eolivelli commented 4 years ago

I should add that currently HerdDB uses BK as the WAL and ZK as the metadata store. You still need some additional persistent storage to hold data pages.

In the HerdDB community we haven't implemented full diskless persistence simply because the workloads of current users do not require it. We could use BK to store data pages, and this feature is not difficult to implement.

Just to recap:

eolivelli commented 4 years ago

@Gurpartap I have drafted an implementation of diskless HerdDB here https://github.com/diennea/herddb/pull/597

It will probably be available in HerdDB 0.16.0 (the next major release). We should update Pulsar Manager to that version.