
Start Airy Core Platform from the CLI #828

Closed ljupcovangelski closed 3 years ago

ljupcovangelski commented 3 years ago

In order to facilitate the installation and deployment process, we want to explore the possibility of running the whole Airy Platform from the CLI.

As the Airy Platform currently depends on Kubernetes, the CLI needs to do two things:

* Start a new Kubernetes cluster
* Deploy the Airy components on it

Currently we are deploying the platform locally with Vagrant, and for running in production we have Helm charts to start the Kubernetes applications. Installing with the CLI should unify these two installation processes and simplify the installation instructions.

Example command:

airy init
airy start --type local --kafka yes --postgres yes --domainname mycore.com --config ./airy.yaml
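For illustration only, here is a hypothetical sketch of what the `./airy.yaml` config file referenced above could contain, mirroring the example flags (the keys are assumptions, not a decided format):

```yaml
# Hypothetical airy.yaml sketch; every key below is illustrative,
# mirroring the example flags, and not a decided configuration format.
kubernetes:
  type: local              # where to create the cluster: local | cloud
kafka:
  enabled: true            # corresponds to --kafka yes
postgres:
  enabled: true            # corresponds to --postgres yes
ingress:
  domainName: mycore.com   # corresponds to --domainname
```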

For the first step (starting a new Kubernetes cluster), it would be much simpler if we could run it on docker or containerd, so that we don't depend on VirtualBox and Vagrant. This approach would simplify local installations, as we would depend on fewer locally installed tools. It should be used only when we want a very fast start and some simple testing; having the Kubernetes cluster in a VM (for example Vagrant or Minikube) will be more stable.

For the second step (deploying the Airy components), we can take a few approaches:

* create an Airy CRD
* deploy everything with Helm, as we are doing now, from a pod running in docker (so that we don't install Helm locally)
* talk directly to the Kubernetes cluster with Go and create the resources
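To make the first option a bit more concrete, here is a minimal sketch of what an Airy CRD could look like; the group, kind, and spec fields are illustrative assumptions, not a decided schema:

```yaml
# Sketch of a hypothetical AiryCore CRD: a single custom resource would
# describe an instance, and a controller would reconcile it into the
# individual component deployments. All names here are assumptions.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: airycores.airy.co
spec:
  group: airy.co
  scope: Namespaced
  names:
    kind: AiryCore
    plural: airycores
    singular: airycore
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                kafka:
                  type: boolean
                postgres:
                  type: boolean
                domainName:
                  type: string
```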

lucapette commented 3 years ago

As a first comment, I'll leave out the discussion of the actual command name and its flags (while I like airy start, I'm a lot less convinced about the flags UI, the names, the granularity). I suspect this issue will turn into a small milestone, so I will keep my comments at that level of detail for now.

I like the idea of depending on kubernetes only (I don't think depending on docker/containerd is enough or maybe even correct). It would indeed bring some benefits. But if we're doing this, we should explore the implications a little more:

So while in theory it sounds great to get rid of vagrant, we'd have to solve many problems by doing this. And I think there's no strong argument in the direction of "not using vagrant will simplify local installation". It seems to me we'd need to do a lot of things there's no need for us to do. The airy start command could keep its dependency on a kubernetes cluster and we'd still improve the usability and the programmability of the system quite a bit. Maybe the "right" direction is incorporating into airy start --type local (I think the flag name is bad, but I'm using it for the sake of argument) part of the work the bootstrap process does? airy start could rely on the concept of an "image" (so we can simplify our way through cloud providers as well).

One thing I'm also not sure about is whether we're assuming that introducing the airy start command also means getting rid of the bootstrap script. If so, that also has implications: how do we choose the version? The script currently depends on the fact that it's inside the repo.

As for the second step, I would like to dig deeper into the options. I agree the idea of not depending on helm is appealing. But I can't say I understand what "creating an Airy CRD" means in practice (also, would that mean the CLI would still need to depend on the Git repo? That's actually a very general question we should settle). Gut feeling makes me think we should discard the last option, as I think it would imply the airy CLI knows everything (which is a good advantage of the helm approach: whatever is in the chart, give or take the config, gets there automatically).

ljupcovangelski commented 3 years ago

I'd like to wrap up the conversation that we had yesterday, as well as try to answer some of the questions that have arisen.

In terms of the command name, create probably suits better than start, because the command will actually create an instance of Airy Core, while start and stop can be used for managing the state of the instance.

We would still like to leave open, for the moment, the question of whether the local instance will run on top of docker, Vagrant, or VirtualBox, until we have all the data points and requirements. One mandatory requirement is that Airy Core must run on top of Kubernetes.

As deployment to different cloud providers becomes more important, we are strongly considering reverting to the initial plan of having Airy Core packaged in an image, instead of starting an empty Alpine or Linux instance and provisioning it on the fly. This would also facilitate local deployment with the CLI command, as we could then get rid of the bootstrap.sh script and probably all of the bash provisioning scripts.

Answers to some of the questions:

For us to be able to fully picture the project and create the proper issues, these are the first open questions from my perspective:

chrismatix commented 3 years ago

Thanks for the write-up @ljupcovangelski! Is the port not being persistent for a local docker installation a problem? As far as I can tell, we could do with a localhost setup for the local version because no routes intersect. We would only have to add a prefix path for the UI container.

ljupcovangelski commented 3 years ago

Good point @chrismatix. I was thinking that we cannot force a particular port for the loadbalancer, because the user might already have something running on port 80 or port 8000. But we can make this configurable for the user, and then when one runs:

airy create --type local --port 8000

we create a loadbalancer container listening on port 8000. All the ingress routes would then be:

localhost:8000/ui
localhost:8000/api
localhost:8000/chatplugin
localhost:8000/webhooks
...

It would be best if we could configure the routing inside the ingress only (stripping the path prefix from the request before it is sent to the individual services), so that we don't need to change any code in the components.
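As a rough sketch (service names and ports are assumptions), a single ingress could route by path prefix and strip the prefix before forwarding, so the components stay unchanged; with the Traefik controller that k3s bundles, a Traefik 1.x-style annotation could do the stripping:

```yaml
# Sketch only: one ingress routing by path prefix on the configured port.
# Service names/ports are assumptions; the annotation assumes the Traefik
# 1.x ingress controller bundled with k3s honours PathPrefixStrip.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: airy-core
  annotations:
    traefik.ingress.kubernetes.io/rule-type: PathPrefixStrip
spec:
  rules:
    - http:
        paths:
          - path: /ui
            pathType: Prefix
            backend:
              service:
                name: frontend-ui      # assumed service name
                port:
                  number: 80
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-gateway      # assumed service name
                port:
                  number: 80
```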

ljupcovangelski commented 3 years ago

A few things we need to consider if we switch to the image deployment concept:

Also, something that comes to my mind is that perhaps we should do this step by step, so that we can measure the effect on the user experience better:

* Strip the provisioning of the prerequisites and the components as much as we can.
* Introduce an image, but with the same `bootstrap.sh` script.
* Get rid of the bootstrap script in favour of the `airy create` CLI option.

ljupcovangelski commented 3 years ago

Possible options for running Kubernetes locally:

lucapette commented 3 years ago

> Also, something that comes to my mind is that perhaps we should do this step by step, so that we can measure the effect on the user experience better:
>
> * Strip the provisioning of the prerequisites and the components as much as we can.
> * Introduce an image, but with the same `bootstrap.sh` script.
> * Get rid of the bootstrap script in favour of the `airy create` CLI option.

I love this suggestion and I think the outcome of this issue will be a milestone that will guide us through this process.

I'm not sure I understand how moving "back" to an image approach plays with running kubernetes locally. Isn't the assumption that we "run an image" locally via virtualbox? (Virtualbox being an example... but also the most reasonable solution imo).

As I understand it, we agree on:

* `airy create` is a better name
* introducing images, so `airy create` can refer to that to, well, create an Airy Core instance
* eliminating `bootstrap.sh` altogether (even though we may want to do this in steps)

Now some clarifications/further questions:

* The CLI already bundles the current version, so in theory `airy create` can refer to that to fetch the right images.
* I'm not sure what the concrete proposal for the networking is. I'm working with the assumption that all the base images we will produce (virtualbox, aws, gcloud etc.) will use k3s, which means we'll have access to traefik and can "save" what we already have (I see no good reason to change this, but maybe I'm missing something). It's still not clear to me what we're proposing for the hosts when creating a local instance.
* Do we package the docker images in the box or not? I would say definitely not for the airy components. We have the controller that can start/stop things according to a config, and there's no reason to make the base image bigger than it should be. But I do agree we should probably have kafka/schema registry/redis already baked in. They're "hard requirements" for the system after all (you could make an argument for redis... but the image is pretty small).
* Currently, with the provisioning we also introduce a way we start our applications. I think we could get away with an init script in the image itself; an init-container may be too hard. After all, we want to start airy components once kafka is there. Maybe we can even make the controller responsible for this (as in, wait for kafka to be there to start things)? It would "just work", as we'd start kafka and the controller, and the controller would start the rest once kafka is there.
* How do we make Packer, or the way we build the image, part of the CI/CD pipeline? That is also something I'm not sure how to picture, but it is very central to the problem we have. The "real question" to me is where we would host these images. I can picture the pipeline (have an action that runs packer). Maybe s3, like we do with the CLI.

Almost there I think. Very exciting work! After another round of questions/answers, I feel we will be ready to create the milestone (it's also my personal hope :D)

ljupcovangelski commented 3 years ago

> I'm not sure I understand how moving "back" to an image approach plays with running kubernetes locally. Isn't the assumption that we "run an image" locally via virtualbox? (Virtualbox being an example... but also the most reasonable solution imo.)

If we decide to switch to a VirtualBox/Vagrant image, then I think we should stick with the existing k3s, provided there is a way to change the Kubernetes certificates when we run a new instance of the image (after the image and the Kubernetes cluster inside it have been created).

> As I understand it, we agree on:
>
> * `airy create` is a better name
> * introducing images, so `airy create` can refer to that to, well, create an Airy Core instance
> * eliminating `bootstrap.sh` altogether (even though we may want to do this in steps)

I agree

> The CLI already bundles the current version, so in theory `airy create` can refer to that to fetch the right images.

I also think this is the best approach

> I'm not sure what the concrete proposal for the networking is. I'm working with the assumption that all the base images we will produce (virtualbox, aws, gcloud etc.) will use k3s, which means we'll have access to traefik and can "save" what we already have (I see no good reason to change this, but maybe I'm missing something). It's still not clear to me what we're proposing for the hosts when you're creating a local instance.

Correct. If we use an image with k3s, the traefik ingress that comes with it would be the best option.

> Do we package the docker images in the box or not? I would say definitely not for the airy components. We have the controller that can start/stop things according to a config, and there's no reason to make the base image bigger than it should be. But I do agree we should probably have kafka/schema registry/redis already baked in. They're "hard requirements" for the system after all (you could make an argument for redis... but the image is pretty small).

:+1: Should we have two images: one for local deployment (with kafka, postgresql and redis bundled), and another one for when people decide to run Airy Core with their own setup of the prerequisites?

> Currently, with the provisioning we also introduce a way we start our applications. I think we could get away with an init script in the image itself; an init-container may be too hard. After all, we want to start airy components once kafka is there. Maybe we can even make the controller responsible for this (as in, wait for kafka to be there to start things)? It would "just work", as we'd start kafka and the controller, and the controller would start the rest once kafka is there.

I vote for the init-container option, because it keeps the image clean and puts the problem purely in the ops domain of how we want to run that image. It would also be visible from kubectl get pods: if kafka is not there, the pod will be stuck in the Init stage. Perhaps we will also have some other logic in the future, when an app waits for two services and a particular config-map to be there?
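A minimal sketch of that init-container idea, assuming Kafka is reachable as `kafka:9092` inside the cluster (names and images are placeholders): the pod stays in the Init stage until the port answers, which is exactly what `kubectl get pods` would show.

```yaml
# Sketch: block an Airy component until Kafka answers on its port.
# Host, port, and image names are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: airy-component
spec:
  initContainers:
    - name: wait-for-kafka
      image: busybox
      command:
        - sh
        - -c
        - until nc -z kafka 9092; do echo waiting for kafka; sleep 2; done
  containers:
    - name: app
      image: airy/component:latest   # placeholder component image
```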

> How do we make Packer, or the way we build the image, part of the CI/CD pipeline? That is also something I'm not sure how to picture, but it is very central to the problem we have. The "real question" to me is where we would host these images. I can picture the pipeline (have an action that runs packer). Maybe s3, like we do with the CLI.

Yes, s3 is good I think.
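A very rough sketch of such a pipeline, assuming GitHub Actions (where Packer and the aws CLI come preinstalled on the hosted Ubuntu runners); the template path, bucket name, and artifact name are all made up for illustration:

```yaml
# Sketch of a CI job that builds the image with Packer and uploads it to s3.
# Template path, output name, and bucket are assumptions.
name: build-image
on:
  push:
    tags:
      - 'v*'
jobs:
  packer:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Build image
        run: packer build infrastructure/image.json
      - name: Upload to s3
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: aws s3 cp output/airy-core.box s3://airy-core-images/
```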

pascal-holy commented 3 years ago

I would like to keep the local and cloud deployments as similar as possible, and this Vagrant provider could be the solution (one also exists for AWS), because all the CLI has to do to switch between local and cloud deployment is to pass a couple of arguments to vagrant up.

The Vagrant box (image) itself can also be stored in the Vagrant Cloud for free.

ljupcovangelski commented 3 years ago

We are wrapping up this issue after a final discussion, with the following design decisions:

We have two major steps:

After this, we can get rid of the scripts/bootstrap.sh script.

This is the resulting milestone: https://github.com/airyhq/airy/milestone/22