DataCater / datacater

The developer-friendly ETL platform for transforming data in real-time. Based on Apache Kafka® and Kubernetes®.
https://datacater.io

deployment improvements #28

Closed by ChrisRousey 1 year ago

ChrisRousey commented 1 year ago

This is a PR to improve the deployments API based on feedback gathered in issue #24.

ChrisRousey commented 1 year ago

@flippingbits @HknLof Hello, I have some quick questions regarding error handling for deployments. Fabric8 doesn't really throw any exceptions here. The only idea I have for now is to create the deployment and then check whether it exists. If it doesn't, throw an exception.

And since we don't really know what caused the error, I only return fairly generic error messages with an HTTP 400.

You guys have any ideas or thoughts on this?

flippingbits commented 1 year ago

@ChrisRousey @HknLof What is our take on storing deployment information partially in Postgres? This would ease dealing with such errors, from my point of view.

HknLof commented 1 year ago

I think it is important to forward the message. The messages of Fabric8 APIExceptions contain the actual information.
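
A minimal sketch of what forwarding the message could look like. `ClusterApiException`, `ApiError`, and `DeploymentErrors` are hypothetical names used only for illustration; `ClusterApiException` stands in for the client library's exception so the snippet compiles without Fabric8. The fallback to HTTP 400 mirrors what the PR currently returns.

```java
// Hypothetical stand-in for the Kubernetes client exception (not the Fabric8 type).
class ClusterApiException extends RuntimeException {
    final int code;

    ClusterApiException(int code, String message) {
        super(message);
        this.code = code;
    }
}

// Hypothetical error payload returned by the API.
class ApiError {
    final int status;
    final String message;

    ApiError(int status, String message) {
        this.status = status;
        this.message = message;
    }
}

class DeploymentErrors {
    // Forwards the original exception message when there is one,
    // and only falls back to a generic text when the message is empty.
    static ApiError toApiError(RuntimeException e) {
        String original = e.getMessage();
        String message = (original == null || original.trim().isEmpty())
                ? "Deployment could not be created."
                : original;

        // Reuse the upstream status code if it is a valid HTTP error code.
        if (e instanceof ClusterApiException) {
            int code = ((ClusterApiException) e).code;
            if (code >= 400 && code < 600) {
                return new ApiError(code, message);
            }
        }
        // Otherwise keep the HTTP 400 the PR already uses.
        return new ApiError(400, message);
    }
}
```

This way the user sees the cluster's own explanation instead of a generic text, and the generic text remains only as a last resort.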

@flippingbits How would storing the deployment help here?

flippingbits commented 1 year ago

> I think it is important to forward the message. The messages of Fabric8 APIExceptions contain the actual information.
>
> @flippingbits How would storing the deployment help here?

At the moment, we persist all information about DataCater Deployments in the labels of the Kubernetes Deployments. There are rare cases where creating a new DataCater Deployment fails to create a new Kubernetes Deployment without Fabric8 throwing an exception. In this case, our POST /deployments endpoint does not render an error but lets the user assume that the operation has been completed successfully. If the user then lists all DataCater Deployments by calling the GET /deployments endpoint, the new DataCater Deployment does not show up and the user gets confused 🤯

We could fix this either by

(1) actively polling the Kubernetes cluster when creating a new DataCater Deployment and rendering an error if the Kubernetes API call does not complete or the Kubernetes Deployment does not show up after a fixed window, i.e., timeout, which would allow us to return an error in our API,

or by

(2) persisting basic information about the DataCater Deployment in PostgreSQL. This way we never forget that we created a specific DataCater Deployment. We would still keep all state in Kubernetes to avoid maintaining state in both Kubernetes and PostgreSQL. We would then fetch the state on-the-fly from Kubernetes when listing all DataCater Deployments for the GET /deployments endpoint or a specific DataCater Deployment for the GET /deployments/:uuid endpoint. If the Kubernetes Deployment does not exist, we could easily render an error.
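
Option (1) could be sketched roughly as follows. This is only an illustration under assumptions: `DeploymentPoller` and `awaitExists` are hypothetical names, and the `exists` check is left abstract; in practice it would wrap a lookup of the Kubernetes Deployment through the client.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

class DeploymentPoller {
    // Calls `exists` repeatedly until it returns true or `timeout` has elapsed.
    // Returns true if the resource showed up in time, false otherwise.
    static boolean awaitExists(BooleanSupplier exists, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (exists.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Sleep for the polling interval, but never past the deadline.
            long sleepMillis = Math.min(interval.toMillis(), Math.max(1, remainingNanos / 1_000_000));
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and treat the wait as failed.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

The POST /deployments handler could call `awaitExists` right after the create call and render an error when it returns false. The cost is that every create request may block for up to the timeout.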

This is not meant to answer the question of whether we should use PostgreSQL to persist some information about DataCater Deployments. I just think that, if we go down that route, it helps us handle this situation.
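
To make option (2) more concrete, here is a rough sketch under assumptions. `DeploymentRegistry` is a hypothetical name, the in-memory map stands in for a PostgreSQL table, and the `clusterState` function stands in for the Kubernetes lookup. Only the UUID and name are stored; the state always comes from the cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.function.Function;

class DeploymentRegistry {
    // Stand-in for a PostgreSQL table: deployment UUID -> deployment name.
    private final Map<UUID, String> known = new LinkedHashMap<>();

    // Stand-in for the Kubernetes lookup: returns the live state,
    // or empty if the Kubernetes Deployment does not exist.
    private final Function<UUID, Optional<String>> clusterState;

    DeploymentRegistry(Function<UUID, Optional<String>> clusterState) {
        this.clusterState = clusterState;
    }

    // Records that a DataCater Deployment was created, independent of the cluster.
    UUID register(String name) {
        UUID id = UUID.randomUUID();
        known.put(id, name);
        return id;
    }

    // Roughly what GET /deployments/:uuid would do: look up the record,
    // then fetch the state on-the-fly from the cluster.
    String describe(UUID id) {
        String name = known.get(id);
        if (name == null) {
            throw new IllegalArgumentException("Unknown deployment: " + id);
        }
        return clusterState.apply(id)
                .map(state -> name + ": " + state)
                .orElseThrow(() -> new IllegalStateException(
                        "Deployment " + name + " is registered but missing in Kubernetes"));
    }
}
```

A registered deployment that is missing in the cluster then surfaces as an explicit error instead of silently disappearing from the listing.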

I hope that clarifies things. If not, please shoot questions at me :)

HknLof commented 1 year ago

I can see how storing this kind of information in our internal DB makes sense and helps our development process.

From a decision point of view this approach has my vote!

But I want to point out that the quoted comment below is not a feasible basis to work with in the long run. We need to figure this out at some point. Not now, but sometime in December, I guess.

The action item would be to figure out either why no error is reported or whether this is intended K8s behaviour for our API call.

> There are rare cases where creating a new DataCater Deployment fails to create a new Kubernetes Deployment without Fabric8 throwing an exception. In this case, our POST /deployments endpoint does not render an error but lets the user assume that the operation has been completed successfully.

sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

77.6% Coverage
2.2% Duplication