plan: discuss rabbit future state/plans

v1k0d3n commented 7 years ago

Kubernetes Version (output of kubectl version): N/A
Helm Client and Tiller Versions (output of helm version): N/A
Development or Deployment Environment?: N/A
Release Tag or Master: N/A
Expected Behavior: N/A
What Actually Happened: N/A
How to Reproduce the Issue (as minimally as possible): N/A
Any Additional Comments:

Background: I think we need to discuss again how to handle dependencies appropriately. Until now, any deployment waiting in an init state was blocked on either an init-container or a job within the single service-level chart. We now have changes in our rabbit chart that will wait in an init state because of an undeclared dependency outside the rabbit chart (etcd, which is a separate chart). If an operator deploys rabbit, it will be blocked until the separate etcd chart is deployed. The operator must know in advance that rabbit requires etcd, and this is fundamentally different from our other charts. The dependency also went undocumented, so as it stands today users have to figure this out on their own. This challenges how we handle dependencies across the project. We'll also need to address endpoints the same way as in our other charts (as Pete works through these).
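The blocking behavior described above comes down to an init step that polls a dependency until it answers. As a minimal, hypothetical sketch (not the chart's actual init-container logic), assuming the dependency is reachable over plain TCP, such a wait loop might look like:

```python
import socket
import time


def wait_for_endpoint(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll a TCP endpoint until it accepts a connection or the timeout expires.

    Returns True once the dependency is reachable, False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful TCP connect is treated as "the dependency is up".
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused, name not resolvable, or connect timed out: retry.
            time.sleep(interval)
    return False
```

Called as `wait_for_endpoint("etcd", 2379)`, where the service name `etcd` is an assumption and 2379 is etcd's default client port, an init container running this loop would keep timing out and being restarted, so rabbit would stay in its init state until the separate etcd chart is deployed. Nothing in the rabbit chart itself tells the operator why, which is the documentation gap raised above.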

I'm concerned about carrying this mixed message over into OpenStack proper without a formally documented roadmap to RabbitMQ 3.7.0. We should document the edge cases as part of the PR, along with our plans to bring them back in line with our vision for dependency handling.

Also, one smaller consideration: if we're pulling large portions of source from upstream projects, we need to ensure that those upstream sources are committed to maintaining the code (Dockerfiles, images, etc.). For example, what would happen if this code or image were suddenly dropped by its maintainer? We don't want to take on extra debt in our repo, but we should strike a balance with the "what if" scenarios of upstream changes. We can handle this by documenting our sources (and also keeping track of licensing, which could potentially be more concerning).

cc: @intlabs @alanmeadows @ss7pro

intlabs commented 7 years ago

In addition to the points @v1k0d3n raises above, I see the following issues, which we should discuss at our next community meeting:

Documentation needs to be updated as operations and features change: the changes in the behavior, installation, and operation of RabbitMQ were not noted in either the development or the deployment docs.
This chart did not correctly set the password for use with the existing charts, meaning that it does not work out of the box. This will be fixed once gating is in place, but I'm very concerned that it was not tested in (at least) minikube or on a real cluster prior to merging. This has caused issues like https://github.com/att-comdev/openstack-helm/issues/239. It should have been caught as part of the review process, and would certainly have been caught if the documentation had been updated and tested prior to merging.
Together, the two issues above mean that a user new to the project who followed the documentation to the letter (working from master, not 0.1.0) would not have been able to create a functional OpenStack-Helm cluster.
Large portions of code were pulled from another codebase without proper attribution. AFAIK this conflicts with sections 5 and 7 of the OpenStack Contributors Agreement, and would also conflict with the CNCF Contributors Charter (Contributors Agreement sections 5 and 7, as well as the Developer Certificate of Origin sign-off). As a result, this work may not be suitable for import or migration to either of those locations at a later date.
The image used cannot easily be rebuilt and comes from a relatively unknown repo (https://github.com/openstack/fuel-ccp-rabbitmq/blob/master/docker/rabbitmq/Dockerfile.j2), which means we lose one of the project's ideals: being image/runtime agnostic.
Although the original PR stated that a DNS-based approach had potential race-condition issues, these were not properly evaluated in the open, so it is unclear whether the best solution was selected or merely the most convenient one.
Additionally, the deployment does not specify the ports exposed by the service. This is bad practice and will cause us issues when implementing network security policy. It is a minor nit compared with the above, but it should also be addressed.
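As a rough illustration of the missing-ports point, here is a hypothetical Python check, assuming the Deployment manifest has already been parsed from YAML into plain dicts and lists. It relies only on the standard Kubernetes field path `spec.template.spec.containers[].ports[].containerPort`; the example manifest and image names are made up.

```python
from typing import Any, Dict, List


def containers_missing_ports(manifest: Dict[str, Any]) -> List[str]:
    """Return the names of containers in a Deployment manifest that declare no ports."""
    containers = (
        manifest.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("containers", [])
    )
    missing = []
    for container in containers:
        # Exposed ports live under `ports`, each entry carrying a `containerPort`.
        ports = container.get("ports") or []
        if not any("containerPort" in p for p in ports):
            missing.append(container.get("name", "<unnamed>"))
    return missing


# Hypothetical manifest, trimmed to the fields the check looks at.
deployment = {
    "kind": "Deployment",
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "rabbitmq", "image": "example/rabbitmq"},
                    {
                        "name": "sidecar",
                        "image": "example/sidecar",
                        "ports": [{"name": "metrics", "containerPort": 9090}],
                    },
                ]
            }
        }
    },
}

print(containers_missing_ports(deployment))  # ['rabbitmq']
```

Adding `ports` entries to the rabbit container (for example `containerPort: 5672` for AMQP) would clear the finding and give network policies something explicit to reference.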

ss7pro commented 7 years ago

Really good analysis. While evaluating clustering backends for the autocluster plugin, we (Intel) implemented native support for k8s (https://github.com/aweber/rabbitmq-autocluster/blob/master/src/autocluster_k8s.erl), but we withdrew our support for it because Mirantis did a full investigation of clustering methods. The results of their research are reflected in multiple improvements to the etcd clustering backend, the most important being avoidance of a race condition on startup (leader election with a lock) and split-brain avoidance.
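The "leader election with a lock" idea can be sketched as follows. This is a hypothetical in-memory illustration, not the autocluster plugin's etcd backend: `InMemoryKV` stands in for a store with an atomic create-if-absent, and the key name `rabbitmq/startup-lock` is invented. Whichever node creates the key first bootstraps the cluster; every other node reads the key and joins that node instead of bootstrapping its own, which is what avoids the startup race.

```python
import threading
from typing import Dict, Optional


class InMemoryKV:
    """Hypothetical stand-in for an etcd-like store with an atomic create-if-absent."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._mutex = threading.Lock()

    def create_if_absent(self, key: str, value: str) -> bool:
        # Atomic check-and-set: only the first caller for a given key succeeds.
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key: str) -> Optional[str]:
        with self._mutex:
            return self._data.get(key)


def start_node(kv: InMemoryKV, node: str, roles: Dict[str, str]) -> None:
    """Bootstrap the cluster if this node wins the lock, otherwise join the winner."""
    if kv.create_if_absent("rabbitmq/startup-lock", node):
        roles[node] = "bootstrap"
    else:
        seed = kv.get("rabbitmq/startup-lock")
        roles[node] = f"join:{seed}"


# Start three nodes concurrently; exactly one should bootstrap.
kv = InMemoryKV()
roles: Dict[str, str] = {}
threads = [threading.Thread(target=start_node, args=(kv, f"rabbit-{i}", roles)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(roles.values()).count("bootstrap"))  # 1
```

A real backend would also need lock expiry (TTL) and handling for a lock holder that dies mid-startup, which this sketch leaves out.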

I would simply suggest adding RabbitMQ with the autocluster plugin to Kolla so that this solution lives in a well-recognized place, but IMHO there's no better RabbitMQ clustering solution on top of k8s at the moment. The one developed by Mirantis runs on multiple Intel production clusters without any issues (although we still use Stackanetes there).

As for issue #292, we discovered that later when deploying openstack-helm in our clusters, but simply ran out of time to upstream a fix for it.

I don't really understand the concept of proper attribution. What do you mean by that? A README file with credits to the Mirantis repo?

att-comdev / openstack-helm

plan: discuss rabbit future state/plans #246