Clustering / High Availability configuration

matthew-muscat commented 5 years ago

Having set this up on Kubernetes using the apicurio-ui, apicurio-ws and apicurio-api Docker images, we now want to move this workload into a high-availability configuration (i.e. more than one replica of each container).

However, there appears to be a cluster cache / session issue, as we're seeing authentication with Keycloak fail (i.e. invalid state code) whenever more than a single replica is running.

Is a high availability configuration supported in Apicurio? (I'm assuming so based on https://github.com/Apicurio/apicurio-studio/issues/360)
Is there documentation, or are there flags within the Docker images, for configuring node names and cluster consensus / replication?

FYI, we've been able to achieve this in Keycloak with Wildfly's JGroups, using the standalone-ha.xml config file with some minor adjustments to work inside k8s (i.e. setting the node name based on the pod name, using TCP for JGroups, and creating a service for JGroups discovery).

EricWittmann commented 5 years ago

OK great question here. The disappointing short answer to this is "no, not yet". But let me go through it and explain where we're at.

As you mention, there are three Apicurio components that can be scaled independently. I'll go through each of them:

apicurio-api - this is a JAX-RS application that uses Keycloak bearer tokens for authentication and a database for persistence. This component can be replicated/scaled today without any additional configuration. The bearer tokens should work fine (calls to the API are stateless) without any additional cluster configuration, and everything else goes through the DB - so obviously each instance needs to point to the same DB.
apicurio-ui - this is a relatively simple Angular application with a handful of util servlets to assist with things like downloading and showing the Documentation Preview. The only problem with scaling/replicating this layer is that it uses Keycloak for authentication, which I believe is stateful. So I guess there are two options to make this work: session affinity and clustering. Some work will need to be done here to figure out the appropriate configuration for this.
apicurio-ws - this is the websocket based editing layer for Apicurio (it is websocket based due to the collaborative nature of editing an API design in Apicurio). If you want real-time collaboration to work, then this layer needs to be configured to leverage the work done in #360 by @msavy. What we have done here is create a messaging based solution (Apache Artemis) that distributes editing command information across multiple instances of the component. So for this to work an Artemis broker instance is required.
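
To make the apicurio-ws point above more concrete, here is a minimal Python sketch of the general idea: each replica publishes editing commands to a shared broker, and every replica relays what it receives to its own locally connected clients. This is not Apicurio code. `InMemoryBroker`, `EditingInstance`, the topic name and the message fields are all hypothetical, and the in-memory broker only stands in for a real one such as Artemis.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryBroker:
    """Stand-in for a real message broker: fans each message out to all subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in list(self._subscribers[topic]):
            handler(message)


class EditingInstance:
    """Hypothetical model of one editing-layer replica and its connected clients."""

    TOPIC = "editing-commands"  # assumed topic name, for illustration only

    def __init__(self, name: str, broker: InMemoryBroker) -> None:
        self.name = name
        self.broker = broker
        self.clients: Dict[str, List[dict]] = {}  # client id -> commands delivered to it
        broker.subscribe(self.TOPIC, self._on_broker_message)

    def connect(self, client_id: str) -> None:
        self.clients[client_id] = []

    def submit(self, client_id: str, design_id: str, command: str) -> None:
        # Publish to the broker instead of delivering locally, so that
        # clients connected to other replicas also see the command.
        self.broker.publish(
            self.TOPIC,
            {"design": design_id, "from": client_id, "command": command},
        )

    def _on_broker_message(self, message: dict) -> None:
        # Relay to every local client except the one that sent the command.
        for client_id, inbox in self.clients.items():
            if client_id != message["from"]:
                inbox.append(message)


broker = InMemoryBroker()
ws_a = EditingInstance("ws-a", broker)
ws_b = EditingInstance("ws-b", broker)
ws_a.connect("alice")  # alice is connected to replica A
ws_b.connect("bob")    # bob is connected to replica B

ws_a.submit("alice", "design-1", "add-path /pets")
print(ws_b.clients["bob"])    # bob receives alice's command via the broker
print(ws_a.clients["alice"])  # alice does not get her own command echoed back
```

Without the shared broker, a command submitted on one replica would only ever reach clients connected to that same replica, which is why the separate broker instance is needed once there is more than one instance.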

Now I think we have all the pieces to make an HA version of Apicurio (the hard part was really the editing layer). But it hasn't yet been put together in a way that will be easy for you to consume. If you're game to be the first HA deployment, I am happy to try to pull all this together for you to try. :)

matthew-muscat commented 5 years ago

Many thanks for the detailed explanation @EricWittmann

Seems like apicurio-ui and apicurio-ws are the areas that would need to be looked into; I was in particular running into the Keycloak issues with apicurio-ui.

Could apicurio-ui implement an authentication binding similar to keycloak-gatekeeper? That relies on a client-side browser cookie, which means it avoids the need for session affinity / clustering. Is there a reason why the auth needs to be stateful?

apicurio-ws seems like it'll just require exposing the Artemis config in the Dockerfile. In particular, JGroups can be utilised here, as mentioned in https://activemq.apache.org/components/artemis/documentation/latest/clusters.html

I'm happy to attempt getting this set up, along with contributing some docs once it's all worked out.

EricWittmann commented 5 years ago

I will admit that I am not 100% sure about apicurio-ui with respect to scaling the Keycloak adapter (and I realize this is the first problem you've hit). Keycloak has some documentation about this here:

https://www.keycloak.org/docs/2.5/securing_apps/topics/oidc/java/application-clustering.html

It seems that, as you pointed out, a cookie based token store may be the answer. That will need some testing as there are one or two places in apicurio-ui that utilize the server-side security context. But that's easy enough for me to test. And to answer your question - there is no reason why auth needs to be stateful.
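
As a rough illustration of why a cookie based store avoids session affinity, the Python sketch below contrasts a replica that keeps logins in its own memory with one that only verifies a signed cookie. It is a simplification and not how the Keycloak adapter is implemented: the HMAC with a shared demo key, and the names `issue_cookie`, `verify_cookie`, `StatefulReplica` and `StatelessReplica`, are assumptions made up for this example.

```python
import base64
import hashlib
import hmac
import json
from typing import Dict, Optional

SHARED_KEY = b"demo-only-key"  # assumption: every replica can verify with this key


def issue_cookie(claims: dict, key: bytes = SHARED_KEY) -> str:
    """Encode the claims and append a signature so any replica can verify them."""
    payload = base64.urlsafe_b64encode(
        json.dumps(claims, sort_keys=True).encode()
    ).decode()
    signature = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"


def verify_cookie(cookie: str, key: bytes = SHARED_KEY) -> Optional[dict]:
    """Return the claims if the signature matches, otherwise None."""
    payload, _, signature = cookie.rpartition(".")
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload.encode()))


class StatefulReplica:
    """Keeps logins in local memory, so other replicas never see them."""

    def __init__(self) -> None:
        self.sessions: Dict[str, str] = {}

    def login(self, session_id: str, user: str) -> None:
        self.sessions[session_id] = user

    def whoami(self, session_id: str) -> Optional[str]:
        return self.sessions.get(session_id)


class StatelessReplica:
    """Keeps nothing in memory; identity comes entirely from the cookie."""

    def whoami(self, cookie: str) -> Optional[str]:
        claims = verify_cookie(cookie)
        return claims["user"] if claims else None


# Stateful: the login happened on replica 1, so replica 2 does not know the user.
replica_1, replica_2 = StatefulReplica(), StatefulReplica()
replica_1.login("abc", "alice")
print(replica_2.whoami("abc"))  # None

# Stateless: any replica can serve the request, because the cookie carries the state.
cookie = issue_cookie({"user": "alice"})
print(StatelessReplica().whoami(cookie))  # alice
```

The trade-off is that anything the server currently reads from a server-side session would instead have to be derived from the cookie on every request.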

As for apicurio-ws, I think the following is needed:

A separate Artemis broker must be running
The apicurio-ws instance(s) must be configured to enable the new messaging based editing layer and also configured to use the separate Artemis broker

Currently all of the development and testing of this has been done using Wildfly, but the Docker images for Apicurio all use Thorntail as the runtime platform. I assume that's not a problem, it just hasn't been tested. So we'll need to figure out the appropriate configuration of Thorntail to enable messaging. If you're interested in taking a look at what we have now, there is a decent (although poorly named) integration test for it here:

https://github.com/Apicurio/apicurio-studio/tree/master/test/integration/arquillian

Note the relevant portion of the pom.xml here:

https://github.com/Apicurio/apicurio-studio/blob/master/test/integration/arquillian/pom.xml#L187-L364

And then the actual test here:

https://github.com/Apicurio/apicurio-studio/blob/master/test/integration/arquillian/src/test/java/io/apicurio/test/integration/arquillian/VerifyDistributedSetupTestIT.java

We basically set up a full environment, including a database, an Artemis broker, and two apicurio-ws instances. Then the test creates an API and edits it from two different websocket clients to make sure that messages/events are properly sent to all parties.

Finally, to address your last point - I would of course LOVE any contribution you are willing to make in this area. So based on what you now know about this, let me know what you're comfortable helping out with and how I can help/support you!

EricWittmann commented 5 years ago

Hey @matthew-muscat - I'm back from traveling and working on some other tasks and am now available to assist with this if needed. What is your status? :)

Clustering / High Availability configuration #769