camann9 closed this issue 9 years ago
Here's the architecture of a Tyk setup (roughly); the reload signal is a fan-out signal that hits multiple host managers:
To answer your questions:
- `tyk-admin-api-XXX`: this is a user session in the dashboard, not used by Tyk at all.
- `apikey-XXX`: this is an API key that will give access to a stored API (depending on settings stored under that key).

The things you see in Mongo are all mainly related to the dashboard:

- `tyk_analytics_users`: user logins for the dashboard (not the same as API key users).
- `tyk_apis`: API definitions that can be pulled by Tyk and edited in the dashboard.
- `tyk_organisations`: organisations are a meta-user for APIs. You will notice in a file-based API definition that there is an OrgID field; this basically enables segmenting APIs by organisation. These are relevant if:
We use two backends because Tyk can work entirely without Mongo (you could purge the analytics to CSV if you like), and we needed something to manage the dashboard and Mongo had better features for that.
Mongo was chosen for the dashboard because of the built-in aggregation framework in v2.2+. This makes data aggregation into useful analytics much easier, as data pipelines can be built and handled in Mongo instead of being crunched by the dashboard. It makes things much cleaner in terms of code and flexibility, and puts less stress on the application servers.
We decided from the start that Mongo is not a dependency for the core gateway; it is only needed if you want a GUI and an easy way to view and filter analytics data. It's a value-add, not a requirement: Tyk can work entirely with only Redis.
This means that all functional data is either stored in a file (definitions) or in Redis (keys). Also worth noting is that the session and authentication handlers use separate storage interfaces, so strictly speaking you could swap out Redis for something else on an API-by-API basis if the correct interfaces are implemented. So loosely speaking, even Redis isn't a requirement, so long as you can build a new Storage interface driver, which I imagine some forks may be doing.
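To make the idea concrete, here is a minimal sketch of what such a swappable storage interface could look like. The names (`KeyStore`, `InMemoryStore`, `lookup_session`) and method signatures are illustrative assumptions for this discussion, not Tyk's actual Go interfaces:

```python
# Toy sketch: code that programs against a storage interface so the
# concrete backend (Redis, in-memory, anything else) is swappable
# per API. Names here are illustrative, not Tyk's actual API.
from abc import ABC, abstractmethod
from typing import Optional


class KeyStore(ABC):
    """Minimal contract a session/auth storage driver would satisfy."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]: ...

    @abstractmethod
    def set(self, key: str, value: str) -> None: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class InMemoryStore(KeyStore):
    """Drop-in stand-in for a Redis-backed store, useful for tests."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def lookup_session(store: KeyStore, api_key: str) -> Optional[str]:
    # The gateway only ever talks to the interface, never the backend.
    return store.get(f"apikey-{api_key}")
```

A fork that wanted BoltDB or any other k/v store would only need to provide another `KeyStore` implementation.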
As discussed in the other ticket, API definitions are file-based because ultimately this is the simplest and most robust approach for an infrastructure service (see Nginx, Redis and Apache: all file-based configuration). The versions stored in Mongo are in fact almost exactly the same, except for some additional metadata for the dashboard (an API definition is a JSON document, which makes portability between Mongo and file really easy and painless). All the metadata fields are completely ignored by Tyk; they exist only so that the dashboard can do clever things like portable webhooks and event handlers.
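As an illustration of why this portability is painless, a definition is just a JSON document that either storage backend can hold verbatim. The fields below are a simplified assumption for this example, not a canonical Tyk definition:

```python
import json

# Illustrative API definition; a real Tyk definition has many more
# fields. The point is that it's plain JSON, so moving it between a
# file and a Mongo document is a copy, not a conversion.
API_DEFINITION = """
{
  "name": "Widget API",
  "api_id": "widget-api-1",
  "org_id": "acme",
  "proxy": {
    "listen_path": "/widgets/",
    "target_url": "http://widgets.internal:8080/"
  }
}
"""

definition = json.loads(API_DEFINITION)
# Segmenting by organisation is just filtering on the org field:
assert definition["org_id"] == "acme"
print(definition["name"], "owned by", definition["org_id"])
```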
Both projects (Tyk and the Dashboard) share the tykcommon package, this defines the APIDefinition object so that they are always compatible with one another, even if one is upgraded faster than the other.
Thanks again, the support for Tyk is truly awesome :-). I think the diagram is very enlightening, so maybe you want to put it into the official documentation.
So to summarize, the key data and the request journal are stored in Redis, and everything else is stored in Mongo.
In the light of my newly acquired knowledge I understand what you were talking about in #23. What about actually putting the metadata (except for aggregated analytics) into Redis? Then Redis would be the master that stores API definitions and organizations. If Redis is updated by one node, the nodes attached to it reload (let's say they poll every minute, or when a manual reload is triggered). Then they don't need to know of each other; they just synchronize via Redis, and it doesn't matter how many there are. We also don't get a new SPOF, since the nodes depend on an available Redis instance anyway to access the sessions.
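The polling scheme can be modelled in a few lines. This is a toy sketch with a plain dict standing in for Redis; the key names (`tyk.definitions.version` and so on) are invented for the example:

```python
# Toy model of the proposed sync: every node polls a shared version
# counter; when it changes, the node reloads its API definitions.
# A plain dict stands in for Redis here.

class Node:
    def __init__(self, shared: dict):
        self.shared = shared  # stand-in for a Redis connection
        self.seen_version = shared.get("tyk.definitions.version", 0)
        self.reloads = 0

    def poll(self) -> bool:
        """Called on a timer (e.g. every minute) or on manual trigger."""
        current = self.shared.get("tyk.definitions.version", 0)
        if current != self.seen_version:
            self.seen_version = current
            self.reloads += 1  # a real node would re-read the definitions here
            return True
        return False


def publish_definitions(shared: dict, definitions: dict) -> None:
    """Any node may write; the others pick the change up on their next poll."""
    shared["tyk.definitions"] = definitions
    shared["tyk.definitions.version"] = (
        shared.get("tyk.definitions.version", 0) + 1
    )
```

With this shape the nodes never talk to each other at all: one writes, the rest converge on their next poll, and adding a twenty-first node changes nothing.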
I may do that :-) Would probably clear up some things for our users...
As for putting everything in Redis, this is an option. However, as I mentioned, API definitions can actually set which session stores and auth handlers they want to use on a per-API basis. Although Redis is the only supported store at the moment (there is a deprecated in-memory version too), it's a simple interface that could easily be transferred out to other k/v stores (or DBs, for that matter).
It would mean that Redis becomes an overarching requirement for system management as opposed to a per-implementation dependency, which makes the current code quite flexible (I hope).
For example, if we decided to shift to BoltDB or Riak or Tokyo Tyrant instead of Redis, it would be very easy as we simply implement the Storage{} interface and register it in the right code hooks.
There's the Raft algorithm (goraft) which might be worthwhile looking at to sync up API configurations across a cluster, which makes having an API endpoint sensible again, as you could just query the cluster for the leader, then make a REST API call to push the new Definition, it would flush to file on the leader, which would then replicate to all the nodes organically.
This same functionality could then be used to push hot-reloads across a cluster without individual polling of the nodes and ensure that if things fail, there's always a leader to step up, you could even incorporate some functionality to auto-start more nodes using a webhook or some event handler (the infrastructure is already there for that).
On the other hand, sticking it all in Redis really would be so amazingly simple. I'm of two minds on it; there are a lot of advantages to having the nodes sync up and elect leaders:
Putting the definition into Redis would mean the dashboard needs to speak to both Mongo and Redis, and there would be two copies of the definition floating around, which could introduce drift.
I'm just thinking out loud here... sorry for the rant :-/
I will have a play around with Raft to see how much of a pain it would be to implement well, a POC would set things straight and make it a bit clearer, if there's no existing implementation that we can use, there's no way we're rolling our own, so the Redis option will be what we end up with.
I still think that the distributed master election is a bit overengineered. AFAIK Redis satisfies all the requirements we have.
You could still define interfaces that encapsulate the storage functionality: one interface called ApiStorage with a Redis and a file backend, and one interface called MetricsStorage with a CSV and a MongoDB backend. What do you think?
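The proposed split could be sketched roughly like this. The interface names come from the comment above; the backends shown (an in-memory stand-in for Redis, a CSV writer) are minimal illustrations, not real implementations:

```python
from abc import ABC, abstractmethod
import csv
import io
import json


class ApiStorage(ABC):
    """Where API definitions live: could be Redis- or file-backed."""

    @abstractmethod
    def save(self, api_id: str, definition: dict) -> None: ...

    @abstractmethod
    def load(self, api_id: str) -> dict: ...


class InMemoryApiStorage(ApiStorage):
    # Stand-in for the Redis backend; a file backend would json.dump
    # each definition into its own file instead.
    def __init__(self):
        self._defs = {}

    def save(self, api_id, definition):
        self._defs[api_id] = json.dumps(definition)

    def load(self, api_id):
        return json.loads(self._defs[api_id])


class MetricsStorage(ABC):
    """Where request analytics go: a CSV purge or MongoDB."""

    @abstractmethod
    def record(self, entry: dict) -> None: ...


class CsvMetricsStorage(MetricsStorage):
    def __init__(self, stream):
        self._writer = csv.writer(stream)

    def record(self, entry):
        self._writer.writerow([entry["path"], entry["status"]])
```

The gateway would then hold one `ApiStorage` and one `MetricsStorage` and never care which backend is behind either.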
Interesting, more thoughts (sorry, this is a lengthy one):
Let's turn this on its head a little and start with the desired behaviour:
From an end-user perspective, setting up a single tyk node, or booting up twenty should be seamless, and require little to no configuration on my part. In a typical cloud environment it should be assumed that the node will fail, shut down or be arbitrarily rebooted for maintenance, this should not affect the performance of the application or the cluster and I should still be able to administer it remotely using the API.
Having Tyk nodes own their own configuration management in the above scenario creates the following problems:
A potential implementation:
- On boot, each node scans Redis for keys matching `tyk.node.id.*`, then sorts them by their value (a random integer between 1 and n); the current master can be found with a `/tyk/master` request.
- The node registers itself by setting `tyk.node.id.{{IP}}.{{Port}}: randInt()` with a TTL of 20s and a floor of the lowest value in the list (so it appends itself); if it is the only node, the floor is 0.
- Each node subscribes to the `tyk.nodes.reload` key; if it receives a message, it triggers a hot reload.
- Each node scans for `tyk.definitions.*` keys and loads them into memory.
- A call to `/tyk/apis` on the current master will overwrite the key in Redis.
- A call to `/tyk/actions/reload` (or something similar) on the master will send a trigger message via the Redis pub/sub channel; only the master may write to this channel.

So, what does this system mean, worst case scenario:
What does this enable? Looking at the above list of things we want to achieve:
> From an end-user perspective, setting up a single tyk node, or booting up twenty should be seamless, and require little to no configuration on my part. In a typical cloud environment it should be assumed that the node will fail, shut down or be arbitrarily rebooted for maintenance, this should not affect the performance of the application or the cluster and I should still be able to administer it remotely using the API.

:+1:
And from the benefits of a self-managing cluster:
And above all, we can now store API Definitions centrally in redis, trusting that they are managed by only one node, removing the requirement for MongoDB altogether (and the host-manager for that matter).
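The registration and election scheme above can be modelled in a few lines. This is a toy sketch: a dict of `(value, expiry)` pairs stands in for Redis keys with a TTL, and the "floor" rule is my reading of the description (new nodes take a value above the current minimum, so they append themselves rather than preempting the master):

```python
import random

# Toy model: each node writes tyk.node.id.<host>.<port> with a random
# integer and a 20s TTL; the node holding the lowest live value acts
# as master. Expired keys drop out on their own, so a dead master is
# replaced within one TTL without any node-to-node chatter.

TTL_SECONDS = 20


def register(shared: dict, host: str, port: int, now: float) -> str:
    key = f"tyk.node.id.{host}.{port}"
    live = [v for v, exp in shared.values() if exp > now]
    floor = min(live) if live else 0  # "floor of the lowest value in the list"
    shared[key] = (floor + random.randint(1, 1000), now + TTL_SECONDS)
    return key


def current_master(shared: dict, now: float):
    live = {k: v for k, (v, exp) in shared.items() if exp > now}
    return min(live, key=live.get) if live else None
```

Because a newcomer's value is always strictly above the current floor, an established master keeps its role until its key expires, at which point the next-lowest live node steps up automatically.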
Thoughts?
Thanks for the long reply :-)
I still have some problems with the idea of a master election. For one, I could not just go to any node to add APIs; I would have to go to one node that tells me who the master is, and then query the master. This complicates things for clients. There's also the problem that the Tyk master may not be reachable from the outside if there is a firewall/LB between the client and the master: the LB only lets us send our query to an arbitrary node (which might not be the master), not to a specific one. I would be happier if the Tyk node would just forward the query to the master (if you really want to go with the master/slave approach).
With the solution I proposed you wouldn't have any downtime or addressing problems because every node has the same rights to write to Redis. The question is really whether API definitions are more special than keys and why we should treat them differently. Why should writing API defs be restricted to one node but not writing keys?
About your annotations to my proposal:
As you see, I'm not a fan of the whole "master" thing. To me it doesn't provide any benefits compared to synchronizing via Redis. Every Tyk node has to have the code to write to Redis, since any one might become master, and careful synchronization is necessary anyway. So why make it more complicated than necessary? The only thing a dynamically elected master would bring is better ways of synchronizing, but I think that's not worth all the trouble since it can also be achieved differently (as described).
I agree with you, I think this whole discussion has become a little academic, and it really doesn't need to be, it boils down to:
So the action from this, really, is to document the Dashboard API - which I really should have done a long time ago.
If we were to add an endpoint to update or add an API definition, it should just flush to disk, and the integrator would need to manually update all hosts (not too hard, it's a loop through all running nodes), followed by an API call to all running nodes to reload.
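That manual flow is literally two loops. A rough sketch, with the HTTP call injected as a callable so the logic is testable; the endpoint paths are placeholders, not Tyk's documented API:

```python
# Sketch of the manual flow: push a definition to each node's
# (hypothetical) REST endpoint, then tell every node to hot-reload.
# A real version would use urllib.request / requests for http_post.

def sync_and_reload(nodes, definition_json, http_post):
    """nodes: list of base URLs; http_post: callable(url, body) -> status code."""
    failures = []
    for node in nodes:
        # Endpoint paths here are illustrative only.
        if http_post(f"{node}/tyk/apis", definition_json) != 200:
            failures.append(node)
    for node in nodes:
        if http_post(f"{node}/tyk/actions/reload", "") != 200:
            failures.append(node)
    return failures
```

The returned list tells the integrator exactly which hosts need manual attention; with an empty list, every node has the new definition and has reloaded.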
That would be the simplest thing to do.
Then if an integrator would rather have a centrally managed service, they can use the (soon to be documented) dashboard API. Let me explain why the API that ships with the dashboard is better:
You get this hierarchy in the Dashboard API:
The web app is basically a REST client to the dashboard API. Since it's meant as a C&C API, it has much more functionality than the main Tyk one does.
So basically: this needs documentation, and potentially the API endpoint needs fleshing out to flush to disk.
:-D
The thing with making the dashboard the master is that people would have to license the dashboard to be able to administer Tyk. I like the idea of making the dashboard for-pay and the node for-free. Every sane person in an enterprise environment would of course license the dashboard instead of building it themselves (at least at the current pricing). But coupling the dashboard GUI to the administration API and licensing them together is not an idea I like. Why not leave the administration API open-source and free, and just license the actual dashboard? Of course this is a business decision rather than a technological decision :-)
It's an idea worth considering, to be honest it felt quite awful crippling software like this.
The dashboard is actually just a large API server and a separate webapp, so you never need to touch the dashboard, you could just run the application and use the API directly, the license is actually for the dashboard and the finer-grained management API.
Implementing an API endpoint that will flush a configuration to disk is a compromise I'm quite happy with, it's actually the only bit of functionality that is exclusively in the dashboard. So that's one element that we'll put on the roadmap and get out with the next release.
Regarding lifting restrictions on the dashboard API, we'll have a think, we're quite focussed on getting more people to use the software so it's a bit more... complicated.
:-)
Closing this ticket; we're going to go with adding REST and file-based flush to Tyk for now, reverting this back to ticket #23 and tagging as appropriate.
This is basically a ticket to discuss ideas about the structuring of Tyk so we don't have to keep polluting #23 :-P
Thanks for the extensive reply. Could I trouble you to draw up a diagram similar to the one I drew, detailing the current communication paths (including the implicit ones through databases)? I think it would really help in understanding the current architecture. I dumped the PowerPoint slide with my diagram to http://s000.tinyupload.com/index.php?file_id=87456586263021545868
Two more questions: is there a system behind what information is stored where? In Redis I see tyk-admin-api-XXX and apikey-XXX; those are both keys. In Mongo I see tyk_analytics_users, tyk_apis and tyk_organisations. Am I correct in assuming that metadata is stored in Mongo and keys in Redis? Why did you decide to use two database backends?