NREL / api-umbrella

Open source API management platform
MIT License
2.03k stars 325 forks

Installation documentation #104

Open brylie opened 9 years ago

brylie commented 9 years ago

As a server administrator, I would like brief installation instructions, so that I can prepare a server and install API Umbrella.

Details

While we have great packaging and a Deploying from Git page, we do not have a specific page for Installing. An Installing page is somewhat conventional for project documentation, and is often included, for example, as INSTALL.txt.

Task

Let's put together a brief installation document for the Documentation section.

darylrobbins commented 9 years ago

I have started a Deployment page on the wiki, so we can build up the documentation in this area. I will start to state some of my assumptions/questions and then @GUI can validate them.

https://github.com/NREL/api-umbrella/wiki/Deployment

dmolina-ot commented 9 years ago

I have two questions about distributed deployment:

1) Can the router/gatekeeper be started on different machines, behind a load balancer?
2) In this diagram, https://raw.githubusercontent.com/darylrobbins/api-umbrella/deployment/website/images/docs/deployment.png, on which node is Redis running? Is it possible to run Redis on a different machine?

darylrobbins commented 9 years ago

Excellent questions.

1) Yes, you can and should be running multiple router nodes for a production deployment. That's why all the nodes appear stacked (one on top of another): to show that there should be multiple of each type. And, as you say, you'll then want to place a load balancer in front of your router nodes. I forgot to add that to the diagram, but I will update it.
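As a purely illustrative sketch of that setup, a load balancer in front of two router nodes might look like the following (nginx is one possible choice of balancer, and the hostnames and ports are assumptions, not anything prescribed by API Umbrella):

```nginx
# Illustrative only: spread traffic across multiple API Umbrella router nodes.
upstream api_umbrella_routers {
    server router1.example.com:80;
    server router2.example.com:80;
}

server {
    listen 80;
    location / {
        proxy_pass http://api_umbrella_routers;
        # Preserve the original host and client IP for the routers.
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```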

2) There is a small Redis instance running on each router node. It is set up automatically when you install the package, along with all the other components needed in the router.

I'd recommend not trying to move Redis off of the router nodes. It is used as a local, per-server data store to improve performance and scalability by minimizing the network traffic required for each API request. So, firstly, moving it may actually break the routers, since they are not expecting to share a Redis instance with other routers. And, secondly, even if it did work, you would lose the performance and scalability benefits it was providing.

For the router, Redis is supposed to be a machine-local cache/staging area on the path to communicating with MongoDB and ElasticSearch.

dmolina-ot commented 9 years ago

Then, if each router node has its own Redis instance, is the rate_limit per router/node, or is it persisted in MongoDB or Elasticsearch?

darylrobbins commented 9 years ago

For short-term rate limits (less than 5 - 10 seconds), they are per router/node. Things are happening too fast in this case for all the nodes to keep in-sync. For longer-term ones, they are global to all nodes and persisted in MongoDB. The Redis instance on each node is being used to minimize the communication to MongoDB for enforcing rate limits. There's a background job on each router node that pushes its data from Redis to MongoDB.

The Redis instances should really be thought of as temporary storage, where anything important is being periodically persisted to MongoDB.
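The two-tier scheme described above (node-local counters for short windows, periodically flushed by a background job to a shared store for global limits) can be sketched roughly as follows. This is a simplified illustration of the pattern, not API Umbrella's actual code: plain dicts stand in for the node-local Redis instance and the shared MongoDB collection, and all names are hypothetical.

```python
import time
from collections import defaultdict

class NodeRateLimiter:
    """Per-node rate limiter sketch: short-term limits enforced locally,
    counts periodically pushed to a shared (MongoDB-like) store."""

    def __init__(self, short_limit, short_window=5):
        self.short_limit = short_limit        # per-node limit for the short window
        self.short_window = short_window      # seconds (short-term, node-local)
        self.local_counts = defaultdict(int)  # stands in for node-local Redis
        self.pending = defaultdict(int)       # increments not yet flushed

    def allow(self, api_key, now=None):
        now = time.time() if now is None else now
        bucket = (api_key, int(now // self.short_window))
        if self.local_counts[bucket] >= self.short_limit:
            return False                      # short-term limit is per node
        self.local_counts[bucket] += 1
        self.pending[api_key] += 1
        return True

    def flush(self, global_store):
        """Background job: push local counts to the shared store for
        long-term, cluster-wide limits."""
        for key, count in self.pending.items():
            global_store[key] = global_store.get(key, 0) + count
        self.pending.clear()

mongodb_like = {}  # shared store for long-term, global limits
node = NodeRateLimiter(short_limit=3)
results = [node.allow("key1", now=100.0) for _ in range(5)]
node.flush(mongodb_like)
print(results)        # [True, True, True, False, False]
print(mongodb_like)   # {'key1': 3}
```

The point of the pattern is that each request only touches node-local state; the slower shared store sees batched updates, which is why very short windows can only be enforced per node.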

dmolina-ot commented 9 years ago

Ok! Thank you.

I find these things very interesting, so I think they should be added to the wiki or docs rather than left in an issue.

darylrobbins commented 9 years ago

Yes, I will keep working away at the wiki. I have already updated the diagram. Note that I am not the developer of this project. I am just trying to fill in these important details as I discover them from digging into the project.

From a deployment perspective, it's easiest to think of the router as a single black box. It has a bunch of internal components but they can't be split out. I think the new architecture will definitely simplify things quite a bit.

GUI commented 9 years ago

Sorry for not chiming in earlier, but thanks so much @darylrobbins for filling in details here and starting the documentation on this. Everything Daryl has said and documented is correct. In general, as things stand, the router (which includes the gatekeeper) is really intended to run as a single entity, so you don't want to split those pieces out. The main things to consider splitting out onto separate servers are the MongoDB instances, the Elasticsearch instances, and the web component. You can mix and match those as appropriate, depending on your needs (for reference, we run the router and web components together on the same servers, and then our db servers house both MongoDB and Elasticsearch).

Our default package installation runs all these components on a single server, which is the most straightforward way to get started. This setup can also work if you don't have very high load or need redundancy; for a production setup, however, it's probably a good idea to run multiple router servers and separate out the databases.

Daryl's done an excellent job of sleuthing all this out (and my apologies this wasn't better documented to begin with). I can also try to flesh out in the coming week what our production setup looks like and how we set it up. Essentially, though, it boils down to the pieces Daryl has documented: configuring the services section of the api-umbrella.yml file (this controls which services get started on each machine) and then adjusting the mongodb.url and elasticsearch.hosts config to point to the appropriate servers. There may also be some manual steps you need to follow the first time you set up a MongoDB replica set, but again, I'll try to help flesh this out.
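Based on the config keys named above (services, mongodb.url, elasticsearch.hosts), a router-only node's api-umbrella.yml might look roughly like this. The service names, hostnames, ports, and database name shown here are illustrative assumptions, not verified defaults:

```yaml
# Hypothetical api-umbrella.yml for a node that only runs the router;
# all values below are placeholders for illustration.
services:
  - router
mongodb:
  url: "mongodb://db1.example.com:27017,db2.example.com:27017/api_umbrella"
elasticsearch:
  hosts:
    - "http://db1.example.com:9200"
    - "http://db2.example.com:9200"
```

A dedicated database node would presumably list the database services instead of the router under services, with the same mongodb/elasticsearch settings pointing at the cluster.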

Thanks all!

dmolina-ot commented 9 years ago

@darylrobbins, can you send me the original of this diagram so I can make some changes? https://raw.githubusercontent.com/darylrobbins/api-umbrella/deployment/website/images/docs/deployment.png

darylrobbins commented 9 years ago

Sure: https://github.com/darylrobbins/api-umbrella/raw/gh-pages-cname/artifacts/deployment.vsdx

dmolina-ot commented 9 years ago

Thanks!