This BOSH release provides a fully featured, cloud-friendly, Memcached-compatible service for a Cloud Foundry deployment. The service was designed with ease of operational management, high scalability, and fault tolerance as its primary goals.
Features:
The simplest way to deploy this memcache service is to use BOSH. The following is an example of the relevant portion of a deployment manifest.
instance_groups:
- name: memcache_broker
  jobs:
  - name: route_registrar
    release: cf
  - name: memcache_broker
    release: memcache
  properties:
    route_registrar:
      routes:
      - name: memcache_broker
        port: 21080
        registration_interval: 20s
        uris:
        - memcache-service.cf-deployment.com
    memcache_broker:
      broker_password: brokerpasswordforcc
      host:
        port: 21080
      memcache:
        vip: 10.100.100.100:11211 # Optional VIP, externally configured, for simplified access to the cluster
        servers: # List of all memcache-hazelcast instances in the cluster, for client credentials
        - 10.100.100.101:11211
        - 10.100.100.102:11211
        plans:
          small:
            name: small
            description: A small cache with redundancy
            free: true
          medium:
            name: medium
            description: A medium cache with redundancy
            free: false
    memcache_hazelcast:
      host:
        srv_api: https://memcache-hazelcast.cf-deployment.com
        password: password
    memcache:
      secret_key: secret
- name: memcache
  update:
    max_in_flight: 1
  jobs:
  - name: metron_agent
    release: cf
  - name: route_registrar
    release: cf
  - name: memcache_hazelcast
    release: memcache
  properties:
    route_registrar:
      routes:
      - name: memcache_hazelcast
        port: 8080
        registration_interval: 20s
        uris:
        - memcache-hazelcast.cf-deployment.com
    memcache_hazelcast:
      heap_size: 512M # The Xmx value of the node's JVM
      host:
        password: password # srv_api password
        port: 8080
        srv_api: https://memcache-hazelcast.cf-deployment.com # HTTP endpoint for the broker to interact with the cluster (delete a cache, etc.)
    memcache:
      secret_key: secret # Secret key the broker and memcache-hazelcast share for password hashing
      test_password: password
      test_cache: test
    hazelcast:
      max_cache_size: 268435456 # Max heap to reserve for cache data before forced eviction takes place to protect against OOMErrors
      machines: # List all members of the cluster and their respective zones for backup distribution
        zone1:
        - 10.100.100.101
        zone2:
        - 10.100.100.102
      plans:
        test:
          backup: 0
          async_backup: 0
          eviction_policy: LRU
          max_idle_seconds: 60
          max_size_used_heap: 1
        small: # Must match plans.name in the broker to associate the two
          backup: 0 # How many backups of entries to distribute across the cluster
          async_backup: 1 # Same as above, but done asynchronously
          eviction_policy: LRU # When the plan reaches its max quota, how do we evict?
          max_idle_seconds: 86400 # Do we put a max TTL on data to help improve overcommit abilities?
          max_size_used_heap: 100 # How much data, including backups, this node will hold. this value * nodes = total quota
        medium: # Must match plans.name in the broker to associate the two
          backup: 0 # How many backups of entries to distribute across the cluster
          async_backup: 1 # Same as above, but done asynchronously
          eviction_policy: LRU # When the plan reaches its max quota, how do we evict?
          max_idle_seconds: 86400 # Do we put a max TTL on data to help improve overcommit abilities?
          max_size_used_heap: 200 # How much data, including backups, this node will hold. this value * nodes = total quota
For more details on all of the available config options, review the spec file for each of the jobs.
Once deployed, simply register the service broker with the Cloud Controller and you should be good to go.
cf create-service-broker memcache servicebroker brokerpassword https://memcache-service.cf-deployment.com
cf enable-service-access memcache
Be sure to configure your BOSH manifest to deploy only one node at a time, giving Hazelcast time to recover and repartition after each node is reset.
update:
  canaries: 1
  max_in_flight: 1 # This is the important setting
The latest version of this service has been tested with ubuntu-trusty stemcell version 3232.12. It will probably work fine with all newer versions too. The recent upgrade to kernel 4.4 in Ubuntu might be the only significant change in that stemcell.
For a caching service it is important to configure memory correctly. There are 3 places memory is configured that you should take note of:
The first, memcache_hazelcast.heap_size, is passed directly to the JVM and represents the total heap the node should use. -Xmx and -Xms are set to the same value, since controlling memory consumption is the whole point of a cache node.
This value should be close to the total amount of RAM provided on the VM the job is deployed to. Perhaps subtract 128M-384M or so for OS overhead and miscellaneous Java native memory consumption.
The second, hazelcast.max_cache_size, represents the total amount of heap memcache-hazelcast should reserve for cache storage. If the total memory cost of all the cache entries stored on a node exceeds this value, the node will start evicting a percentage of the entries it owns from each of the caches on that node. This process doesn't care whether a cache is large or small; it simply trims X% of the LRU entries from all of the caches to relieve memory pressure on the node.
In a perfect world a node's memory pressures should never reach this point. Operators should monitor cache size and increase the cache memory pool when cached memory reaches a certain threshold.
This value should obviously be less than heap_size. The difference between max_cache_size and heap_size is the memory reserved for the node to actually operate, handle requests, and allow Hazelcast to manage the cluster. At a minimum this difference should probably be at least 256M. The max will depend on the amount of concurrent load you expect on each of the nodes. We configure ours with a difference of 512M.
The third, hazelcast.plans.<plan>.max_size_used_heap, is set in each of the plans you offer and represents the amount of cache data, in MB, you wish the plan to allow on each node. This value multiplied by the number of nodes deployed is the quota for the plan. If I deploy 10 nodes and each has a max_size_used_heap of 100, then each service instance of this plan will have a quota of 1000MB.
This value includes backup data. So, if I have a backup count of 1, you can cut the quota in half to get the amount of unique data the plan will hold. A value of 100MB would represent roughly 50MB of unique cache data this node will hold for each service instance.
You can obviously play with these values and the naming of the plans all you want, trading resiliency against cache size.
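A quick worked example of that arithmetic, using illustrative numbers only (not defaults from this release):

public class PlanQuota {
    public static void main(String[] args) {
        // Illustrative values: 10 memcache-hazelcast nodes, max_size_used_heap of 100 MB,
        // and one backup copy per entry (backup + async_backup).
        int nodes = 10;
        int maxSizeUsedHeapMb = 100;
        int backups = 1;

        int totalQuotaMb = maxSizeUsedHeapMb * nodes;    // 1000 MB of quota, backups included
        int uniqueDataMb = totalQuotaMb / (1 + backups); // roughly 500 MB of unique cache data

        System.out.println("Plan quota: " + totalQuotaMb + " MB (" + uniqueDataMb + " MB unique)");
    }
}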
When the service is bound, credentials similar to these will be provided to the app.
{
  "VCAP_SERVICES": {
    "memcache": [
      {
        "credentials": {
          "password": "securehashofusernameandsecretkey",
          "servers": [
            "10.10.100.40:11211",
            "10.10.100.41:11211",
            "10.10.100.42:11211",
            "10.10.100.43:11211",
            "10.10.100.44:11211",
            "10.10.101.40:11211",
            "10.10.101.41:11211",
            "10.10.101.42:11211",
            "10.10.101.43:11211",
            "10.10.101.44:11211"
          ],
          "username": "large|{service_instance_guid}|{app_guid}",
          "vip": "memcache-hazelcast.cf-deployment.com:11211"
        },
        "label": "memcache",
        "name": "large-memcache",
        "plan": "large",
        "tags": []
      }
    ]
  }
}
It is recommended that clients connect to all servers in the cluster and use a consistent hash to pseudo load balance between the nodes, even though any key can actually be obtained from any node. The vip (if configured) should only be used where client configuration simplicity is desired over performance.
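For example, here is a minimal sketch of an app connecting with spymemcached (one of the clients this service has been tested with), using a consistent-hash locator. It assumes the username/password are presented via SASL over the binary protocol, which is the standard memcached authentication mechanism; the addresses and credentials are copied from the sample above and would normally be parsed from VCAP_SERVICES.

import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.ConnectionFactoryBuilder.Locator;
import net.spy.memcached.ConnectionFactoryBuilder.Protocol;
import net.spy.memcached.MemcachedClient;
import net.spy.memcached.auth.AuthDescriptor;

public class MemcacheClientExample {
    public static void main(String[] args) throws Exception {
        // Taken from the sample credentials above; a real app reads these from VCAP_SERVICES.
        String username = "large|{service_instance_guid}|{app_guid}";
        String password = "securehashofusernameandsecretkey";

        MemcachedClient client = new MemcachedClient(
            new ConnectionFactoryBuilder()
                .setProtocol(Protocol.BINARY)                               // binary protocol for SASL auth
                .setAuthDescriptor(AuthDescriptor.typical(username, password))
                .setLocatorType(Locator.CONSISTENT)                         // consistent hash across all servers
                .build(),
            AddrUtil.getAddresses("10.10.100.40:11211 10.10.100.41:11211 10.10.100.42:11211"));

        client.set("greeting", 300, "hello").get();                         // 300 second TTL
        System.out.println(client.get("greeting"));
        client.shutdown();
    }
}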
Currently memcache-hazelcast and memcache-broker produce useful metrics sent over the firehose. Support for metrics over varz/healthz was recently removed.
When tasked with providing a caching solution for our organization we were given several requirements:
When you think of caching solutions with broad client support, the two products that immediately come to mind are Redis and Memcached.
The first solutions we investigated were Redis based. However, open source Redis out of the box didn't appear to have any good multi-tenancy support. Clustering in Redis 2 was also based on complex master/slave configuration and coordination, which violated our easy-to-operate constraint. Redis 3, which is supposed to provide better clustering support, was recently released, but we completed our evaluation prior to its release. Redis also brings some nice-to-have features like persistence and pub/sub, but these were not significant requirements for us.
We also investigated raw Memcached. The main Memcached implementation is not clusterable or multi-tenant either. Most Memcached clients support using a consistent hash to spread data across many memcached instances, which limits data loss in the case of an outage, but this would not fulfill the zero-downtime requirement for scheduled maintenance. The multi-tenancy requirement would force us to either write a proxy or manage independent memcached processes for each customer.
There are third-party vendors that provide multi-tenancy and cluster management capabilities on top of Redis and Memcached, but these solutions were typically too expensive for us to consider at this point.
While evaluating Memcached we discovered that there are many distributed caching solutions on the market that provide Memcached server compatibility. Many of these solutions are Java based and are free and/or open source. None of these clusterable third-party memcache solutions provided a multi-tenant memcache implementation, but they otherwise fulfilled most of our requirements and all of them were embeddable. So, we decided to select a Java memory grid solution and embed it behind a multi-tenant, memcache-compatible front end that we would write.
We evaluated Infinispan, Hazelcast, Gemfire, and GridGain. Of the solutions evaluated, Hazelcast is the one we selected for the following reasons:
With the announcement that Gemfire was being open sourced, it quickly jumped to the top of our list. We had a nice-to-have desire for persistence/overflow to disk and an obvious affinity for Pivotal projects. However, it was ultimately not selected because:
If Project Geode were to add clearer in-memory quotas and/or open source multi-datacenter support, we would seriously consider migrating to it.
Netty recently added a Memcache codec to its 4.1.0 release. Consequently, using Netty for a highly scalable and efficient front end was a no-brainer, and it has worked out quite well.
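As a rough illustration only (not the project's actual front-end code), a Netty 4.1 server pipeline using that codec looks something like this:

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.codec.memcache.binary.BinaryMemcacheServerCodec;

public class MemcacheFrontEnd {
    public static void main(String[] args) throws InterruptedException {
        // Sketch only: a minimal Netty server wired with the binary memcache codec.
        NioEventLoopGroup boss = new NioEventLoopGroup(1);
        NioEventLoopGroup workers = new NioEventLoopGroup();
        try {
            new ServerBootstrap()
                .group(boss, workers)
                .channel(NioServerSocketChannel.class)
                .childHandler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) {
                        // Decodes binary memcache requests and encodes responses.
                        ch.pipeline().addLast(new BinaryMemcacheServerCodec());
                        // A real server would add a handler here that dispatches each
                        // decoded request to Hazelcast, as described below.
                    }
                })
                .bind(11211).sync().channel().closeFuture().sync();
        } finally {
            boss.shutdownGracefully();
            workers.shutdownGracefully();
        }
    }
}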
Here are a few notes on the basic architecture of memcache-broker and memcache-hazelcast and how they work together.
This is the module that acts as the memcache front end and a Hazelcast cluster member. All memcache requests come to a memcache-hazelcast node and are asynchronously executed on the node that actually owns the key being operated on, using an IExecutorService. This allows a single memcache connection to process many pipelined requests at once. The responses to these requests are then sorted and returned to the client in the order requested.
Using Hazelcast synchronously was actually slightly quicker from a single-request latency standpoint. However, the asynchronous solution was significantly more scalable, and the difference was negligible once network latencies were included.
Executing the request on the node that owns the key ensures that every memcache request includes at most one network hop for the actual processing of the request. This is important since some memcache requests result in several Hazelcast operations. For example, an increment consists of four operations: lock, get, set, and unlock. By running on the node that owns the key, none of these operations blocks on network communication.
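A simplified sketch of that dispatch pattern using Hazelcast's IExecutorService; the class and cache names here are hypothetical, not taken from the project's source:

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.HazelcastInstanceAware;
import com.hazelcast.core.IExecutorService;

import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;

// Sketch only: a GET task that Hazelcast routes to the member owning the key.
public class GetOnOwner implements Callable<byte[]>, Serializable, HazelcastInstanceAware {
    private final String cacheName;
    private final String key;
    private transient HazelcastInstance hazelcast;

    public GetOnOwner(String cacheName, String key) {
        this.cacheName = cacheName;
        this.key = key;
    }

    @Override
    public void setHazelcastInstance(HazelcastInstance hazelcast) {
        this.hazelcast = hazelcast;
    }

    @Override
    public byte[] call() {
        // Runs on the member that owns the key, so this map access is local.
        return hazelcast.<String, byte[]>getMap(cacheName).get(key);
    }

    public static Future<byte[]> submit(HazelcastInstance hz, String cacheName, String key) {
        IExecutorService executor = hz.getExecutorService("memcache");
        // submitToKeyOwner routes the task to the partition owner for this key,
        // so at most one network hop is needed regardless of how many map operations the task performs.
        return executor.submitToKeyOwner(new GetOnOwner(cacheName, key), key);
    }
}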
This module provides a Cloud Foundry v2 service broker, allowing users to create, bind, unbind, and delete cache tenants for Cloud Foundry deployed apps. The broker is stateless and therefore clusterable.
Authentication is implemented as a hash using a shared key between the memcache-broker and memcache-hazelcast nodes. This solution was chosen to eliminate the need to deploy and manage a persistent credentials store. The main limitation of this approach is that to rotate a cache password the user would have to create and bind to a new service instance, which essentially invalidates the old password and gives them a new password for a new cache.
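As a rough sketch of the idea (assuming HMAC-SHA256; the actual hashing scheme used by memcache-broker and memcache-hazelcast may differ), both sides can derive the same password from the username and the shared memcache.secret_key without any shared database:

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CachePasswords {
    // Sketch only: both the broker and every memcache-hazelcast node are configured with the
    // same memcache.secret_key, so each side can recompute the password on demand.
    public static String passwordFor(String username, String secretKey) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
        byte[] digest = mac.doFinal(username.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(digest);
    }

    public static void main(String[] args) throws Exception {
        // Username format taken from the sample credentials above: plan|service_instance_guid|app_guid
        String username = "large|{service_instance_guid}|{app_guid}";
        System.out.println(passwordFor(username, "secret"));
    }
}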
We have not performed extensive performance tests on this solution, though we have done some scalability testing. The tests we have run show that memcache-hazelcast uses quite a bit more CPU than raw Memcached.
We have profiled memcache-hazelcast and believe the solution is quite efficient as a front end to Hazelcast.
As far as latency goes, in a single-threaded, localhost-to-localhost, non-clustered test, the mean request time for memcache-hazelcast appeared to be about 2-3 times worse than memcached (roughly 350,000 vs. 150,000 nanoseconds). When placed under significant load (1000+ threads), mean times for memcache-hazelcast and raw memcached actually began to match.
Localhost tests with Near-Cache enabled actually matched raw localhost Memcached speeds.
During real network testing with a 10-node cluster, request times averaged between 1 and 2 ms, with a min of about 0.4 ms and a max of 2-3 ms regardless of the command executed. With near cache enabled, GET requests in a read-heavy cache had a min of 0.15 ms and a mean of about 0.5-0.6 ms. Fluctuations are much more network related at these speeds.
Use of Near-Cache has the caveat of less precise expiration times and the potential for dirty reads. A cache hit on a near-cache item doesn't update Hazelcast's idle timer. This caveat can be mitigated by using a relatively small max TTL, so that max idle is updated at least as often as max TTL.
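A hedged sketch of that mitigation using Hazelcast 3.x near-cache config; the map name and the TTL value are illustrative only, not this project's defaults:

import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.config.NearCacheConfig;

public class NearCacheExample {
    public static Config withNearCache() {
        // Sketch only: keep the near-cache TTL small so reads periodically go back to the
        // owning member, which refreshes the idle timer at least that often.
        NearCacheConfig nearCache = new NearCacheConfig()
            .setInvalidateOnChange(true)   // drop near-cache entries when the owner changes them
            .setTimeToLiveSeconds(300);    // illustrative 5 minute near-cache TTL

        MapConfig mapConfig = new MapConfig("read-heavy-cache")
            .setNearCacheConfig(nearCache);

        return new Config().addMapConfig(mapConfig);
    }
}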
We haven't run any real network tests comparing memcache-hazelcast with memcached.
Though I don't have much development experience with either Memcached or Redis, I have no doubt that Redis is superior to Memcached in many ways. However, raw Redis just didn't meet what we were looking to provide regarding multi-tenancy, and Redis 2 lacked clustering. We looked into fronting Hazelcast with a Redis-like protocol instead of Memcache, but the protocol and command set appear far richer and more complex than Memcache's, making that approach not feasible.
In addition to the abundant client support for the Memcache protocol, we liked the idea of giving our customers a semi-stable and simple Memcache API to depend upon instead of something custom or more niche. This gives our customers who depend upon the Memcache protocol more API stability and allows us to potentially change solution details in the future without asking all our customers to rewrite their cache logic.
Using the Memcache protocol also allows our customers to potentially utilize a third-party clustered Memcache SaaS solution for Amazon vs. on-premises deployments without changing their application.
No, we're quite proud of this project. But we recognize that our organization's requirements for a cache solution may change. We also know that our solution may not be the fastest or best available. So, we plan to use this solution for as long as it meets our needs and hope to minimize impact to users of this service if things change in the future.
It is true that Java's abstraction over how it manages memory makes it hard to implement something like a multi-tenant, memory-quota-based cache. Under periods of heavy load an over-committed cache cluster could cause a node to run out of memory and fail. This could potentially bring down the entire cluster as backups are re-replicated and new owners for the data are established.
This potential issue has been at the forefront of our minds from the beginning, and we believe our solution is fairly safe. First of all, Hazelcast provides a great foundation for memory-based quotas in a multi-tenant cache. Hazelcast's eviction config is on a per-node basis, not for the entire cluster, so if a node goes down, each cache on the other nodes won't be allowed more memory than that node's share of the quota. This protects the whole cluster as backups are re-replicated and new owners are identified: if a single node's cache quota is exceeded, Hazelcast will start evicting.
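For illustration, a plan like small above could translate into a per-node Hazelcast map configuration along these lines (a sketch using the Hazelcast 3.x API; the exact mapping inside memcache-hazelcast may differ):

import com.hazelcast.config.EvictionPolicy;
import com.hazelcast.config.MapConfig;
import com.hazelcast.config.MaxSizeConfig;
import com.hazelcast.config.MaxSizeConfig.MaxSizePolicy;

public class PlanMapConfig {
    // Sketch only: values mirror the hazelcast.plans.small entry in the manifest above.
    public static MapConfig smallPlan(String cacheName) {
        return new MapConfig(cacheName)
            .setBackupCount(0)                  // plans.small.backup
            .setAsyncBackupCount(1)             // plans.small.async_backup
            .setEvictionPolicy(EvictionPolicy.LRU)
            .setMaxIdleSeconds(86400)           // plans.small.max_idle_seconds
            // USED_HEAP_SIZE is enforced per node, which is what protects the rest of the cluster.
            .setMaxSizeConfig(new MaxSizeConfig(100, MaxSizePolicy.USED_HEAP_SIZE)); // MB per node
    }
}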
The Hazelcast config above works well if you are not overcommitting your cluster. However, we are big fans of overcommitting our resources for greater pooled-resource efficiency. So, we implemented a system allowing the deployer to reserve a maximum amount of memory for cache data. If a node's cache data suddenly exceeds this reserved maximum, a scheduled job will LRU-evict X percent of the owned entries in each of the caches on that node, relieving memory pressure. This job will also log angry messages warning you to add more nodes or memory to your cluster ASAP to prevent further customer data from being unexpectedly evicted. However, in a perfect world a good operator of an overcommitted cluster would pay attention to the metrics emitted from the nodes and add more capacity before this last-resort safety measure has to kick in.
We believe it to be very compatible. We have an integration test that continuously runs memcapable against our server. We have other integration tests covering functionality not exercised by memcapable, like GAT and touch.
This service supports CAS, something that many non-memcached implementations seem to leave out for some reason.
The implementation has been tested with several clients, notably libmemcached, spymemcached, and xmemcached. All seem to work well.
That said, we're pretty new to Memcache and may have completely misread the spec in relation to certain functions. If you find we messed something up, please file an issue.