deis / monitor

Monitoring for Deis Workflow
https://deis.com
MIT License

Proposal: A scalable asynchronous Analytics platform #104

Closed by jchauncey 8 years ago

jchauncey commented 8 years ago

Current Architecture

                          ┌────────┐                            
                          │ Router │                            
                          └────────┘                            
                               │                    ┌──────┐    
                           Log File         ┌──────▶│Logger│    
                               │            │       └──────┘    
                               ▼            │                   
┌────────┐                ┌─────────┐       │                   
│App Logs│───Log file────▶│ fluentd │──UDP/Syslog               
└────────┘                └─────────┘       │       ┌──────────┐
                                            │       │ stdout   │
                                            └──────▶│  metrics │
┌─────────────┐                                     └──────────┘
│ HOST        │          ┌───────────┐          Wire      │     
│  Telegraf   │────┬────▶│ InfluxDB  │◀───────Protocol────┘     
└─────────────┘    │     └───────────┘                          
                   │           │                                
┌─────────────┐    │           │                                
│ HOST        │    │           ▼                                
│  Telegraf   │────┤     ┌──────────┐                           
└─────────────┘    │     │ Grafana  │                           
                   │     └──────────┘                           
┌─────────────┐    │                                            
│ HOST        │    │                                            
│  Telegraf   │────┘                                            
└─────────────┘                                                 

Problem 1: Point to point connections

Right now we have point-to-point connections from N fluentd daemons to logger and stdout-metrics. Every value fluentd receives is immediately sent to both of those components, even when it is only relevant to one of them.

Problem 2: Communication happens synchronously

Data is written over UDP 1 packet at a time to both logger and stdout-metrics.

Problem 3: Duplicate UDP packets

https://github.com/kubernetes/kubernetes/issues/25793

Problem 4: write speed of fluentd -> stdout-metrics

Right now we see a bottleneck in how fast we can send data to stdout-metrics, and we cap out at roughly 80 requests per second on the cluster.

Proposed Solution

I would like to propose moving to an asynchronous system for delivering both log messages and metric data in the cluster. The architecture would look something like this:

                        ┌────────┐                            
                        │ Router │                  ┌────────┐
                        └────────┘                  │ Logger │
                            │                       └────────┘
                        Log file                        │    
                            │                           │    
                            ▼                           ▼    
┌────────┐             ┌─────────┐    logs/metrics   ┌─────┐ 
│App Logs│──Log File──▶│ fluentd │───────topics─────▶│ NSQ │ 
└────────┘             └─────────┘                   └─────┘ 
                                                        │    
                                                        │    
┌─────────────┐                                         │    
│ HOST        │                                         ▼    
│  Telegraf   │───┐                                 ┌────────┐
└─────────────┘   │                                 │Telegraf│
                  │                                 └────────┘
┌─────────────┐   │                                     │    
│ HOST        │   │    ┌───────────┐                    │    
│  Telegraf   │───┼───▶│ InfluxDB  │◀────Wire ──────────┘    
└─────────────┘   │    └───────────┘   Protocol       
                  │          ▲                        
┌─────────────┐   │          │                        
│ HOST        │   │          ▼                        
│  Telegraf   │───┘    ┌──────────┐                   
└─────────────┘        │ Grafana  │                   
                       └──────────┘                              

By pushing data from fluentd directly to NSQ, we allow the consumers of that data to pull off the queue as fast or as slow as they desire. NSQ is written to be a fault-tolerant, high-throughput queue that can scale as you need it. In this architecture, however, we only have 1 NSQ instance for simplicity.

Metric data published to NSQ is read via the telegraf nsq-consumer plugin.
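For illustration, a Telegraf config for that consumer might look roughly like the fragment below. This is a hedged sketch: the plugin name (`nsq_consumer`), option names, and values shown here are assumptions about the plugin discussed in this thread and may differ across Telegraf versions.

```toml
# Hypothetical Telegraf input config: pull metric data off the "metrics"
# NSQ topic and parse it as InfluxDB line protocol.
[[inputs.nsq_consumer]]
  server = "localhost:4150"   # nsqd address (assumed default port)
  topic = "metrics"
  channel = "telegraf"
  max_in_flight = 100
  data_format = "influx"
```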

All data pushed onto NSQ is written to a topic (logs or metrics, in this case).
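The routing decision is simple: each record fluentd collects goes to exactly one topic. A minimal sketch of that logic in Go, assuming a hypothetical record shape and tag convention (the real deis-output plugin is written in Ruby and may route differently):

```go
package main

import (
	"fmt"
	"strings"
)

// Record is a simplified stand-in for a fluentd event; the field names
// here are illustrative, not the actual deis-output plugin schema.
type Record struct {
	Tag  string
	Body string
}

// topicFor routes a record to the "metrics" or "logs" NSQ topic.
// The tag-prefix convention is a hypothetical example.
func topicFor(r Record) string {
	if strings.HasPrefix(r.Tag, "metrics") {
		return "metrics"
	}
	return "logs"
}

func main() {
	records := []Record{
		{Tag: "metrics.router", Body: "response_time=12ms"},
		{Tag: "app.web", Body: "GET / 200"},
	}
	for _, r := range records {
		// In the real system this would be an NSQ publish call.
		fmt.Printf("publish to topic %q: %s\n", topicFor(r), r.Body)
	}
}
```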

As far as write speed is concerned, here is a screenshot of me making ~800 requests per second from my laptop into the cluster.

(screenshot, 2016-06-07: throughput graph)

Eventually we could scale the single NSQ instance, make it more fault tolerant, and use persistent data, but that isn't a big requirement right now. Having this messaging platform will allow us to expand async communication to other components, which will let us scale out a cluster without fear of creating bottlenecks.

Example future use case: someone wants to scale an app to 500 pods. We fire off a message to a worker that does the scaling for us and puts a message back on the queue when it's complete. The controller reads that message and can update the user (through websockets, the UI, or whatever).
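That request/reply flow can be sketched in Go with channels standing in for the two queue topics. All the names here (ScaleRequest, ScaleResult, etc.) are hypothetical, not actual deis controller types:

```go
package main

import "fmt"

// ScaleRequest is what the controller would publish onto the work queue.
type ScaleRequest struct {
	App      string
	Replicas int
}

// ScaleResult is what the worker publishes back when it finishes.
type ScaleResult struct {
	App  string
	Done bool
}

// worker consumes scale requests and publishes completion messages.
// In the real system both channels would be NSQ topics.
func worker(requests <-chan ScaleRequest, results chan<- ScaleResult) {
	for req := range requests {
		// ...actually scale the app to req.Replicas pods here...
		results <- ScaleResult{App: req.App, Done: true}
	}
	close(results)
}

func main() {
	requests := make(chan ScaleRequest, 1)
	results := make(chan ScaleResult, 1)
	go worker(requests, results)

	requests <- ScaleRequest{App: "myapp", Replicas: 500}
	close(requests)

	// The controller reads the result and notifies the user
	// (websockets, UI, etc.).
	res := <-results
	fmt.Printf("scale of %s complete: %v\n", res.App, res.Done)
}
```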

To see the code for this implementation, visit the following repos:

arschles commented 8 years ago

Until you get the NSQ plugin into Telegraf, why not put the InfluxDB publishing code into the logger?

jchauncey commented 8 years ago

Separation of concerns mainly. Didn't want to worry about adding the influx code into logger.

krancour commented 8 years ago

Separation of concerns mainly. Didn't want to worry about adding the influx code into logger.

Good point, but it can be done relatively cleanly if the Drain interface that used to be in there were resurrected.

Also, another thing to consider... I'm not too familiar with NSQ, but can it implement topics as well? If you would have both the logger and the "metrics consumer" pulling messages from there, you need a topic, not a queue.

jchauncey commented 8 years ago

Yeah it has topics. I'll update the proposal to show that.

jchauncey commented 8 years ago

Updated with repos that have the code for implementing this.

arschles commented 8 years ago

@jchauncey if it makes things simpler, you might consider running a container alongside fluentd (in the daemonset) that fluentd can send, via syslog, on the loopback interface. This way, you don't have to change any fluentd plugins and you can control all the enqueue and dequeue code yourself. I don't believe the extra container or its functionality is technically necessary, just that it could add flexibility.

jchauncey commented 8 years ago

Well, the plugin is great because we really only care about a very small portion of the overall data that fluentd is collecting. The deis-output plugin allows us to filter out only the data we care about and also provides a nice interface for sending data to both NSQ topics. In either case I would need that fork, and it just made more sense to have it in fluentd rather than another app that we have to build and manage.

edit: plus I got to write some Ruby =p

jchauncey commented 8 years ago

So I have a working telegraf plugin for fetching data directly from NSQ. That means we could eliminate metrics-consumer from the diagram.

jchauncey commented 8 years ago

Oh, someone asked about the dip in the graph. That was me restarting my test with a change that @gerred helped me with.

jchauncey commented 8 years ago

We are moving forward with this proposal, so I am closing it.