
Partition management / general scaling #36

DamianZaremba opened this issue 7 years ago

DamianZaremba commented 7 years ago

Firstly, thanks for making this project available; it's highly valuable when dealing with multi-homed networks.

The current MySQL DB schema partitions certain tables; however, the partitions it defines are all in the past, so in practice all data ends up in the last partition.

I wrote a script to handle adding and dropping partitions, which I could potentially contribute.
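For illustration, a minimal sketch of what such a rotation job could look like in Java/JDBC; the table name, credentials, and retention window here are placeholders, not the actual openbmp schema values:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class PartitionRotator {
    // Hypothetical table name; substitute the RANGE-partitioned tables from the openbmp schema.
    private static final String TABLE = "path_attr_log";
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMdd");

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/openBMP", "openbmp", "password");
             Statement st = conn.createStatement()) {

            // Add tomorrow's partition ahead of time so new rows don't pile into
            // the last partition. Note: if the schema ends with a MAXVALUE
            // catch-all partition, REORGANIZE PARTITION is needed instead.
            LocalDate tomorrow = LocalDate.now().plusDays(1);
            st.execute(String.format(
                "ALTER TABLE %s ADD PARTITION (PARTITION p%s VALUES LESS THAN (TO_DAYS('%s')))",
                TABLE, tomorrow.format(FMT), tomorrow.plusDays(1)));

            // Drop the partition that has aged past the retention window (3 days here).
            LocalDate expired = LocalDate.now().minusDays(3);
            st.execute(String.format("ALTER TABLE %s DROP PARTITION p%s",
                TABLE, expired.format(FMT)));
        }
    }
}
```

Run daily from cron, this keeps one partition per day ahead of the writes and prunes anything past the retention window.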

I'm opening this for 3 reasons:

The latter question is slightly loaded, so I'll expand. Given a network with a slightly 'noisy' table (>700 peers, 12 IXs (x2 route servers), numerous transits, etc.), the write load on MySQL is quite high. This can just about be handled with flash-based storage, but even for short-term (3-day) retention it's hard to keep performance acceptable (spikes of ~20k qps), and nearly impossible to keep the data redundant without dual writing.

In our case, the clear option for long-term analysis is map/reduce technologies, and for real time, streaming technologies; but this leaves a gap: a nice UI for querying "recent" data that can be relied on in an ad hoc manner.

So my question really is: is there a known way to scale this, is this tooling something others are interested in, and do you have any thoughts on the way forward?

A simple initial step could be making the MySQL consumer more pluggable, so it could also write into HDFS or Cassandra, for example.
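As a rough sketch of what "pluggable" could mean here (none of these interfaces or classes exist in openbmp-mysql-consumer; they are purely illustrative):

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical plug-in surface: the consumer loop depends only on this interface.
interface MessageWriter extends AutoCloseable {
    // Called once per parsed message pulled from Kafka.
    void write(String topic, String parsedMessage) throws Exception;
}

class MysqlWriter implements MessageWriter {
    public void write(String topic, String parsedMessage) { /* existing UPSERT path */ }
    public void close() { }
}

class CassandraWriter implements MessageWriter {
    public void write(String topic, String parsedMessage) { /* CQL INSERT path */ }
    public void close() { }
}

public class WriterFactory {
    private static final Map<String, Supplier<MessageWriter>> BACKENDS = Map.of(
        "mysql", MysqlWriter::new,
        "cassandra", CassandraWriter::new);

    // Selected via config; defaults to the current MySQL behaviour.
    public static MessageWriter forBackend(String name) {
        return BACKENDS.getOrDefault(name, MysqlWriter::new).get();
    }
}
```

With that split, adding a new backend becomes one class plus a factory entry, without touching the Kafka consumption logic.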

A few 'use cases' would be really helpful, if people are willing to contribute them :)

TimEvens commented 7 years ago

@DamianZaremba, any contribution would be great! For the first two questions, it would be great to get PRs against the openbmp-mysql-consumer repo.

Regarding sticking with MySQL: no, we are moving away from it in the near future, but we'll maintain MySQL support for a while for sure.

Issue #27 goes over the performance and scaling numbers. Sharding is how you would scale MySQL, via topic mappings (router/peer groups); this is not automatic like the auto-sharding/mapping integrated into the client/server of some databases. But even with MySQL sharding, or auto-sharding implementations such as Cassandra and Mongo, the number of instances doesn't really improve, due to the massive amount of UPSERTs. You still need many servers if you plan to manage hundreds of millions of NLRIs, such as with transit peer or RR monitoring. The other issue is that 99% of the use cases for BGP data involve sub-second lookups that scan globally over all data, which does not fare well in clustered DB deployments or map/reduce.
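To make the topic-mapping idea concrete, here is an illustrative Java sketch (not from the codebase) that pins each router/peer group to a fixed MySQL shard; the group names and DSNs are made up:

```java
import java.util.List;

public class ShardMapper {
    private final List<String> shardDsns;

    public ShardMapper(List<String> shardDsns) { this.shardDsns = shardDsns; }

    // Deterministic mapping: the same router group always lands on the same shard,
    // so the UPSERTs for that group stay local to one MySQL instance.
    public String dsnFor(String routerGroup) {
        int idx = Math.floorMod(routerGroup.hashCode(), shardDsns.size());
        return shardDsns.get(idx);
    }

    public static void main(String[] args) {
        ShardMapper mapper = new ShardMapper(List.of(
            "jdbc:mysql://db1:3306/openBMP",
            "jdbc:mysql://db2:3306/openBMP"));
        System.out.println(mapper.dsnFor("ix-route-servers"));
    }
}
```

The mapping is static, which is the "not automatic" part: adding a shard means rebalancing groups by hand, and any query that spans groups still has to fan out to every shard.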

Performance/scale isn't the only challenge a better storage system needs to solve, though. While Kafka has a ton of benefits, it also has its downsides. One of the major ones is log retention: it's not advisable to store highly duplicated BGP data in Kafka for more than 24 hours, because a consumer that connects and starts from the beginning could take a very long time to catch up. What this means is that a new consumer cannot recover the current state of things. For example, routers and peers should be pretty stable, so the ROUTER INIT, PEER UP, and RIB messages will usually have expired from Kafka before the new consumer connects. We therefore need a manager that runs all the time, maintains state for all data, and bootstraps new consumers, so that a new consumer can get inventory and RIB snapshots as well as the offset from which it should start consuming the live feeds.
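A rough sketch of that bootstrap handoff, using the standard Kafka Java client; the `SnapshotClient` manager API shown in comments is hypothetical, standing in for whatever the state manager would actually expose:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class BootstrappedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "bmp-bootstrap-demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // 1. Load ROUTER/PEER inventory and a RIB snapshot out of band
        //    (hypothetical manager API, shown for shape only):
        // SnapshotClient mgr = new SnapshotClient("bmp-manager:8080");
        // long startOffset = mgr.loadSnapshotAndGetOffset("openbmp.parsed.unicast_prefix", 0);
        long startOffset = 0L; // stand-in for the offset the manager would return

        // 2. Resume the live feed exactly where the snapshot left off,
        //    instead of replaying the whole (possibly expired) log.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("openbmp.parsed.unicast_prefix", 0);
            consumer.assign(List.of(tp));
            consumer.seek(tp, startOffset);
            for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                System.out.printf("offset=%d value=%s%n", rec.offset(), rec.value());
            }
        }
    }
}
```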

Our solution to the MySQL performance and consumer-management problems is BMP-Manager, which scales horizontally, auto-shards by partitions, uses less than 1 GB of RAM regardless of the number of peers or NLRIs, and does not demand high disk performance the way MySQL does.

Each BMP-Manager instance handles:

We haven't put BMP-Manager on GitHub yet, but we will very soon.

BTW, I sent you an invite to join Cisco Spark. If you are able to join there, we can continue a more interactive convo.