Brewskey / spark-server

An API-compatible open source server for interacting with devices that speak the spark-protocol
https://www.particle.io/
GNU Affero General Public License v3.0

Express Clustering #137

Open jlkalberer opened 7 years ago

jlkalberer commented 7 years ago

Use clustering to improve speed. We will want to do this at the HTTP server and enable it through settings on the CoAP server (spark-protocol). We'll need a single point for passing events, so you'll need to update EventPublisher (or whatever it is called) to send data between the forks.

This should be really easy to implement on spark-server since we're stateless - https://github.com/Flipboard/express-cluster
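Something like this should work, a rough sketch with Node's built-in cluster module (untested; the event shape and the EventPublisher hook are placeholders):

```js
// Rough sketch: fork one worker per CPU and relay events between forks.
const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
  for (let i = 0; i < os.cpus().length; i += 1) {
    cluster.fork();
  }

  // Relay an event published by one worker to every other worker.
  cluster.on('message', (sender, event) => {
    Object.keys(cluster.workers).forEach((id) => {
      if (cluster.workers[id] !== sender) {
        cluster.workers[id].send(event);
      }
    });
  });
} else {
  // Worker side: EventPublisher would forward local events up to the master...
  process.send({ name: 'spark/status', data: 'online' });

  // ...and re-dispatch events relayed from the other forks.
  process.on('message', (event) => {
    // eventPublisher.publish(event); // placeholder hook
  });
}
```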

straccio commented 7 years ago

Why not use ZeroMQ? It adds interprocess communication over pipes or the network, which helps when you need a fleet of machines in a cluster, not just several processes on one box.
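For example, a minimal pub/sub sketch with the zeromq npm package (v6 API; the address and topic name are just examples):

```js
// Minimal ZeroMQ pub/sub sketch (zeromq npm package, v6 API).
const zmq = require('zeromq');

async function runPublisher() {
  const pub = new zmq.Publisher();
  await pub.bind('tcp://127.0.0.1:5555');
  // Multipart message: topic frame, then payload frame.
  await pub.send(['spark-event', JSON.stringify({ name: 'temperature', data: '21.5' })]);
}

async function runSubscriber() {
  const sub = new zmq.Subscriber();
  sub.connect('tcp://127.0.0.1:5555');
  sub.subscribe('spark-event');
  for await (const [topic, payload] of sub) {
    console.log(topic.toString(), JSON.parse(payload.toString()));
  }
}
```

The same code works between processes on one machine (ipc:// transport) or across machines (tcp://).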

jlkalberer commented 7 years ago

That is definitely something we can look into for going server-to-server. I was just planning on using Redis, but if you have good reasons why we should use ZeroMQ instead, I'd be open to it.

For now we are trying to keep things simple so anyone using spark-server only has to install spark-server to run it. I don't want people to have to spin up MongoDB and ZeroMQ instances if they don't really need the extra services.

straccio commented 7 years ago

I usually use Redis to share sessions across web servers in PHP and to store sensor data as time series. ZeroMQ is a communication framework: in my vision, the processes outside the device-facing (CoAP) side could talk to each other through ZeroMQ, and CPU-intensive processes could be written in C/C++ for real-time tasks.

jlkalberer commented 7 years ago

Alright, I did a bit of digging and ZeroMQ it is. Thanks for the suggestion.

In order to implement this, we need to make sure that spark-server works with or without a ZeroMQ service running in the background.

haeferer commented 7 years ago

In our solution we use RabbitMQ as a messaging server, especially to provide a buffer between our different components. The Spark API is a good example: the API works perfectly for commands, but it's problematic for events, because if you update a component you lose events until its connection to the spark-server is re-established. A good first scenario would be a configuration where all events go directly into one or more configurable message queues (in addition to the default path). In Docker you manage this via environment variables, so you can activate such an interface by detecting those variables.
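Roughly like this (an untested sketch using the amqplib package; the SPARK_EVENT_QUEUE and RABBITMQ_URL variable names are placeholders):

```js
// Untested sketch: forward every device event to a RabbitMQ queue, but only
// when the SPARK_EVENT_QUEUE environment variable is set (placeholder name).
const amqp = require('amqplib');

async function createEventForwarder() {
  const queueName = process.env.SPARK_EVENT_QUEUE;
  if (!queueName) {
    return null; // Interface stays inactive without the environment variable.
  }

  const connection = await amqp.connect(process.env.RABBITMQ_URL || 'amqp://localhost');
  const channel = await connection.createChannel();
  await channel.assertQueue(queueName, { durable: true });

  // Call this in addition to the default event dispatch.
  return (event) =>
    channel.sendToQueue(queueName, Buffer.from(JSON.stringify(event)), {
      persistent: true,
    });
}
```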

I can provide a sample and documentation for this. I could also implement such a solution, but I'd need a little help ;) to know where to hook in such an extension.

I think there is no need to replace the complete API with a message queue.

I prefer RabbitMQ because its administration and production management are very good, and there are also very good images for building a solution directly with Docker: https://hub.docker.com/_/rabbitmq/

For storing data shared between instances, Redis is a good and stable solution (I have already implemented such solutions).

jlkalberer commented 6 years ago

Did anyone ever get things working with a message bus? I can’t remember.

haeferer commented 6 years ago

Yep. We are using our fork, in production as well.

https://github.com/keatec/spark-rabbit

jlkalberer commented 6 years ago

Awesome, I had someone ask about multi-server scaling.

How many devices are you running?

haeferer commented 6 years ago

We are currently testing in our own environment (40-50 devices). Production is planned to start next year (10,000 -> 100,000). We are running several spark-servers (each managing 10-20 devices in our demo) and a single RabbitMQ. Our library also includes, in each event, information about the message queue to use to communicate with the device that produced the event, plus a directory service (not part of the fork) that stores which spark-server holds each device (in Redis). Devices connect through a load balancer configured to always send the same source IP and port to the same spark-server, so if a Particle reconnects to a different spark-server, the directory is updated, and processes answering events can use the action queue taken directly from the incoming event.
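The directory service itself is small; roughly (untested sketch with ioredis; the key layout and field names are placeholders):

```js
// Untested sketch of the directory service: map each device to the
// spark-server that currently holds its connection (key layout is a placeholder).
const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);

// Called when a device completes its handshake with a spark-server.
async function registerDevice(deviceID, serverID, actionQueue) {
  await redis.hmset(`device:${deviceID}`, { serverID, actionQueue });
}

// Called by any process that wants to answer an event or reach the device.
async function lookupDevice(deviceID) {
  return redis.hgetall(`device:${deviceID}`);
}
```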

straccio commented 6 years ago

Not me, I'm busy with customizing the firmware.

jlkalberer commented 6 years ago

@haeferer - check out https://github.com/Brewskey/particle-collider if you'd like to test the scaling.

So you always reconnect the device to the same server? Aren't you worried about a case where you need to take a server down for maintenance?

haeferer commented 6 years ago

The load balancer handles this scenario. Not perfect, but it works: one spark-server goes down, the Particle is sent to the next server and fails (because of the wrong handshake), then reconnects... fine.

DaliborFarny commented 6 years ago

@jlkalberer Just a shy question - is clustering on the roadmap? We run the server with 400 devices (roughly half online) and sometimes it eats up all the server CPU, slowing down responses. The possibility of spreading the load over more processors/cores would be great. Just asking, I know you are busy.

jlkalberer commented 6 years ago

No, we aren't looking to add it. I tried to get it working a while back and ran into issues with the way we're dispatching events. We didn't change this much from the original spark-server so there are problems :/

I don't think you should be seeing issues with only 400 devices. There are other people using a single server with close to 1000.

Are you calling a lot of functions/variables/webhooks?

DaliborFarny commented 6 years ago

Thank you for the answer. There are not many calls, but they often come in batches of 10+ (when loading data from a device). No webhooks, no variables. I will work it out. Thanks again.
