alexnathanson / solar-protocol

A repository in development for a solar powered network of servers that host a distributed web platform. Project by Tega Brain, Alex Nathanson and Benedetta Piantella. Supported by Eyebeam, Mozilla, and CS&S
http://solarprotocol.net

optimization: use a gossip protocol to disseminate network membership #5

Open simao opened 3 years ago

simao commented 3 years ago

Hi,

I really like the idea of the solar protocol. I'd love to help with my own server, but unfortunately I don't have a sunny place to leave my rpis. Maybe in the future.

I was looking into your code. If I understand this correctly, each node contacts all other nodes to decide whether it should be the node updating the DNS records and serving HTTP requests. If you plan to scale the network to a somewhat bigger number of nodes, this will not scale well: each node has to know about all other nodes on startup, and has to periodically contact all of them to verify which are alive and determine whether the DNS should be updated.

This could be solved by using a gossip protocol (https://en.wikipedia.org/wiki/Gossip_protocol) so that each node keeps an up-to-date view of which nodes are still in the network and what the voltage level of each node is. This has the following advantages:

  1. On startup, each node just needs to know about a small number of other nodes on the network (just one if the network is small). Once the node connects, it will take part in the gossip network and will get an up-to-date list of all the alive nodes on the network, including the voltage level of each node.

  2. Each node does not have to contact all other nodes on the network. Nodes cooperate with each other and propagate up-to-date information to all nodes.

  3. Fault tolerance is handled by the gossip protocol. If a node leaves/is down, the network will gossip that information efficiently to other nodes.

  4. Node status, including voltage, converges fast without every node having to contact every other node. Instead of having NxN node connections, each node only talks to a small number of nodes and information is propagated that way.
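
To make the points above concrete, here is a toy simulation of a push-pull gossip exchange. It is only a sketch under assumptions of my own: the names `merge` and `simulate`, the fanout of 2, the single seed node and the random voltages are all made up for illustration, and it is not how memberlist or SWIM work internally.

```python
import random

def merge(mine, theirs):
    """Copy over every entry from `theirs` that is newer than what `mine` holds."""
    for node, (version, voltage) in theirs.items():
        if node not in mine or mine[node][0] < version:
            mine[node] = (version, voltage)

def simulate(num_nodes=20, fanout=2, seed=1, max_rounds=50):
    rng = random.Random(seed)
    # Each node starts with only its own entry: node id -> (version, voltage).
    # The voltages are random placeholder values, not real measurements.
    views = {n: {n: (1, round(rng.uniform(11.0, 14.0), 2))} for n in range(num_nodes)}
    # Bootstrap assumption: node 0 is the one address every other node is given.
    seed_node = 0
    contacts = 0
    for round_no in range(1, max_rounds + 1):
        for n in range(num_nodes):
            # A node can only gossip with peers it has heard of, plus the seed.
            peers = sorted((set(views[n]) | {seed_node}) - {n})
            for peer in rng.sample(peers, min(fanout, len(peers))):
                # Push-pull: both sides end up with the union of what they knew.
                # Exchanges are applied immediately, which is a simplification.
                merge(views[peer], views[n])
                merge(views[n], views[peer])
                contacts += 1
        if all(len(view) == num_nodes for view in views.values()):
            return round_no, contacts, views
    return None, contacts, views

rounds, contacts, views = simulate()
print(f"converged after {rounds} rounds using {contacts} exchanges")
print(f"a single all-to-all polling pass would need {20 * 19} requests")
```

Each node makes at most `fanout` contacts per round, so the per-round traffic grows linearly with the number of nodes rather than quadratically.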

I built a small proof of concept of this idea, see solar-gossip (https://github.com/simao/solar-gossip). It uses memberlist (https://github.com/hashicorp/memberlist/), which implements SWIM (https://research.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf) with some extensions. It is just a proof of concept: some edge cases are not handled and it would need much better testing, but it shows how it could work.

If you think any of these advantages are worthwhile and would like to try something like this, let me know and maybe we could work together to make it happen.

alexnathanson commented 3 years ago

Hi!

Thanks for your interest in the project and for your suggestions! I'll take a look at the proof of concept later this week. If you're interested, it would be great to chat about this at some point. We've actually been discussing moving to a system like this where we modulate the frequency of communication based on stored energy. Do you happen to know how memberlist and SWIM handle removing nodes from a network?

Best, Alex



simao commented 3 years ago

Hi!

Yes, I'd be happy to chat about this.

Modulating the communications based on the stored energy does make a lot of sense, but I'm not sure that memberlist supports it; I would have to look into it. It can be tweaked to use less network traffic by increasing timeouts, at the cost of increasing convergence time. I think something like that would require writing a specific gossip protocol tailored for solar-protocol, but I would have to check the memberlist API.
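
As a rough illustration of the trade-off, independent of memberlist, the gossip interval could be derived from the battery state. This is a sketch only: `gossip_interval`, the 5 second and 300 second bounds and the linear mapping are assumptions picked for the example, not values from solar-protocol or any library.

```python
def gossip_interval(charge_fraction, min_interval=5.0, max_interval=300.0):
    """Return the seconds to wait between gossip rounds.

    A full battery (1.0) gossips every `min_interval` seconds, an empty one
    (0.0) every `max_interval` seconds, with a linear ramp in between.
    """
    # Clamp so that a noisy sensor reading cannot produce a negative interval.
    charge = min(1.0, max(0.0, charge_fraction))
    return max_interval - charge * (max_interval - min_interval)

for charge in (1.0, 0.5, 0.1):
    print(f"charge {charge:.0%}: gossip every {gossip_interval(charge):.1f} s")
```

A node that gossips less often saves energy but learns about changes later, which is the convergence cost mentioned above.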

Each time a node leaves the network, that failure is detected by the nodes that were pinging it. Those nodes propagate that information to the rest of the network, so the membership list eventually converges on the new list of nodes. If the node leaving was the one currently serving the website, this could be problematic because the site would be down for a few seconds until the DNS was updated. We could also implement a more graceful exit, where a node announces its exit so the other nodes have time to determine which other node should update the DNS and serve the website.

It would also be possible to make only some nodes responsible for updating the DNS and serving the website, while other nodes only serve the website. This would be useful if, for example, you don't want to give DNS API credentials to all nodes.
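
Once every node holds the same membership table, picking the node that updates the DNS can be a purely local decision. The sketch below assumes the choice is "highest voltage wins" and uses hypothetical names (`Member`, `can_update_dns`, `leaving`, `pick_dns_updater`); none of these come from solar-protocol, solar-gossip or memberlist.

```python
from dataclasses import dataclass

@dataclass
class Member:
    name: str
    voltage: float
    can_update_dns: bool = True   # False for nodes without DNS API credentials
    leaving: bool = False         # True once a node has announced a graceful exit

def pick_dns_updater(members):
    """Return the member that should update the DNS, or None if nobody can."""
    candidates = [m for m in members if m.can_update_dns and not m.leaving]
    if not candidates:
        return None
    # Break voltage ties by name so that every node reaches the same answer.
    return max(candidates, key=lambda m: (m.voltage, m.name))

members = [
    Member("nyc", 13.1),
    Member("sydney", 13.4, can_update_dns=False),
    Member("nairobi", 12.7),
]
print(pick_dns_updater(members).name)   # nyc: sydney has no DNS credentials

members[0].leaving = True               # nyc announces that it is about to leave
print(pick_dns_updater(members).name)   # nairobi takes over
```

Because the function only reads the shared table, a node that announces its exit is skipped by everyone as soon as that announcement has been gossiped.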

alexnathanson commented 3 years ago

Hi Simao,

Sorry for the delayed response. If you want to email me directly at alex@alexnathanson.com we can find a time to talk.

Thanks!

samuk commented 1 year ago

Did this conversation ever happen? @simao are you still interested in this topic?

simao commented 1 year ago

No, it did not. Yes I am interested, unfortunately not much time atm.