dominictarr / scuttlebutt

peer-to-peer replicatable data structure

How to use it with a peer-to-peer network setup (not client-server)? #35

Closed shimondoodkin closed 10 years ago

shimondoodkin commented 10 years ago

How do I use it with a peer-to-peer network setup (not client-server)?

Is there such a thing?

shimondoodkin commented 10 years ago

Found this myself: https://github.com/tristanls/gossipmonger. Not sure if it's good.

dominictarr commented 10 years ago

Scuttlebutt doesn't directly implement anything to do with network topology. I did create this, though: https://github.com/dominictarr/peers. It's pretty simple, although I never used it for anything. Are you doing p2p within a local network/data center, or across the internet? What sort of application is it?

shimondoodkin commented 10 years ago

I want to create a redundant data collection and processing system, so that I can restart a server and everything keeps working. I want the components of the system to auto-discover each other and connect appropriately.

In a single-block system the pipeline is: announcer -> db inserter -> data processor -> db inserter. In the real system each component is redundant.
There is also a client that can show charts.

The db inserters should also be able to figure out who is master: only after the master dies should the second inserter start inserting, and if the second dies, the third takes over. The announcers connect to all the available db inserters.

The db inserter checks whether a message with the same hash has already been inserted, so only a single copy of each message goes into the db.
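A minimal sketch of that dedup step, assuming a MongoDB-style collection with a unique index on a hash field (the names here are invented for the example, not taken from my actual code):

var crypto = require('crypto');

// insert `message` only if its content hash has not been seen before;
// relies on a unique index on the `hash` field of the collection
function insertOnce(collection, message, cb) {
  var hash = crypto.createHash('sha256')
                   .update(JSON.stringify(message))
                   .digest('hex');
  collection.insert({ hash: hash, payload: message }, function (err) {
    if (err && err.code === 11000) return cb(null, false); // duplicate, skip it
    if (err) return cb(err);
    cb(null, true); // first time this message is seen
  });
}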

I have one cloud VPS now, but I will have two servers for when I want to restart the first one. I could also run several instances of the same application on a single server.

The components communicate with each other using ZeroMQ. A friend told me about scuttlebutt and I was interested.

I wanted to use scuttlebutt to keep a list of available components (a list of types plus address and port), so that when a component starts it can connect to its destinations and announce its sources, i.e. that this component has started. (After writing this, it seems I could get away with distributed events and without a list.)

I wrote the code below and then suddenly understood there is no peer-to-peer in it. LOL. If the server goes down it stops working. What the program does is try to create a server; if that fails, it creates a client instead.


var zmqport = 'tcp://127.0.0.1:7845';

function getExternalIp() {
    var ifconfig = require('os').networkInterfaces();
    var device, i, I, protocol;

    for (device in ifconfig) {
        // ignore network loopback interface
        if (device.indexOf('lo') !== -1 || !ifconfig.hasOwnProperty(device)) {
            continue;
        }
        for (i=0, I=ifconfig[device].length; i<I; i++) {
            protocol = ifconfig[device][i];

            // filter for external IPv4 addresses
            if (protocol.family === 'IPv4' && protocol.internal === false) {
                //console.log('found', protocol.address);
                return protocol.address;
            }
        }
    }
    console.log('External Ip Not found!');
    return '127.0.0.1';
}

var GossipObjectModel = require('gossip-object');
var net   = require('net')
zmq_gossip_list = null;
zmqsignup_start=function()
{
    zmq_gossip_list= new GossipObjectModel()
    var netserver=net.createServer(function (stream){
        stream.pipe(zmq_gossip_list.createStream()).pipe(stream)
        zmq_gossip_list.on('error', function (){ stream.destroy() })
        stream.on('error', function () { zmq_gossip_list.destroy() })
    });
    netserver.on('error', function (e) {
      if (e.code == 'EADDRINUSE')
      {
        console.log('Address in use. Connecting instead');
        var stream = net.connect(8477);
        stream.pipe(zmq_gossip_list.createStream()).pipe(stream);
        zmq_gossip_list.on('error', function (){ stream.destroy() })
        stream.on('error', function () { zmq_gossip_list.destroy()})
      }
    });
    netserver.on('listening', function (e) {
      console.log('zmq_gossip_list server bound');
    });
    netserver.listen(8477);
}

zmq_gossip_list_all_mine=[];
zmqsignup_addme=function(mytype)
{
 var exip=getExternalIp()
 var name=exip+'_'+process.pid
 //zmq_gossip_list.set([mytype,name], "")
 zmq_gossip_list.set([mytype+'_'+'zmqport',name], zmqport.replace(/\*|127\.0\.0\.1/,exip)) ; zmq_gossip_list_all_mine.push([mytype+'_'+'zmqport',name]);
 zmq_gossip_list.set([mytype+'_'+'lasttime',name], new Date().getTime())              ; zmq_gossip_list_all_mine.push([mytype+'_'+'lasttime',name]);
 zmq_gossip_list.set([mytype+'_'+'status',name], 'ok')                                  ; zmq_gossip_list_all_mine.push([mytype+'_'+'status',name]);
}

zmq_gossip_remove_all=function()
{
   for(var i=0;i<zmq_gossip_list_all_mine.length;i++)
       zmq_gossip_list.delete(zmq_gossip_list_all_mine[i]);
}

zmqsignup_start();
setTimeout(function(){zmqsignup_addme('tester')},300);

    function handleexit(cb)
    {
      zmq_gossip_remove_all()
      setTimeout(function(){  if(cb)cb(); },300)
    }

    process.on('SIGTERM', function () {
     console.log('Got SIGTERM, will exit in 10 seconds');
     var num=1;
     var n=setInterval(function(){ console.log(num); num++; },1000);
     var c=setTimeout(function(){ if(n)clearInterval(n); process.exit(0); },10000)
     handleexit(function(){
      if(c)clearTimeout(c);
      if(n)clearInterval(n);
      process.exit(0);
     })
    });

    process.on('SIGINT', function () {
     console.log('Got SIGINT, will exit in 10 seconds');
     var num=1;
     var n=setInterval(function(){ console.log(num); num++; },1000);
     var c=setTimeout(function(){ if(n)clearInterval(n); process.exit(0); },10000)
     handleexit(function(){
      if(c)clearTimeout(c);
      if(n)clearInterval(n);
      process.exit(0);
     })
    });

var repl = require("repl");
repl.start({ useGlobal: true, useColors: true });
shimondoodkin commented 10 years ago

Found this, it seems like quite resilient p2p: https://github.com/automenta/telepathine/

dominictarr commented 10 years ago

Aha, okay, I implemented a thing like that, where if the server goes down a client takes over: https://www.npmjs.org/package/autonode. Each node first tries to create a server; if it gets an error it becomes a client, and if it is a client but the server goes down, it becomes the server...

For discovering processes on other machines on the local network, I think the simplest way would be to use UDP heartbeats. I have a module for that too :) https://github.com/dominictarr/broadcast-stream

However, I can't seem to get it to broadcast UDP to the same process (because only one process can bind a given port...), so you'd just have the server be the process responsible for the UDP.
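A rough sketch of the heartbeat idea with plain dgram (not broadcast-stream; the ports and payload are made up, and as mentioned only one process per machine can bind the heartbeat port):

var dgram = require('dgram');
var HEARTBEAT_PORT = 8475; // arbitrary port for the example

var sock = dgram.createSocket('udp4');
sock.bind(HEARTBEAT_PORT, function () {
  sock.setBroadcast(true);
});

// announce ourselves to the local network every few seconds
setInterval(function () {
  var beat = new Buffer(JSON.stringify({ pid: process.pid, port: 8477 }));
  sock.send(beat, 0, beat.length, HEARTBEAT_PORT, '255.255.255.255');
}, 3000);

// learn about peers from their heartbeats
sock.on('message', function (msg, rinfo) {
  var peer = JSON.parse(msg.toString());
  if (peer.pid !== process.pid)
    console.log('saw peer', rinfo.address + ':' + peer.port);
});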

automenta commented 10 years ago

I was thinking of using the BitTorrent DHT to find peers; I tried this with some node.js BitTorrent modules but didn't get far. Also see my fork of the kademlia module (incomplete). If anyone could test telepathine, which is just a recent experimental fork of grapevine / gossip, and let me know how it works for them, that would help - I can only test a limited number of cases.

Originally I experimented with using dominictarr's scuttlebutt module over the smokesignal module; this might still be possible. I like telepathine's ability to use UDP for small messages and start a TCP stream if they exceed a certain size threshold (~517 bytes is safe for the internet, IIRC up to 64k within a LAN). It should be possible to use dominictarr's scuttlebutt in place of the scuttlebutt in telepathine, which comes directly from grapevine, without changes. But it may not be necessary to always use streams when control messages can be sent as UDP.

https://github.com/automenta/telepathine/blob/master/lib/scuttle.js
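To illustrate the small-over-UDP / large-over-TCP decision (this is not telepathine's actual API, just a sketch of the idea):

var dgram = require('dgram');
var net = require('net');

var UDP_SAFE_BYTES = 517; // conservative datagram size for internet paths

function sendMessage(host, port, obj) {
  var payload = new Buffer(JSON.stringify(obj));
  if (payload.length <= UDP_SAFE_BYTES) {
    // small control message: fire off a single datagram
    var sock = dgram.createSocket('udp4');
    sock.send(payload, 0, payload.length, port, host, function () { sock.close(); });
  } else {
    // larger message: open a TCP stream instead
    var stream = net.connect(port, host, function () {
      stream.end(payload);
    });
  }
}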

Another module in this series is telepathdb, which connects pouchdb instances (leveldb internally) to each other via telepathine synchronization. Not sure if this is useful, since pouchdb has its own replication protocol which can even sync between a web browser and node.js.

Finally, here's an idea (so far just a description) for replacing npm with something decentralized and real-time: https://github.com/automenta/npn

dominictarr commented 10 years ago

Okay, this sounds very interesting, although there is a lot going on here. On the one hand, it sounds like @shimondoodkin is talking about building something kind of like hadoop: a database system that runs within a datacenter (distributed). And @automenta is talking about something that is truly p2p (decentralized). A system that runs across many computers is generally called a distributed system; usually the machines are all under one organization's control, so they are all trusted. On the other hand, something like a p2p npm is a lot more challenging. This sort of system has not been researched as deeply as a merely distributed system, but there are a number of successful examples out there, like BitTorrent and bitcoin.

I'm working on a decentralized, secure gossip protocol at the moment: https://github.com/dominictarr/secure-scuttlebutt. It's not quite ready yet, but the basics are there. It works like a big version of scuttlebutt, but peers can replicate overlapping datasets. Each node gets an append-only feed, where each message contains a hash pointing to the previous message and is signed with that node's private key.
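To give a feel for the shape of that feed, here is a toy sketch (not secure-scuttlebutt's actual data format or crypto; RSA is used only for illustration):

var crypto = require('crypto');

// append a message to a feed: link it to the hash of the previous entry
// and sign it with the feed owner's private key
function appendToFeed(feed, content, privateKeyPem) {
  var previous = feed.length
    ? crypto.createHash('sha256')
        .update(JSON.stringify(feed[feed.length - 1]))
        .digest('hex')
    : null;

  var msg = { previous: previous, sequence: feed.length, content: content };
  var signature = crypto.createSign('RSA-SHA256')
    .update(JSON.stringify(msg))
    .sign(privateKeyPem, 'hex');

  feed.push({ value: msg, signature: signature });
  return feed;
}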

I think you could use this to build a decentralized package manager, as well as a bunch of other things, probably.

automenta commented 10 years ago

secure-scuttlebutt looks awesome. Is there a live network I can join yet? I'll clone it soon and see what it does.

dominictarr commented 10 years ago

No, not yet. So far I've focused on replicating on the local network & verifying the database (passing data on to other nodes when you are on the same network). Since it's all replication based, you could still totally use it like this, exchanging data via local networks, carrying it around on foot (!)

Next, I'm gonna refactor/simplify some stuff, and then I'll add internet support.

shimondoodkin commented 10 years ago

@automenta can you enable issues on the telepathine repository? I'm trying to use it now.

shimondoodkin commented 10 years ago

I think p2p npm is solvable using existing technology like torrent and DHT, maybe kademlia, to require a certain geographic distribution and number of copies.

shimondoodkin commented 10 years ago

Actually, maybe don't implement torrent at all; control a C command-line client instead.

automenta commented 10 years ago

@shimondoodkin issues enabled & gave you full access, thanks.

npn goes further than being just a p2p package manager. It would be easy just to archive all of the npm data in BitTorrent and download it (one way is to use a BitTorrent HTTP proxy, or PeerCDN https://peercdn.com/ if that ever gets released). But npn suggests modeling people as packages themselves - humans and software (build automation tools) which produce software, etc. Along with that, all of the development activity of issues, releases, etc. can be expressed as realtime asynchronous messages - making github (at least its web system, like we're using now) seem more or less redundant and slow. Imagine pushing code changes via scuttlebutt immediately to deployed targets. A deployment could be either a production application running on a server, or someone who has cloned your repository for development and testing - really there is no difference from your perspective, but the current set of development tools (git, npm, etc.) create these artificial distinctions. Let me know if that makes sense - npn is still a very raw idea.

dominictarr commented 10 years ago

I have also been thinking about making a p2p npm, but really npm is just a database with a particular structure. In some ways it's like a wiki; a dependency is like a link. As in a wiki, you link to something that may change over time because the page will be edited. The difference between a wiki page and a package is that normally there are much more controlled permissions on who can update a package. I think you can look at a bunch of other things, e.g. a blog, forum, or review site, as being somewhere on the continuum between a wiki (open permissions) and a package manager (strict permissions).

git is a great example of how you could build a structured decentralized database; here is my take on that: https://github.com/dominictarr/cyphernet (I started working on this project, but I back-burnered it, because I think it would be more useful to have secure-scuttlebutt first).

camlistore was an important influence (https://camlistore.org/), except I want to have data replication as a primitive.

A decentralized distribution of static files is a solved problem (bittorrent), but a structured database is still an open one.

That said, secure-scuttlebutt isn't trying to be that; it has only a very simple replication model, and a very simple permissions model (each node may only append to its own feed). I am building this first, and then maybe I'll get to the "wiki" thing later.

dominictarr commented 10 years ago

@automenta what do you mean by "modelling people as packages"?

automenta commented 10 years ago

I'm trying to define "packages" (maybe another term is better, like "project") as "software" that can potentially generate other "software". Most software is just an end result, but some software generates software, like build systems and evolutionary algorithms. Consider that human DNA is like "software" that can generate "software" (documents, computer code, new humans), so the analogy fits. Likewise we can collapse organizations into just collections of software - whether that's humans, computer software, or a combination.

dominictarr commented 10 years ago

Okay, but how do you represent that in the system? How do you address a project/person/software-generator?

Floby commented 10 years ago

I had done some work as well on npmd-peers. It was meant as a peer discovery mechanism for npmd.

I used UDP broadcasting with "polo" and BitTorrent trackers. The next step was to use a DHT such as kademlia or the BitTorrent DHT to get rid of the tracker part.

Exchanging data had not been addressed, but I assumed there would be some gossiping in play at some point. But this might as well be simple CouchDB replication.


automenta commented 10 years ago

@dominictarr Each project should probably have a UUID, plus metadata which includes all of the current npm details, plus more. The relationships between projects would provide a verifiable trust network and a dependency / progenitor directed acyclic graph; for example, dependency, fork, and rating edges could all increase trust. Addressing a project by name would be like a query in a DHT which resolves to UUIDs, but a direct UUID (like a magnet:// URI) is the most direct and unambiguous. The results of a name query could be ranked by trust, in which case you would find the most established project listed first, as determined by the number of dependents, progenitors, etc.

Probably some combination of blockchain, git, and DHT could implement this.
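Roughly, a project record might look something like this (all of the field names are invented, just to make the idea concrete):

// illustrative shape of a project record: a UUID as the canonical address,
// npm-style metadata, and typed edges that feed the trust ranking
var project = {
  uuid: '6e2f1c2a-0000-4000-8000-000000000000', // canonical, unambiguous address
  name: 'some-package',                          // human-readable, resolved via a DHT name query
  metadata: { version: '1.0.0', description: 'everything npm stores today, plus more' },
  edges: [
    { type: 'dependency', to: 'uuid-of-dependency' },
    { type: 'fork',       to: 'uuid-of-progenitor' },
    { type: 'rating',     from: 'uuid-of-rater', weight: 1 }
  ]
};

// a name query against the DHT would return every uuid registered under that
// name, ranked by the trust accumulated through these edges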

I help develop a system for using wikipedia as a foundation ontology for tagging, to generate partial matches. This would apply to people as well as software: http://www.ingenesist.com/general-info/curiosume-integrating-social-innovation.html Curiosume would allow a project to apply for work (your application) much the same way a human applies for employment. It includes geolocation, so that could be another kind of query: finding nearby developers. Of course this applies not just to software but to any kind of effort.

automenta commented 10 years ago

One other note about Curiosume: it can be used to tag individual source code files to indicate what qualifications a developer should have to best work on them. Example at the top of this file: https://github.com/automenta/netjs/blob/master/server/web.js These tags can be aggregated in a database like NPN to attract developers to the individual parts of a project which are most relevant to their skillset. Tags like this can also be applied to CAD designs, legal docs, etc.

automenta commented 10 years ago

@Floby thanks! I totally missed npmd; that's a great starting point for npn. Here's a fork of kademlia I started on before forking grapevine into telepathine: https://github.com/automenta/kademlia/commits/master I forget exactly what I added or changed, but the commits should explain. Anyway, I don't think it presently functions for WAN discovery like the BitTorrent DHT, but that was one of the features I wanted.

dominictarr commented 10 years ago

@automenta How do you plan to implement write permissions? How do you plan to address the sybil attack?

automenta commented 10 years ago

@dominictarr I suppose you would only be able to write into your own or group repositories, like git; this would depend on the storage system's permission model. Have you seen http://maidsafe.org ? Not sure if it's overkill, being a rather complex framework. Maybe a git repository should be the first implementation - npm already works transparently with git / github. A BitTorrent-like DHT for github resources could accelerate transfers: https://code.google.com/p/gittorrent/

A global rating system like github or npmjs.org is vulnerable to sybil attacks to the extent that it's easy to create new accounts - but a decentralized trust graph should not be. A cluster of self-reinforcing sybil identities would be the equivalent of link spamming in a search engine; it would only be as trustworthy as its graph neighborhood. As for how much computation is necessary to calculate a global trust graph, I don't have any estimates. Certainly some kind of peer caching is needed.

https://github.com/automenta/netjs/blob/master/server/web.js#L1087 - this is some code for a trust network within a decentralized social network. I don't have any data on how well it scales. It computes the graph distance to remote peers, and the trust level is inversely proportional to distance: self = distance 0 = infinite trust; 1-hop = distance 1 = trust 1.0; 2-hop indirect trust = 0.5; and so on. Trust can be in proportional amounts (0..100%), and there can also be a distrust network operating in parallel which can be summed independently. In netention, if someone is trusted above a certain threshold, they can receive objects you publish in the "trusted" scope (the public scope is always accessible, the private scope never). Trust is asymmetric, modeled as a directed graph from one person to another; it's only symmetric when both people decide to trust each other.
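A minimal sketch of that distance-based trust calculation (not the actual netention code; the graph format here is made up - an object mapping each node to the nodes it directly trusts):

// trust = 1 / hop-distance, with self treated as infinitely trusted
function trustFrom(graph, self, target) {
  if (self === target) return Infinity;          // self: infinite trust
  var distance = {}; distance[self] = 0;
  var queue = [self];
  while (queue.length) {                          // breadth-first search
    var node = queue.shift();
    (graph[node] || []).forEach(function (next) {
      if (distance[next] === undefined) {
        distance[next] = distance[node] + 1;
        queue.push(next);
      }
    });
  }
  if (distance[target] === undefined) return 0;   // unreachable: no trust
  return 1 / distance[target];                    // 1-hop = 1.0, 2-hop = 0.5, ...
}

// example: trustFrom({ alice: ['bob'], bob: ['carol'] }, 'alice', 'carol') === 0.5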

a "value" network can be computed the same way, instead of trust links, representing value (the result of a +1 or like). likewise, a dislike network.

dominictarr commented 10 years ago

Ah very good. A decentralized reputation/trust network is what I'm planning on building with scuttlebutt, basically. That is a bit unfun at first, but people like following/feeds etc, and a trust network falls out of that.

To be honest, I'm a bit skeptical of many of these bitcoin 2.0 things. Maidsafe makes a lot of bold claims, but I can't find a straightforward explanation of how it is intended to work... It seems to me like a case of "if the only tool you have is a block-chain, everything looks like a cryptocurrency".

On the other hand, bitcoin is brilliant and, most importantly, simple: the bitcoin paper is only a few pages. Maidsafe has 6 whitepapers... and I can't tell whether there is actually a working prototype or not. The bitcoin paper and bitcoin itself were released at the same time; I think that is the better approach for a cryptosystem with bold claims.

Have you heard of Tahoe-LAFS? It's a distributed filesystem that is actually implemented, and it has a very clever permissions system: https://tahoe-lafs.org/trac/tahoe-lafs

dominictarr commented 10 years ago

I think with ambitious, potentially world-changing ideas like a distributed package manager, you should not attack the grand vision in one go - I've been much more successful building out more primitive modules that form a part of the big picture but are useful on their own. I can't stress this enough. That way you get collaborators and feedback, and generate value incrementally...

automenta commented 10 years ago

Thanks for the suggestions @dominictarr. I think secure-scuttlebutt features could plug directly into netention. There's a demo server running at http://geekery.biz (anonymous login works). Another way of looking at it: maybe the text UI in secure-scuttlebutt for monitoring the network is the beginning of a console interface to netention. I'm currently using the telepathine module in the p2p plugin to synchronize server and roster data, but not using its full potential yet. Screenshots:

(screenshots attached in the original comment)

automenta commented 10 years ago

GitChain - https://www.kickstarter.com/projects/612530753/gitchain

dominictarr commented 10 years ago

@automenta I don't really understand what your thing is meant to be doing. It seems to have a bunch of discussion things in a bunch of visualizations, but it seems to be filled with random/test data, so it doesn't make much sense to me right now, to be honest. What problem does this solve, and how does it relate to p2p?

automenta commented 10 years ago

project description: https://github.com/automenta/netjs/blob/master/doc/README.technical.md

Yes, it is quite abstract, since we're trying to achieve several goals at the same time. It solves the problem of being able to describe and communicate semantic qualities of reality and imagination unambiguously, so that goals (described imaginary futures) can be realized in terms of what already exists. So essentially it shares semantic objects (JSON serialization is fine) between clients <-> servers, and servers <-> servers. These can be compared to generate matches and relevant data according to what users are focused on. There are several documents and presentations in the doc/ folder too.

shimondoodkin commented 10 years ago

I have developed this: https://github.com/shimondoodkin/nodejs-p2p-dist-components - I use it.

There is nsq - a distributed queue.

I use telepathine for p2p discovery; for now I use it to switch instances on a single machine, using my precise-future master switch.

I use it with zmq. zmq lets me do real-time and ordered input, in contrast to nsq, but I don't have the caching and pipelining that nsq has. It is good enough for me.

I use mongodb for the database, which is distributed. Each node connects to the database by itself.

thanhthang20 commented 10 years ago

Hi @shimondoodkin,

There is some complexity and dependency here. Please share some example code for this case:

I have 3 socket.io servers scaled horizontally (different machines in different locations), let's say A, B and C. A web client sends 'hello' to server A and I want ALL clients to hear this message, regardless of whether they are connected to the socket.io on A, B or C. Could "nodejs-p2p-dist-components" power this system? Thanks!

shimondoodkin commented 10 years ago

I think you don't need it for socket.io; socket.io has a redis backend. Have you tried it?
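Something like this is the usual setup, I think (a minimal sketch assuming the socket.io-redis adapter; the host/port values are placeholders):

var io = require('socket.io')(3000);
var redisAdapter = require('socket.io-redis');

// all three servers (A, B, C) point at the same redis instance;
// the adapter relays broadcasts between them over redis pub/sub
io.adapter(redisAdapter({ host: 'redis.internal', port: 6379 }));

io.on('connection', function (socket) {
  socket.on('hello', function (msg) {
    io.emit('hello', msg); // reaches clients connected to A, B and C
  });
});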


thanhthang20 commented 10 years ago

Pub/sub with a redis backend and/or a load balancer can solve this problem, but both of them are single points of failure. Imagine that in the future we can build a high-performance system with thousands and thousands of tiny nodejs instances everywhere, without a single point of failure. I think P2P technology is the right way to do that.

shimondoodkin commented 10 years ago

@thanhthang20 OK, I guess it won't be able to sync all messages across all servers - that's huge bandwidth and it does not scale well. But generally it is possible.

telepathine has p2p discovery. I used it: after connecting to the p2p network, I sent events to everyone announcing that a new service is up, so they connect together. You may use my system or another to interconnect the servers at the backend.

In general, what telepathine does is remember a list of past connections and send it on connect; when someone connects, he sends everybody his external IP and port. You could implement this with anything.

A p2p network is one where everybody is connected to everybody as clients (bidirectional communication).
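For example, a bare-bones version of that peer-list exchange could look like this (just a sketch; telepathine does more, and this assumes each peer list arrives in a single TCP chunk):

var net = require('net');

var knownPeers = {}; // "host:port" -> true, the list of past connections

function learn(addr) {
  if (knownPeers[addr]) return false;
  knownPeers[addr] = true;
  return true;
}

// every node runs a server; whoever connects gets our full peer list,
// and whatever list they send back is merged into ours
var server = net.createServer(function (socket) {
  socket.write(JSON.stringify(Object.keys(knownPeers)) + '\n');
  socket.on('data', function (chunk) {
    JSON.parse(chunk.toString()).forEach(function (addr) {
      if (learn(addr)) console.log('learned peer', addr);
    });
  });
  socket.on('error', function () { /* ignore dropped peers */ });
});
server.listen(7846); // arbitrary port for the example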