datatogether / datatogether

:checkered_flag: Start here! Discussion for Data Together: Building a better future for data
https://datatogether.org/
Creative Commons Attribution Share Alike 4.0 International

Discontinue use of "distributed", instead use "decentralized" or "peer to peer" #32

Open ebarry opened 6 years ago

ebarry commented 6 years ago

In discussion with @flyingzumwalt , it came to light that we use the terms distributed and decentralized interchangeably and sometimes within the same paragraph.

[insert notes and citation from long discussion]

We suggest discontinuing the use of the term distributed, and instead using decentralized or peer-to-peer.

If people agree, perhaps a find and replace throughout this repo might be a good idea?
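As a rough illustration of how such a pass might start, here is a minimal sketch that only lists occurrences for review rather than replacing anything, since each use may need a different substitute. It assumes the repo's prose lives in Markdown files; the `find_term` helper, the `*.md` glob, and the whole-word pattern are hypothetical choices for illustration, not anything the repo defines.

```python
import re
from pathlib import Path

# Hypothetical pattern: "distributed" as a whole word, in any letter case.
TERM = re.compile(r"\bdistributed\b", re.IGNORECASE)


def find_term(root, pattern=TERM, glob="*.md"):
    """Yield (path, line_number, line) for each matching line under root."""
    for path in sorted(Path(root).rglob(glob)):
        # Replace undecodable bytes instead of failing on an odd file.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                yield path, number, line.strip()


if __name__ == "__main__":
    # Assumption: run from the root of a local checkout of the repo.
    for path, number, line in find_term("."):
        print(f"{path}:{number}: {line}")
```

Reviewing the list by hand first keeps quoted phrases such as IPFS's "distributed web" from being rewritten by accident.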

mhucka commented 6 years ago

Argh, I didn't notice this issue before I wrote my comment, but anyway, my $0.02 on this issue is here: https://github.com/datatogether/datatogether/pull/24#discussion_r147307808

ebarry commented 6 years ago

Awesome! I didn't have time to write out the long discussion that @flyingzumwalt and I had, so thank you for writing out your points longform! Copying in your last paragraph here @mhucka , so we can see what's next:

So coming back to the present document, I think we would be most clear by (1) avoiding the expression "distributed web" unless it's part of a statement like "... using IPFS, the 'distributed web' ...", (2) using decentralized (or decentralized and distributed) to describe how data preservation, security, etc., are achieved in DT, and (3) using peer-to-peer as part of the description of how the infrastructure works but not as a replacement for either distributed or decentralized.

dcwalk commented 6 years ago

Hey all,

I was thinking about this last night. Is there any chance we could unpack distributed vs. decentralized a bit? Mostly curious, as I have come across the two terms separately in the literature (beyond just academics researching decentralized tech), and maybe the distinction makes sense to me in a way it doesn't for others?

mhucka commented 6 years ago

@dcwalk is right as usual. I have some more thoughts about this and will try to write up asap.

mhucka commented 6 years ago

After some more research and thought, I want to modify my original statements a little bit and unpack some distinctions better.

With respect to the term distributed, I understand now that I was fixated too much on network architecture when I wrote https://github.com/datatogether/datatogether/pull/24#discussion_r147307808. That view is too narrow; distributed is really one end of a spectrum in which centralized is the other end, and this is a dimension that can be used to describe not just architecture but other things as well (including things outside of computing -- which I know you all know). So in other words, we can talk about different things being (say) fully centralized, partly centralized, fully distributed, etc. -- things like services, resources, etc. Outside of computing, examples include water distribution networks and electrical power distribution networks.

I think IPFS's tag line "distributed web" is meant more as a reaction to the centralization of services and resources that exists in today's web. They want to avoid the concentration of essential services and resources (and thus control) in a few powerful entities. The tag line is perhaps an attempt to highlight the goal of moving away from consolidation and centralization, and maybe less a statement about the network architecture of the web (which is how my old-computer-geek brain interpreted it at first).

With respect to peer-to-peer, I feel I'm on reasonably solid ground in claiming that in computing contexts, it's generally taken to be an approach to distributed computing but technically not necessarily synonymous with decentralization. The reason is that there are examples of partly centralized P2P systems. (See, for example, section 2.1.1 in the 2010 book Peer-to-Peer Computing, by Vu, Lupu and Ooi.) Discussions get muddy because the very term "peer-to-peer" implies that peers have some kind of equal stature or roles in the system, but that equality can be limited to the network architecture. Now, it wouldn't make sense to talk about P2P without a distributed architecture, so P2P is inherently distributed. It's just that control can be centralized in a P2P system, although most modern systems do seem to adopt a more decentralized approach. (I do think it's pretty common to assume that P2P implies a distributed and decentralized system.)

Going back to the text and specifically talking about "distributed web" in Data Together, I'm a little torn. Although IPFS's use of "distributed web" makes some sense, (1) I feel that saying we're working towards decentralization is a slightly more evocative statement about Data Together's goals (cf. the Wikipedia entry for decentralization), and (2) I still have a nagging feeling that "distributed web" might not be meaningful enough to a lot of people. But other people may not share my feeling about point (2), and/or I might be way off base with the whole analysis, and so I'll be happy to follow whatever consensus emerges on this issue.