Merkle Tuesday - Githubissues

mafintosh commented 8 years ago

Lets do a community call on strategies for replicating merkle dags

YouTube link: http://youtu.be/DPiOdh8CoS0 Gitter link (chat/ask questions): https://gitter.im/datproject/discussions Etherpad: https://etherpad.wikimedia.org/p/merkle-tuesday

Theme: Strategies for replicating merkle graphs Where: Google Hangouts on Air When: Tuesday, October 13th Time: 1pm EST (17:00 GMT)

We are doing our 1.0 iteration of our merkle graph module, mafintosh/dat-graph and in the last couple of days we've actively been discussing the various ways of replicating merkle dags and generic hash sets on irc. dat-graph implementation is almost done so some community feedback would be appreciated.

Merkle DAG research so far: https://github.com/datproject/discussions/issues/28

Agenda:

Quick introdution to Merkle DAGs
Our requirements and the trade-offs we're willing to make
Discussion of our current dat-graph replication strategy using local graph path enumeration
Discussion of merkle tree / trie strategies

jbenet commented 8 years ago

@maxogden got the msg -- this sounds fun i think i can do it.

I think @diasdavid may want to come too.

See also https://github.com/ipfs/notes/issues/12 -- this is WIP.

daviddias commented 8 years ago

awesome idea :+1:

daviddias commented 8 years ago

Made a WorldTimeBuddy clock -> http://www.worldtimebuddy.com/?qm=1&lid=8,5,14,1819729&h=14&date=2015-10-13&sln=17-18

RichardLitt commented 8 years ago

Added this to the IPFS community calendar, too.

mikolalysenko commented 8 years ago

Will try to make it, currently at a conference in SLC and have to give a talk right after this. But I will try to sit in!

pfrazee commented 8 years ago

Will be in transit /cc @dominictarr

max-mapper commented 8 years ago

YouTube link is here https://www.youtube.com/watch?v=DPiOdh8CoS0&feature=youtu.be, starts in 44 minutes

jbenet commented 8 years ago

This was great! :+1:

mikolalysenko commented 8 years ago

Thanks for inviting me!

max-mapper commented 8 years ago

Anyone who missed it can watch the youtube: https://www.youtube.com/watch?v=DPiOdh8CoS0&feature=youtu.be

or check out the notes from the call:

 __   __  _______  ______    ___   _  ___      _______                                                                                             
|  |_|  ||       ||    _ |  |   | | ||   |    |       |                                                                                            
|       ||    ___||   | ||  |   |_| ||   |    |    ___|                                                                                            
|       ||   |___ |   |_||_ |      _||   |    |   |___                                                                                             
|       ||    ___||    __  ||     |_ |   |___ |    ___|                                                                                            
| ||_|| ||   |___ |   |  | ||    _  ||       ||   |___           https://gitter.im/datproject/discussions                                          
|_|   |_||_______||___|  |_||___| |_||_______||_______|                                                                                            
 _______  __   __  _______  _______  ______   _______  __   __   https://github.com/datproject/discussions/issues/29                               
|       ||  | |  ||       ||       ||      | |   _   ||  | |  |                                                                                    
|_     _||  | |  ||    ___||  _____||  _    ||  |_|  ||  |_|  |                                                                                    
  |   |  |  |_|  ||   |___ | |_____ | | |   ||       ||       |                                                                                    
  |   |  |       ||    ___||_____  || |_|   ||       ||_     _|                                                                                    
  |   |  |       ||   |___  _____| ||       ||   _   |  |   |                                                                                      
  |___|  |_______||_______||_______||______| |__| |__|  |___|                                                                                      

      a         a = hash(b + d)
     / \        b = hash(c)
    b   d       d = hash(e)
   /     \      
  c       e

    a         a = hash(hash(ant.jpg) + b + c)
   / \        b = hash(bunny.jpg)
  b   c       c = hash(cat.jpg)

  a         a = hash(hash(ant.jpg) + b + c)
  |\        b = hash(bunny.jpg + d)
  b c       c = hash(cat.jpg + d)
  |/        d = hash(dog.jpg)
  d

    x    y
  /   \ /
 a     e
 |    / \
 b   d   f
  \ /    |
   c     g
    \   /  
      j
      |
      k
      |
      l

x a b c j k l <--
y e d
f g

c
mad science from mikola:
- spanning tree + preferred path decomposition
- disjointness problem:
    find if A + B have no intersections (are they disjoint)
    its impossible to find this out in less than O(n)
    bad news: you need O(n)
    good news: dumb solution is probably optimal

- merkle dags are functional data structures
  - any functional data structure can be converted to a merkle dag by replacing pointers with hashes (merkleize)
  - functional data structure theory can be applied here

- http://cbor.io/
https://tools.ietf.org/html/rfc7049
https://github.com/diasdavid/node-ipld
- https://github.com/ipfs/go-ipld/issues

- use cases that merkle dags provide a trust framework for:
- replicate entire graph (e.g. dat where you want a complete dataset replica)
- query specific parts of the graph (e.g. ipfs where graph spans multiple galaxies, query flooding)
- the network runs the queries itself (e.g. ethereum where you have turing complete bytecode running in distributed VMs)
- replication of graphs
  1. mafintosh style preferred path replication
  2. decompose graph into sets of nodes and try to sync nodes
    - put node hashes into a trie
    - walk the trie to sync
    - similar to ethereum patricia merkle tree https://github.com/ethereum/wiki/wiki/Patricia-Tree
    - need to tune radix of the trie (because hashes are large)

"verified computing"

shared goals:
    - low level graph comparison api
      - ask if someone has nodes
      - ask for data for nodes
      - ask if someone has nodes that match a graph selection path (jbenet graph query syntax)
    - syntax for graph pathing
      - xpath might be too complicated
      - main use case is e.g. 'a..b' (all nodes between a and b)
      - another one would be e.g. /*/**/[type="file"]
    - data structures that can enable the above APIs (preferred path indexing, node trie indexing)
    - backwards compat on the public APIs
    - incremental friendly indexing & replication for data structures
    - supporting efficient streaming by selective walking of inner & intermediate nodes
    - uncle hashes & munro hashes may help this 

 agree - can think of this as stacked layers GET/SET/HAVE simple functions
 - add additional query layers on top
 - protocol should have simple extensibility
 - viz https://tools.ietf.org/html/rfc5754
 - consider whether the low-level query layer needs to know about graphs or not

 - collect any names / urls of reseearchers and projects (maybe on wiki) to contact
 - I mentioned ICNRG research group from IETF https://irtf.org/icnrg 
 - binmaps www.pds.ewi.tudelft.nl/fileadmin/pds/reports/2011/PDS-2011-005.pdf & https://github.com/gritzko/binmap & http://scattered-thoughts.net/blog/2012/01/03/binmaps-compressed-bitmaps/
 - need solutions / optimisation choices for different retrieval / replication requirements

Where can people leave feedback? 

- i need to index some stuff
- i need to query some stuff
- i have a content addressable store graph
- maybe a REST-like api for the graph would make sense? then you can layer on indexing and querying

- possible inspiration https://en.wikipedia.org/wiki/Border_Gateway_Protocol
- communication complexity notes: http://theory.stanford.edu/~tim/w15/l/w15.pdf
 {value: 'some new data', link: ['deafbeaf'], key: 'aaaabbbb'} -> {value: 'some data', author: 'mathias', key: 'deafbeaf'}

 bob         alice

 a            a         
 🔼           🔼
 b            b
 🔼           🔼
 c            c
              🔼
              d

scenario 1:
bob asks alice -> do you have any of [a,b,c]?
alice reponds -> [0,1,2]
bob asks -> give me everything from [2, infinity]

scenario 2:
bob asks alice -> do you have any of [c]?
alice reponds -> []
bob asks alice -> do you have any of [a]?
alice reponds -> [0]
bob asks alice -> do you have any of [b]?
alice reponds -> [0]
now bob knows alice has a and b but not c

------------------------

dag path notation https://github.com/ipfs/notes/issues/12

- traversals (or selections) on a graph
- XPath

<hash>/*/**/[type="file"]

<hash>/*
<hash>/*/*/*
<hash>/a/b/c/d/e/*

<hash1>..<hash2>

wrt to erlang, data structures are not replicated in any smart sense - just converted to "flat" binary structure. (dch) see http://www.erlang.org/doc/man/erlang.html http://www.erlang.org/doc/apps/erts/erl_ext_dist.html

max-mapper commented 8 years ago

Also check out this blog post: https://gdstechnology.blog.gov.uk/2015/10/13/guaranteeing-the-integrity-of-a-register/

mikolalysenko commented 8 years ago

Okasaki's thesis is a good reference on functional data structures: http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf

The main trick in these things is path copying (aka structural sharing) to reduce memory. Lots of data structures are easy to functionalize, like red-black trees, kd-trees, tries, etc.

kumavis commented 8 years ago

here's null_radix's merkle-patricia-tree implementation

dat-ecosystem-archive / datproject-discussions

Merkle Tuesday #29