digidem / osm-p2p-db

Peer-to-peer database for OpenStreetMap data
BSD 2-Clause "Simplified" License
237 stars 25 forks source link

Spatial and date indexes on Changesets #32

Open gmaclennan opened 7 years ago

gmaclennan commented 7 years ago

We need to implement GET /api/0.6/changesets in order to create an interface for reviewing recent changes.

This would require both a spatial index on changesets and a date index. Is there a way to cheaply get an ordered list of changesets? e.g. if we can't rely on clocks being set correctly can we just pull the most recent changesets off the db?

hackergrrl commented 7 years ago

If you just want the last N changesets (where N isn't too big), you can probably get away with just doing something like this:

var through = require('through2')
var collect = require('collect-stream')

osm.log.createReadStream({ reverse: true, limit: 100 })
  .pipe(through(write))
  .pipe(collect(function (err, data) {
    doWithChangesets(data)
  })

function write (node, enc, next) {
  if (node.value.type === 'changeset') {
    this.push(node)
  }
  next()
}

function doWithChangesets (cs) {
  // ...
}

This will give you the last 0-N changesets from the underlying hyperlog. If you want more, you could remove the limit kv and just end the stream once you have as many as you'd like.

hackergrrl commented 7 years ago

Sorry, forgot github refuses to treat emails as markdown. :(

gmaclennan commented 7 years ago

How does the ordering work on log.createReadStream() after syncing? e.g.

  1. Machine 1 creates changesets A, B.
  2. Machine 1 syncs with machine 2
  3. Machine 1 creates changeset C, D
  4. Machine 2 creates changeset E, F
  5. Machine 1 and 2 sync.
  6. What would log.createReadStream() on Machine 1 now return? would we always get A, B before C, D E and F? Would the order C, D and E, F always be guaranteed? I am guessing the ordering of the two pairs [C, D] with [E, F] is not guaranteed?
hackergrrl commented 7 years ago

Yes: the ordering is not stable across machines. A hyperlog's CHANGEs ordering is not globally ordered. However, after you grab the last N changesets, you could sort them prior to presentation according to some sort of deterministic predicate.

gmaclennan commented 7 years ago

Yes to which part? You can't guarantee that E, F would come after A, B in the above case?

hackergrrl commented 7 years ago

Re ordering, you can only assume that parents appear earlier than children. So the CHANGES log on each machine should be:

Machine 1: A B C D E F Machine 2: A B E F C D

(This is the in-order sequence. If opts.reverse is provided, you'll be reading these in right-to-left instead.)

gmaclennan commented 7 years ago

Ok, good, that would be good enough for our needs. Now would the changeset ordering match up with the node/way/relation ordering? i.e. can we assume that for the elements referenced in a changeset, parents appear earlier than children?

hackergrrl commented 7 years ago

It's not required, no. As a consumer of osm-p2p-db, I could write

var nodeA = { type: 'node', id: 'A', lon: 0, lat: 0 }
var nodeB = { type: 'node', id: 'B', lon: 1, lat: 1 }
var way = { type: 'way', id: 'C', refs: [ 'A', 'B'] }
var ops = {
  { type: 'put', doc: way },
  { type: 'put', doc: nodeA },
  { type: 'put', doc: nodeB }
}
osm.batch(ops)

And then, if you read back osm.log.getReadStream() you would see them in the same order they were written. osm-p2p-db is intentionally (I believe) very ignorant about semantics. This could be another good candidate for the osm-p2p-api module we've been talking about.

If we wanted to try and ensure this, osm-p2p-server/api/put_changes.js might be a good place to do the sorting.

gmaclennan commented 7 years ago

Ok, thanks, I think returning the order changesets were written is fine for now.