Open gmaclennan opened 7 years ago
If you just want the last N changesets (where N isn't too big), you can probably get away with just doing something like this:
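var through = require('through2')
var collect = require('collect-stream')

// Read the log newest-first and keep only the changeset nodes.
// through.obj is needed because the log emits objects, not buffers.
var changesets = osm.log.createReadStream({ reverse: true, limit: 100 })
  .pipe(through.obj(write))

// collect-stream takes the stream as its first argument.
collect(changesets, function (err, data) {
  if (err) throw err
  doWithChangesets(data)
})

function write (node, enc, next) {
  if (node.value.type === 'changeset') {
    this.push(node)
  }
  next()
}

function doWithChangesets (cs) {
  // ...
}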
This will give you the last 0-N changesets from the underlying hyperlog.
If you want more, you could remove the limit option and just end the stream once you have as many as you'd like.
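A minimal sketch of the "end the stream once you have enough" idea, using only Node's built-in stream module. The helper name takeChangesets is hypothetical, and it assumes the source behaves like osm.log.createReadStream({ reverse: true }): an object stream of nodes with a value.type field.

```javascript
var Writable = require('stream').Writable

// Collect up to `n` changeset nodes from an object stream, then stop reading.
// ASSUMPTION: `source` emits hyperlog-style nodes ({ key, value: { type, ... } }).
function takeChangesets (source, n, cb) {
  var found = []
  var done = false

  // Report the result exactly once.
  function finish (err) {
    if (done) return
    done = true
    cb(err, found)
  }

  var sink = new Writable({
    objectMode: true,
    write: function (node, enc, next) {
      if (node.value && node.value.type === 'changeset') {
        found.push(node)
      }
      if (found.length >= n) {
        // Enough collected: detach from the source and release it.
        source.unpipe(sink)
        if (typeof source.destroy === 'function') source.destroy()
        return finish(null)
      }
      next()
    }
  })

  source.on('error', finish)
  // The source ran out before `n` changesets were found.
  sink.on('finish', function () { finish(null) })
  source.pipe(sink)
}
```

Usage would look something like takeChangesets(osm.log.createReadStream({ reverse: true }), 20, cb). If the log holds fewer than 20 changesets, the callback simply receives however many were found.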
Sorry, forgot github refuses to treat emails as markdown. :(
How does the ordering work on log.createReadStream() after syncing? E.g. what would log.createReadStream() on Machine 1 now return? Would we always get A, B before C, D, E and F? Would the order within C, D and within E, F always be guaranteed? I am guessing the ordering of the two pairs [C, D] and [E, F] relative to each other is not guaranteed?
Yes: the ordering is not stable across machines. A hyperlog's CHANGES ordering is local to each machine, not global. However, after you grab the last N changesets, you could sort them prior to presentation using some deterministic comparison.
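One possible deterministic comparison, as a sketch only. It assumes each changeset value carries a created_at ISO 8601 UTC timestamp string (which may be missing or wrong if clocks are off) and that node.key is the hyperlog node's hash. Both field choices and the function names are assumptions for illustration. Ties and missing timestamps fall back to the key, so every machine ends up with the same order for the same set of nodes.

```javascript
// Compare two hyperlog nodes that hold changesets, newest first.
// ASSUMPTION: value.created_at is an ISO 8601 UTC string in a uniform format,
// so plain string comparison matches chronological order.
function compareChangesets (a, b) {
  var ta = a.value.created_at || ''
  var tb = b.value.created_at || ''
  if (ta !== tb) return ta < tb ? 1 : -1 // later timestamp sorts first
  if (a.key === b.key) return 0
  return a.key < b.key ? -1 : 1 // tie-break on key so all machines agree
}

// Return a sorted copy; the input array is left untouched.
function sortChangesets (nodes) {
  return nodes.slice().sort(compareChangesets)
}
```

Because the comparison looks only at the nodes themselves and not at their position in the local log, two machines holding the same changesets present them identically. The timestamps are still only as trustworthy as the clocks that wrote them.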
Yes to which part? You can't guarantee that E, F would come after A, B in the above case?
Re ordering, you can only assume that parents appear earlier than children. So the CHANGES log on each machine should be:
Machine 1: A B C D E F
Machine 2: A B E F C D
(This is the in-order sequence. If opts.reverse is provided, you'll be reading these right to left instead.)
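To make the "parents before children" guarantee concrete, here is a small check written as a sketch. The link structure is an assumption inferred from the two sequences above: A is the parent of B, and B has two forks, C then D on one machine and E then F on the other. The function name is hypothetical.

```javascript
// Return true if every entry in `order` appears after all of its parents.
// `links` maps a node name to the names of its parents.
function parentsComeFirst (order, links) {
  var seen = {}
  for (var i = 0; i < order.length; i++) {
    var parents = links[order[i]] || []
    for (var j = 0; j < parents.length; j++) {
      if (!seen[parents[j]]) return false
    }
    seen[order[i]] = true
  }
  return true
}

// ASSUMED history: A <- B, then two forks: B <- C <- D and B <- E <- F.
var links = { B: ['A'], C: ['B'], D: ['C'], E: ['B'], F: ['E'] }

console.log(parentsComeFirst(['A', 'B', 'C', 'D', 'E', 'F'], links)) // Machine 1: true
console.log(parentsComeFirst(['A', 'B', 'E', 'F', 'C', 'D'], links)) // Machine 2: true
console.log(parentsComeFirst(['A', 'C', 'B', 'D', 'E', 'F'], links)) // C before its parent B: false
```

Both machines' orders pass even though they differ, which is the point: the guarantee constrains each fork internally but says nothing about how the forks interleave.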
Ok, good, that would be good enough for our needs. Now would the changeset ordering match up with the node/way/relation ordering? i.e. can we assume that for the elements referenced in a changeset, parents appear earlier than children?
It's not required, no. As a consumer of osm-p2p-db, I could write
var nodeA = { type: 'node', id: 'A', lon: 0, lat: 0 }
var nodeB = { type: 'node', id: 'B', lon: 1, lat: 1 }
var way = { type: 'way', id: 'C', refs: [ 'A', 'B'] }
// The way is written before the nodes it references.
var ops = [
  { type: 'put', doc: way },
  { type: 'put', doc: nodeA },
  { type: 'put', doc: nodeB }
]
osm.batch(ops)
And then, if you read back osm.log.createReadStream() you would see them in the same order they were written. osm-p2p-db is intentionally (I believe) very ignorant about semantics. This could be another good candidate for the osm-p2p-api module we've been talking about.
If we wanted to try and ensure this, osm-p2p-server/api/put_changes.js might be a good place to do the sorting.
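A rough sketch of what such a sort could look like, not what put_changes.js actually does. It reuses the op shape from the example above ({ type: 'put', doc: ... }) and orders ops so nodes come before ways and ways before relations. The names TYPE_RANK, rank and sortOps are hypothetical.

```javascript
// Lower rank is written first, so referenced elements precede their referrers.
var TYPE_RANK = { node: 0, way: 1, relation: 2 }

// Unknown types (e.g. changesets) sort after everything else.
function rank (op) {
  var r = TYPE_RANK[op.doc.type]
  return r === undefined ? 3 : r
}

// Return a new array ordered by element type, keeping the original
// relative order of ops that share a type.
function sortOps (ops) {
  return ops
    .map(function (op, i) { return { op: op, i: i } })
    .sort(function (a, b) { return rank(a.op) - rank(b.op) || a.i - b.i })
    .map(function (entry) { return entry.op })
}
```

This only handles ordering between element types. A relation that references another relation would need a proper topological sort over its members, which this sketch does not attempt.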
Ok, thanks, I think returning the order changesets were written is fine for now.
We need to implement GET /api/0.6/changesets in order to create an interface for reviewing recent changes. This would require both a spatial index on changesets and a date index. Is there a way to cheaply get an ordered list of changesets? E.g. if we can't rely on clocks being set correctly, can we just pull the most recent changesets off the db?