digidem / osm-p2p-db

Peer-to-peer database for OpenStreetMap data
BSD 2-Clause "Simplified" License
237 stars 25 forks source link

deterministic keys for osm data when importing #59

Closed sethvincent closed 6 years ago

sethvincent commented 6 years ago

Debugging the issue we've had with exporting observation data, I found that the primary issue was that the osm-p2p key for the same OSM node would be different on the desktop and mobile apps.

This issue came up because we are doing an import from xml in two separate apps, and create is assigning random values for the key of the same data.

In the create function, instead of:

var key = hex2dec(randomBytes(8).toString('hex'))

We could do something like

var key = hex2dec(Buffer(value.id).toString('hex'))

Where value.id is the id from OSM, so that no matter who creates the data the key is always the same.

There are situations where no id is present, so we could do something like this:

if (value.id) {
  key = hex2dec(Buffer(value.id).toString('hex'))
} else {
  key = hex2dec(randomBytes(8).toString('hex'))
}

An alternate solution could be that when importing we use put and provide the key. But it seems like having create handle existing OSM data by using its id would be reasonable.

I'd also be tempted to just use the OSM id in its original form when importing instead of a new key or a hash of the id. I'm not sure if there's a reason not to do that.

hackergrrl commented 6 years ago

Nice catch @sethvincent. What you're saying makes sense to me. The ideal would be to do a proper hyperlog#replicate with the OSM.org data, but that's still prohibitively slow. It might not be soon though with @staltz's recent LevelDB work on mobile!

I'd also be tempted to just use the OSM id in its original form when importing instead of a new key or a hash of the id. I'm not sure if there's a reason not to do that.

This sounds reasonable to me. As long as the IDs are unique everything will work.

hackergrrl commented 6 years ago

Closing this for now: this isn't an osm-p2p-db bug, but relates to the custom sync code we wrote in https://github.com/osmlab/field-data-collection/ and https://github.com/osmlab/field-data-coordinator/