digidem / hyperdb-osm

Peer-to-peer OpenStreetMap database over hyperdb.

Separate type for observations? #2

Open gmaclennan opened 6 years ago

gmaclennan commented 6 years ago

For osm-p2p we are introducing the concept of "observation", which is the location of a person making an observation about a place. The location of an observer may not be identical to a place (e.g. standing on a road looking at a building) and a place could have multiple observations over time. The observations could end up forming the geometry of a place e.g. a user taking points or a track along a path or a boundary. The geometry of a place could change over time based on observations.

We could model observations as nodes in a relation, or we could store them as a separate type. I open this discussion for evaluating the pros and cons of each. To kick us off:

okdistribute commented 6 years ago

What is the use case for an observation point?

What users would they serve, i.e., which users are interested in analyzing those observation points?

How far in distance would they be from the nodes they are observing (on average)? Are we talking a few meters or dozens of meters? Or could I make an observation from an office a continent away?

How often would these be changed into a 'real node'?

hackergrrl commented 6 years ago

What if observations were modeled as basic OSM elements, but lived on a separate map layer to be well-partitioned from regular physical geographic data?

gmaclennan commented 6 years ago

The goal of observations is to separate subjective statements (“I was in this location and I observed this thing about this place”) from edits to a feature's attributes, which are more of an objective statement. E.g. editing the name of a cafe vs. saying “I was here on the street outside the cafe on this date/time and I saw the name was X”. A cafe could change its name, and you can look and see where the current statement of truth (the attribute edit) came from (the observation). An observation is different from an attribute edit because you could make several attribute edits for many reasons other than information from the ground, e.g. spelling corrections or interpretations of observed data. E.g. many observations about a sacred mountain could record different versions of the story about the mountain from different elders, and someone could then create an edited version of that story from all of those and add that edited story as an attribute.

The other goal of observations is to avoid edits of elements in a distributed environment. By turning many of what would otherwise be edits to a feature into observation records linked to the feature, we reduce the probability of forked data. E.g. two people visit an oil-contaminated site separately. One person adds some observation points showing how the contaminated area has expanded; another makes an observation of a company clean-up team. These are linked to the place (the contaminated site) rather than stored as attributes of the place.
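To make the forking argument concrete, here is a rough TypeScript sketch. It is not hyperdb-osm's API: the `Observation` shape, its field names, and the `mergeLogs` helper are all hypothetical, and ids are assumed to be unique per observation. Each peer only adds observation records that point at the place, so combining two peers' data is a union by id and never has to reconcile competing edits to the place itself.

```typescript
// Hypothetical record shape: an observation links to a place by id
// instead of modifying the place's own attributes.
interface Observation {
  id: string; // assumed unique per observation
  placeId: string; // the element this observation is about
  lat: number; // where the observer was standing
  lon: number;
  timestamp: string; // ISO 8601
  tags: Record<string, string>;
}

// Merge two peers' observations as a union keyed by id. Records are only
// ever added, never edited, so there is nothing to conflict on.
function mergeLogs(a: Observation[], b: Observation[]): Observation[] {
  const byId: Record<string, Observation> = {};
  for (const obs of a.concat(b)) {
    byId[obs.id] = obs;
  }
  return Object.keys(byId)
    .sort()
    .map((id) => byId[id] as Observation);
}

// Two people visit the same contaminated site separately (made-up data).
const peerA: Observation[] = [
  {
    id: "obs-1",
    placeId: "site-42",
    lat: -1.25,
    lon: -76.9,
    timestamp: "2018-02-01T10:00:00Z",
    tags: { note: "contaminated area has expanded" },
  },
];
const peerB: Observation[] = [
  {
    id: "obs-2",
    placeId: "site-42",
    lat: -1.26,
    lon: -76.91,
    timestamp: "2018-02-03T15:30:00Z",
    tags: { note: "company clean-up team on site" },
  },
];

const merged = mergeLogs(peerA, peerB);
console.log(merged.map((o) => o.id + ": " + o.tags["note"]));
```

Merging the result with either peer's data again yields the same set, which is the property that keeps the place record itself from forking.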

For the consumers of the map data:

For the reader of the map, observations are useful for knowing the source of edits: where did this data come from? Often there is an interpretive step between a field observation and what goes on the map. Many of the GPS points gathered are not precisely at the location of the place, e.g. a school building that someone located with an observation (with a GPS point) from the path beside the school. The editor will take that GPS point along with info from the observer (“I was 50 ft east of the school when I took this point”) or satellite imagery, and use that to draw the actual location. Observations allow the user to see the source data of this process, which can be useful for justifying the map.

This same info is also very useful for the maintainer of a map like this. You often start putting things on the map on the basis of potentially vague data, and then triangulate with verification on the ground. For something on the map, the maintainer needs to know “did someone visit this and confirm the location, or is this based on a sketch map?”, or they might see something on the map that is perhaps not permanent and want to know the last time someone visited that location and confirmed it was still there.

Finally, for the data collectors, when they visit a place, especially when doing monitoring, it is useful to know what has been seen around that place before, and to update it with more information about what is there now.

One way of thinking about observations is that they are like reviews of a place, like reviews on Yelp, except we are recording the location of the reviewer to show they were actually there (or nearby) when they made the review. A more recent review of your local cafe is often more relevant than a review from 5 years ago. This is why I have been thinking about modeling them differently, the same way you would model a review as a different type from a place if you were building Yelp.
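Sticking with the review analogy, a reader or maintainer would usually want the newest observation for a place first. Here is a small TypeScript sketch of that lookup. The `Observation` shape and the `latestFor` helper are hypothetical, and timestamps are assumed to be ISO 8601 strings in UTC so that plain string comparison matches chronological order.

```typescript
// Hypothetical shape, trimmed to the fields this lookup needs.
interface Observation {
  id: string;
  placeId: string;
  timestamp: string; // ISO 8601 in UTC, so lexicographic order is chronological
  tags: Record<string, string>;
}

// Return the most recent observation linked to a place, or undefined if
// nobody has recorded one yet.
function latestFor(placeId: string, all: Observation[]): Observation | undefined {
  const matches = all.filter((o) => o.placeId === placeId);
  // Sort newest first.
  matches.sort((x, y) =>
    x.timestamp < y.timestamp ? 1 : x.timestamp > y.timestamp ? -1 : 0
  );
  return matches[0];
}

// Made-up data: the same cafe seen five years apart under different names.
const observations: Observation[] = [
  { id: "obs-a", placeId: "cafe-1", timestamp: "2013-03-10T09:00:00Z", tags: { name: "Old Name" } },
  { id: "obs-b", placeId: "cafe-1", timestamp: "2018-02-19T14:31:00Z", tags: { name: "New Name" } },
  { id: "obs-c", placeId: "school-7", timestamp: "2017-11-02T08:15:00Z", tags: { note: "seen from the path" } },
];

const newest = latestFor("cafe-1", observations);
console.log(newest ? newest.timestamp : "never visited");
```

The same lookup answers the maintainer's question of when a place was last visited and confirmed.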



hackergrrl commented 6 years ago

Really well explained Gregor -- this write-up would be a great resource to have on the observations repo (once there is one)!

okdistribute commented 6 years ago

Awesome, thanks for the info. Sounds like it does make sense to model them differently, as these observations will certainly have different attributes. I wonder how it can be accomplished most simply -- a log (hyper(core/log/db)) that references a node?

hackergrrl commented 6 years ago

Our original implementation[1] structured the data by having two new types: 'observation' (the observation itself), and 'observation-link' (a separate type that records the observation's id and the linked node's id).

[1] https://github.com/digidem/osm-p2p-observations

I'm not sure we need both. Storing the node that's linked to in the observation still keeps the database normalized (in the 3NF[2] sense). The only restriction is that a single observation couldn't refer to multiple elements. But I think that's OK?

[2] https://en.wikipedia.org/wiki/Third_normal_form
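
For comparison, the two shapes could look roughly like this in TypeScript. The field names (`obs`, `link`) and both helper functions are illustrative guesses, not the actual osm-p2p-observations schema. Option A needs a join through the link records; option B is a plain filter, at the cost of one linked element per observation.

```typescript
// Option A (two types): a link record joins an observation to an element,
// so a single observation can be linked to several elements.
interface ObservationA {
  type: "observation";
  id: string;
  tags: Record<string, string>;
}
interface ObservationLink {
  type: "observation-link";
  id: string;
  obs: string; // id of the observation
  link: string; // id of the linked element
}

// Option B (single type): the observation stores the linked element id
// itself, which limits it to exactly one element.
interface ObservationB {
  type: "observation";
  id: string;
  link: string;
  tags: Record<string, string>;
}

// Option A lookup: collect the observation ids linked to the element,
// then pick out those observations.
function observationsForA(
  elementId: string,
  observations: ObservationA[],
  links: ObservationLink[]
): ObservationA[] {
  const wanted: Record<string, boolean> = {};
  for (const l of links) {
    if (l.link === elementId) wanted[l.obs] = true;
  }
  return observations.filter((o) => wanted[o.id] === true);
}

// Option B lookup: a single filter, no join.
function observationsForB(elementId: string, observations: ObservationB[]): ObservationB[] {
  return observations.filter((o) => o.link === elementId);
}

// The same made-up data expressed both ways.
const obsA: ObservationA[] = [
  { type: "observation", id: "o1", tags: { note: "name on sign is X" } },
  { type: "observation", id: "o2", tags: { note: "closed for repairs" } },
];
const linksA: ObservationLink[] = [
  { type: "observation-link", id: "l1", obs: "o1", link: "node-9" },
  { type: "observation-link", id: "l2", obs: "o2", link: "node-5" },
];
const obsB: ObservationB[] = [
  { type: "observation", id: "o1", link: "node-9", tags: { note: "name on sign is X" } },
  { type: "observation", id: "o2", link: "node-5", tags: { note: "closed for repairs" } },
];

const viaLinks = observationsForA("node-9", obsA, linksA).map((o) => o.id);
const direct = observationsForB("node-9", obsB).map((o) => o.id);
console.log(viaLinks, direct);
```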