caged / osm-building-import

The old college try to prepare and import Portland, OR building data into OpenStreetMap.
MIT License

state_id numbers are inconsistent between buildings and addresses #3

Open caged opened 9 years ago

caged commented 9 years ago

In the RLIS addresses dataset, most Portland addresses are formatted with leading zeros and a dash. However, the RLIS building dataset discards all the leading zeros and the dash.

Example

1N1E34DB -00100 - RLIS address data
1N1E34DB  100 - RLIS building data

| name | records where state_id/tlid ~ \s-0 |
| --- | --- |
| buildings | 3882 |
| addresses | 238671 |
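
The pattern `\s-0` in the counts above matches a whitespace character followed by a dash and a zero, which is the padded form seen in the address data. A minimal Python sketch of the same check on the two sample values follows; the helper name `has_padded_suffix` is hypothetical, and the original counts presumably came from a Postgres `~` regex match rather than from code like this.

```python
import re

# Whitespace, then a dash, then a zero: the padded form seen in the address data.
PADDED = re.compile(r"\s-0")


def has_padded_suffix(state_id):
    """Return True when the id uses the dash-and-leading-zero form."""
    return PADDED.search(state_id) is not None


# The two sample values from the example above.
samples = ["1N1E34DB -00100", "1N1E34DB  100"]
padded_count = sum(has_padded_suffix(s) for s in samples)
```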

Proposed solution

Create a Postgres function that normalizes state_id/tlid across the buildings and addresses data. The function should be applied whenever state_id/tlid is processed for storage during create operations, and it should include the whitespace processing already done in #2. Specifically, all "finalized" tables should be processed this way.
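
A minimal sketch of what such normalization might look like, written in Python for illustration rather than as the Postgres function proposed here. The canonical form (prefix, one space, lot number without dash or leading zeros) and the name `normalize_state_id` are assumptions, not taken from the project.

```python
import re

# Assumed shape: a prefix, whitespace, an optional dash, optional leading
# zeros, then the lot number. Real RLIS values may have other shapes.
_STATE_ID = re.compile(r"^\s*(\S+?)\s+-?0*(\d+)\s*$")


def normalize_state_id(raw):
    """Return an assumed canonical form of a state_id/tlid value.

    Values that do not fit the assumed shape only get their whitespace
    collapsed, so nothing is silently discarded.
    """
    if raw is None:
        return None
    match = _STATE_ID.match(raw)
    if match is None:
        return " ".join(raw.split())
    prefix, lot = match.groups()
    return f"{prefix} {lot}"


address_form = normalize_state_id("1N1E34DB -00100")
building_form = normalize_state_id("1N1E34DB  100")
```

With this sketch, both sample values from the example reduce to the same string, which is what lets a building be joined to its addresses.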

caged commented 9 years ago

This seems to be an issue only with addresses in Portland proper, and not all of them are affected.

Blue - All buildings
Green - Buildings that have a state_id (tlid) match in addresses

[Image: state-id]

[Image: state-id-detail]

caged commented 9 years ago

After landing #5, specifically https://github.com/rosecitygis/osm-building-import/blob/master/sql/_normalize_state_id.sql, this is in much better shape. We now have 595,123 buildings matched with addresses. That's almost the entire dataset.

[Image: state-id-normalized]

[Image: state-id-normalized-detail]

I'm going to leave this open because I'd love some guidance from OSM community members and folks over at Metro on whether this should be considered resolved. I've done some spot checking and things appear to be OK.

geografa commented 9 years ago

Are you proposing this be the building_id or just want to retain it? I'd propose we use the building ID and use that to maintain links back to RLIS for future changes like building removal or additions. Using a minimum set of tags is the preference, I believe.

caged commented 9 years ago

> Are you proposing this be the building_id or just want to retain it?

There are a few different ids, so this can be a little hard to keep straight. There is an existing bldg_id (which is actually composed of the state_id) [1], so that takes care of the building id.

state_id (in buildings) and tlid (in addresses) are used to match buildings and relevant addresses together. The process looks like this:
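
As a rough sketch of the matching idea only (not the project's actual SQL), the following Python groups addresses by a normalized tlid and looks each building up by its normalized state_id. The function names, the normalization rule, the building ids `b1` and `b2`, the second state_id and the street address are all made up for illustration.

```python
import re
from collections import defaultdict

# Assumed id shape: prefix, whitespace, optional dash and leading zeros, lot number.
_ID = re.compile(r"^\s*(\S+?)\s+-?0*(\d+)\s*$")


def normalize(raw):
    """Reduce a state_id/tlid to an assumed canonical 'prefix lot' form."""
    match = _ID.match(raw)
    if match is None:
        return " ".join(raw.split())
    return f"{match.group(1)} {match.group(2)}"


def match_addresses(buildings, addresses):
    """Map each bldg_id to the addresses sharing its normalized id.

    buildings: iterable of (bldg_id, state_id)
    addresses: iterable of (address, tlid)
    """
    by_key = defaultdict(list)
    for address, tlid in addresses:
        by_key[normalize(tlid)].append(address)
    return {
        bldg_id: by_key.get(normalize(state_id), [])
        for bldg_id, state_id in buildings
    }


# Hypothetical rows; only the first state_id/tlid pair comes from the issue.
buildings = [("b1", "1N1E34DB  100"), ("b2", "1N1E34DB  200")]
addresses = [("123 Example St", "1N1E34DB -00100")]
matches = match_addresses(buildings, addresses)
```

Each building maps to a list, so a building with several addresses keeps all of them and an unmatched building gets an empty list.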

There are other issues to consider beyond this that I think we should tackle in a different issue. The case of multiple addresses, for instance.