bellingcat / osm-search

A user friendly way to search OpenStreetMap data for features in proximity to each other.
https://osm-search.bellingcat.com/
MIT License
153 stars 6 forks source link

Using osm2pgsql flex output #1

Open joto opened 10 months ago

joto commented 10 months ago

With modern osm2pgsql you don't need all the postprocessing SQL files. All that is done in there can be done by creating a config file for osm2pgsql that creates the data in the database in the format you need from the beginning. This is not only easier, it should also be a lot faster.

Here is a config file that should get you one table called planet_osm with everything in it very similar to what you had before:

local dtable = osm2pgsql.define_table{
    name = 'planet_osm',
    ids = { type = 'any', id_column = 'osm_id', type_column = 'osm_type' },
    columns = {
        { column = 'tags',  type = 'hstore' }, -- or use jsonb!
        { column = 'way',  type = 'geometry', not_null = true },
    },
    indexes = {
        { column = 'tags', method = 'gin' },
        { column = 'way', method = 'gist' }
    }
}

local function process(object, geometry)
    dtable:insert({
        tags = object.tags,
        way = geometry
    })
end

function osm2pgsql.process_node(object)
    process(object, object:as_point())
end

function osm2pgsql.process_way(object)
    if object.is_closed then
        process(object, object:as_polygon())
    else
        process(object, object:as_linestring())
    end
end

function osm2pgsql.process_relation(object)
    local t = object.tags.type
    if t == 'multipolygon' or t == 'boundary' then
        process(object, object:as_multipolygon())
    end
end

To import use

osm2pgsql -d osm -O flex -S config.lua planet-latest.osm.pbf

As an added benefit, you can use osm2pgsql to keep this database up-to-date, something that wasn't possible before due to the postprocessing.

loganwilliams commented 7 months ago

@joto Thank you for this, looks to be a huge improvement. I will implement this when I do the next database update.

loganwilliams commented 7 months ago

@joto This works great. To be able to update the database, I'd add --slim to the arguments and then run something like osm2pgsql-replication update --verbose -- --output flex --style config.lua?

joto commented 7 months ago

@loganwilliams Yes. If you have version 1.9.0 (or above) of osm2pgsql you don't even need any extra parameters to the osm2pgsql-replication command, because osm2pgsql remembers the settings from the import.

loganwilliams commented 6 months ago

Hm, I'm seeing significantly degraded performance when testing queries against the planet_osm table that this produces vs. the planet_osm view that I was using in earlier versions. I wonder why this could be? Query planner:

old:

 Limit  (cost=257805.96..401022.30 rows=100 width=80)
   ->  Unique  (cost=257805.96..10217070.66 rows=6954 width=80)
         ->  Nested Loop  (cost=257805.96..10216526.84 rows=54382 width=80)
               ->  Gather Merge  (cost=257805.28..265904.59 rows=69542 width=127)
                     Workers Planned: 2
                     ->  Sort  (cost=256805.26..256877.70 rows=28976 width=127)
                           Sort Key: (st_centroid(planet_osm_point.way)), ((planet_osm_point.tags -> 'name'::text)), (st_y(st_transform(st_centroid(planet_osm_point.way), 4326))), (st_x(st_transform(st_centroid(planet_osm_point.way), 4326)))
                           ->  Result  (cost=42.20..254657.76 rows=28976 width=127)
                                 ->  Parallel Append  (cost=42.20..254657.76 rows=28976 width=127)
                                       ->  Parallel Bitmap Heap Scan on planet_osm_point  (cost=4778.71..251871.97 rows=28706 width=125)
                                             Recheck Cond: ((way && '0103000020110F00000100000005000000769F620324CD5FC1F38C35A62D1A5341769F620324CD5FC129634C157A8054413DA2C93BD5175DC129634C157A8054413DA2C93BD5175DC1F38C35A62D1A5341769F620324CD5FC1F38C35A62D1A5341'::geometry) AND (tags ? 'shop'::text))
                                             ->  Bitmap Index Scan on planet_osm_point_way_tags_idx  (cost=0.00..4761.49 rows=68894 width=0)
                                                   Index Cond: ((way && '0103000020110F00000100000005000000769F620324CD5FC1F38C35A62D1A5341769F620324CD5FC129634C157A8054413DA2C93BD5175DC129634C157A8054413DA2C93BD5175DC1F38C35A62D1A5341769F620324CD5FC1F38C35A62D1A5341'::geometry) AND (tags ? 'shop'::text))
                                       ->  Parallel Bitmap Heap Scan on planet_osm_polygon  (cost=42.20..2243.33 rows=324 width=292)
                                             Recheck Cond: ((way && '0103000020110F00000100000005000000769F620324CD5FC1F38C35A62D1A5341769F620324CD5FC129634C157A8054413DA2C93BD5175DC129634C157A8054413DA2C93BD5175DC1F38C35A62D1A5341769F620324CD5FC1F38C35A62D1A5341'::geometry) AND (tags ? 'shop'::text))
                                             ->  Bitmap Index Scan on planet_osm_polygon_way_tags_idx  (cost=0.00..42.06 rows=551 width=0)
                                                   Index Cond: ((way && '0103000020110F00000100000005000000769F620324CD5FC1F38C35A62D1A5341769F620324CD5FC129634C157A8054413DA2C93BD5175DC129634C157A8054413DA2C93BD5175DC1F38C35A62D1A5341769F620324CD5FC1F38C35A62D1A5341'::geometry) AND (tags ? 'shop'::text))
                                       ->  Parallel Bitmap Heap Scan on planet_osm_line  (cost=9.54..397.59 rows=57 width=333)
                                             Recheck Cond: ((way && '0103000020110F00000100000005000000769F620324CD5FC1F38C35A62D1A5341769F620324CD5FC129634C157A8054413DA2C93BD5175DC129634C157A8054413DA2C93BD5175DC1F38C35A62D1A5341769F620324CD5FC1F38C35A62D1A5341'::geometry) AND (tags ? 'shop'::text))
                                             ->  Bitmap Index Scan on planet_osm_line_way_tags_idx  (cost=0.00..9.52 rows=97 width=0)
                                                   Index Cond: ((way && '0103000020110F00000100000005000000769F620324CD5FC1F38C35A62D1A5341769F620324CD5FC129634C157A8054413DA2C93BD5175DC129634C157A8054413DA2C93BD5175DC1F38C35A62D1A5341769F620324CD5FC1F38C35A62D1A5341'::geometry) AND (tags ? 'shop'::text))
               ->  Append  (cost=0.68..101.02 rows=3 width=196)
                     ->  Index Scan using planet_osm_point_way_tags_idx on planet_osm_point planet_osm_point_1  (cost=0.68..33.66 rows=1 width=32)
                           Index Cond: ((way && st_expand(planet_osm_point.way, '100'::double precision)) AND (way && '0103000020110F00000100000005000000769F620324CD5FC1F38C35A62D1A5341769F620324CD5FC129634C157A8054413DA2C93BD5175DC129634C157A8054413DA2C93BD5175DC1F38C35A62D1A5341769F620324CD5FC1F38C35A62D1A5341'::geometry) AND (tags @> '"amenity"=>"place_of_worship"'::hstore))
                           Filter: st_dwithin(planet_osm_point.way, way, '100'::double precision)
                     ->  Index Scan using planet_osm_line_way_tags_idx on planet_osm_line planet_osm_line_1  (cost=0.68..33.67 rows=1 width=272)
                           Index Cond: ((way && st_expand(planet_osm_point.way, '100'::double precision)) AND (way && '0103000020110F00000100000005000000769F620324CD5FC1F38C35A62D1A5341769F620324CD5FC129634C157A8054413DA2C93BD5175DC129634C157A8054413DA2C93BD5175DC1F38C35A62D1A5341769F620324CD5FC1F38C35A62D1A5341'::geometry) AND (tags @> '"amenity"=>"place_of_worship"'::hstore))
                           Filter: st_dwithin(planet_osm_point.way, way, '100'::double precision)
                     ->  Index Scan using planet_osm_polygon_way_tags_idx on planet_osm_polygon planet_osm_polygon_1  (cost=0.68..33.68 rows=1 width=223)
                           Index Cond: ((way && st_expand(planet_osm_point.way, '100'::double precision)) AND (way && '0103000020110F00000100000005000000769F620324CD5FC1F38C35A62D1A5341769F620324CD5FC129634C157A8054413DA2C93BD5175DC129634C157A8054413DA2C93BD5175DC1F38C35A62D1A5341769F620324CD5FC1F38C35A62D1A5341'::geometry) AND (tags @> '"amenity"=>"place_of_worship"'::hstore))
                           Filter: st_dwithin(planet_osm_point.way, way, '100'::double precision)
 JIT:
   Functions: 35
   Options: Inlining false, Optimization false, Expressions true, Deforming true
(34 rows)

new:

 Limit  (cost=974297.51..1049612.74 rows=100 width=80)
   ->  Unique  (cost=974297.51..40719655.34 rows=52772 width=80)
         ->  Nested Loop  (cost=974297.51..40718909.38 rows=74596 width=80)
               ->  Gather Merge  (cost=973649.49..979795.65 rows=52772 width=327)
                     Workers Planned: 2
                     ->  Sort  (cost=972649.46..972704.43 rows=21988 width=327)
                           Sort Key: (st_centroid(planet_osm.way)), ((planet_osm.tags -> 'name'::text)), (st_y(st_transform(st_centroid(planet_osm.way), 4326))), (st_x(st_transform(st_centroid(planet_osm.way), 4326)))
                           ->  Parallel Bitmap Heap Scan on planet_osm  (cost=765309.86..971063.64 rows=21988 width=327)
                                 Recheck Cond: ((tags ? 'shop'::text) AND (way && '0103000020110F00000100000005000000769F620324CD5FC1F38C35A62D1A5341769F620324CD5FC129634C157A8054413DA2C93BD5175DC129634C157A8054413DA2C93BD5175DC1F38C35A62D1A5341769F620324CD5FC1F38C35A62D1A5341'::geometry))
                                 ->  BitmapAnd  (cost=765309.86..765309.86 rows=52772 width=0)
                                       ->  Bitmap Index Scan on planet_osm_tags_idx  (cost=0.00..51232.34 rows=5710979 width=0)
                                             Index Cond: (tags ? 'shop'::text)
                                       ->  Bitmap Index Scan on planet_osm_way_idx  (cost=0.00..714050.88 rows=11234444 width=0)
                                             Index Cond: (way && '0103000020110F00000100000005000000769F620324CD5FC1F38C35A62D1A5341769F620324CD5FC129634C157A8054413DA2C93BD5175DC129634C157A8054413DA2C93BD5175DC1F38C35A62D1A5341769F620324CD5FC1F38C35A62D1A5341'::geometry)
               ->  Bitmap Heap Scan on planet_osm planet_osm_1  (cost=648.02..677.04 rows=1 width=254)
                     Recheck Cond: ((way && '0103000020110F00000100000005000000769F620324CD5FC1F38C35A62D1A5341769F620324CD5FC129634C157A8054413DA2C93BD5175DC129634C157A8054413DA2C93BD5175DC1F38C35A62D1A5341769F620324CD5FC1F38C35A62D1A5341'::geometry) AND (tags @> '"amenity"=>"place_of_worship"'::hstore))
                     Filter: st_dwithin(planet_osm.way, way, '100'::double precision)
                     ->  BitmapAnd  (cost=648.02..648.02 rows=1 width=0)
                           ->  Bitmap Index Scan on planet_osm_way_idx  (cost=0.00..75.60 rows=1123 width=0)
                                 Index Cond: ((way && st_expand(planet_osm.way, '100'::double precision)) AND (way && '0103000020110F00000100000005000000769F620324CD5FC1F38C35A62D1A5341769F620324CD5FC129634C157A8054413DA2C93BD5175DC129634C157A8054413DA2C93BD5175DC1F38C35A62D1A5341769F620324CD5FC1F38C35A62D1A5341'::geometry))
                           ->  Bitmap Index Scan on planet_osm_tags_idx  (cost=0.00..572.04 rows=56539 width=0)
                                 Index Cond: (tags @> '"amenity"=>"place_of_worship"'::hstore)
 JIT:
   Functions: 12
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(25 rows)
joto commented 6 months ago

It is difficult to see from that EXPLAIN output alone what is happening here, without having access to the database. Looks like maybe some of the indexes behave differently for some reason or another.

I think the way forward would probably be to first go back to the original database layout with the three tables but use the flex output to generate it. And keep the original VIEW. That would already be a simplification compared to the situation before. Ideally the tables would then look the same and behave the same. Once that works reliably you can introduce more changes. But it is all a question of what kinds of queries you are doing exactly and what you want the database to do at that point.