GIS4DEV / GIS4DEV.github.io

Open Source GIScience & GIS for Development
1 stars 12 forks source link

SQL long run time #7

Closed chriskgernon closed 4 years ago

chriskgernon commented 4 years ago

I am trying to run this code:

/ finding distance from drains to waste sites/ create table distfromdrain as select drains37s.id as drains, waste37s.id as waste, st_distance(geography(st_transform(drains37s.geom, 4326)), geography(st_transform(waste37s.geom, 4326))) as dist from drains37s, waste37s where st_intersects(st_buffer(drains37s.geom,50), waste37s.geom)

I let it run for over 7 hours yesterday, and it did not finish. Do you know what I am doing wrong?

Here is the rest of the code for context:

`/fixing geometries and reprojecting/ create table wards37s as select id, fid, ward_name, st_makevalid(st_transform(geom,32737)) as geom from wards

create table subwards37s as select fid, st_makevalid(st_transform(geom,32737)) as geom from subwards

create table waste37s as select id, st_transform(geom, 32737) as geom from waste

create table drains37s as select id, st_transform(geom,32737) as geom from drains

select populate_geometry_columns()

/assigning drains and waste sites to wards and subwards/ alter table drains37s add column ward text

update drains37s set ward=wards37s.ward_name from wards37s where st_intersects(drains37s.geom,wards37s.geom)

alter table drains37s add column subward float8

update drains37s set subward=subwards37s.fid from subwards37s where st_intersects(drains37s.geom,subwards37s.geom)

alter table waste37s add column subward float8

update waste37s set subward=subwards37s.fid from subwards37s where st_intersects(waste37s.geom,subwards37s.geom)

/ finding distance from drains to waste sites/ create table distfromdrain as select drains37s.id as drains, waste37s.id as waste, st_distance(geography(st_transform(drains37s.geom, 4326)), geography(st_transform(waste37s.geom, 4326))) as dist from drains37s, waste37s where st_intersects(st_buffer(drains37s.geom,50), waste37s.geom)

/finding minimum distance between drains and waste sites / create table mindistfromdrain as select drains, min(dist) as dist from distfromdrain group by drains

/averaging minimum distance for wards and subwards/ create table wardminavg as select b.ward, avg(dist) from mindistfromdrain as a left outer join drains37s as b on a.drains=b.id group by b.ward

create table subwardminavg as select b.subward, avg(dist) from mindistfromdrain as a left outer join drains37s as b on a.drains=b.id group by b.subward

/adding columns/ alter table subwards37s add column wastesites float8, add column drains float8, add column avgmindist float8

/counting number of drains and waste sites in each subward/ update subwards37s set wastesites=( select count(waste37s.id) from waste37s where subwards37s.fid=waste37s.subward)

update subwards37s set drains=( select count(drains37s.id) from drains37s where subwards37s.fid=drains37s.subward)

/adding average minimum distance to subwards/ update subwards37s set avgmindist=subwardminavg.avg from subwardminavg where subwards37s.fid=subwardminavg.subward

/adding columns/ alter table wards37s add column wastesites float8, add column drains float8, add column avgmindist float8

/counting number of drains and waste sites in each ward/ update wards37s set wastesites=( select count(waste37s.id) from waste37s where wards37s.ward_name=waste37s.ward)

update wards37s set drains=( select count(drains37s.id) from drains37s where wards37s.ward_name=drains37s.ward)

/adding average minimum distance to wards/ update wards37s set avgmindist=wardminavg.avg from wardminavg where wards37s.ward_name=wardminavg.ward`

Thanks for the help.

josephholler commented 4 years ago

This is a tough algorithm to solve, because it's essentially creating a distance matrix including every possible distance combination between every waste site and every drain. For small numbers of features, it'd be ok, but for large datasets, it's a huge task.

It seems like you're asking a question about the distance from features in one layer to the nearest feature in another layer. There's a detailed discussion of the challenges in this (and how to optimize a faster solution) here: https://postgis.net/workshops/postgis-intro/knn.html

I have gotten some nearest neighbor queries to work... so if you're determined to go this route I can provide some example code. Otherwise, can the question be framed as finding features within a set distance, found with either st_buffer or ST_DWithin