Closed jonescc closed 6 years ago
Another instance of this yesterday for a filtered download
From the geoserver logs last night, the following download timed out for an end user:
http://geoserver-portal.aodn.org.au/geoserver/ows?typeName=faimms_timeseries_data&SERVICE=WFS&outputFormat=csv-with-metadata-header&REQUEST=GetFeature&VERSION=1.0.0&CQL_FILTER=INTERSECTS(geom%2CPOLYGON((143.52676391601%20-26.869415283205%2C143.52676391601%20-12.015899658205%2C155.39199829101%20-12.015899658205%2C155.39199829101%20-26.869415283205%2C143.52676391601%20-26.869415283205)))&downloadToken=1533714674345
Running the equivalent query in postgres shows the following query plan - noting that it took over 22 minutes to return any records:
"Hash Join (cost=25527383.14..27957089.80 rows=21114595 width=273) (actual time=1379617.175..2197878.507 rows=93391133 loops=1)"
" Hash Cond: (measurements_noqaqc.file_id = table_a.id)"
" -> Merge Right Join (cost=25524545.68..26622237.76 rows=21114595 width=90) (actual time=1379564.964..1859505.118 rows=93391133 loops=1)"
" Merge Cond: ((measurements_qaqc."VALUES" = measurements_noqaqc."VALUES") AND (measurements_qaqc."TIME" = measurements_noqaqc."TIME") AND (measurements_qaqc.channel_id = measurements_noqaqc.channel_id))"
" -> Sort (cost=17904121.91..18113836.93 rows=83886008 width=18) (actual time=267126.703..379564.282 rows=83616936 loops=1)"
" Sort Key: measurements_qaqc."VALUES", measurements_qaqc."TIME", measurements_qaqc.channel_id"
" Sort Method: external merge Disk: 2796176kB"
" -> Seq Scan on measurements_qaqc (cost=0.00..1702955.08 rows=83886008 width=18) (actual time=1.235..55059.048 rows=84219077 loops=1)"
" -> Materialize (cost=7620423.77..7725996.74 rows=21114595 width=88) (actual time=1112438.250..1320906.533 rows=93391133 loops=1)"
" -> Sort (cost=7620423.77..7673210.25 rows=21114595 width=88) (actual time=1112438.244..1238604.652 rows=93363282 loops=1)"
" Sort Key: measurements_noqaqc."VALUES", measurements_noqaqc."TIME", measurements_noqaqc.channel_id"
" Sort Method: external merge Disk: 9956016kB"
" -> Hash Join (cost=93.40..2020541.38 rows=21114595 width=88) (actual time=0.783..179170.895 rows=93363282 loops=1)"
" Hash Cond: (measurements_noqaqc.channel_id = table_b.channel_id)"
" -> Seq Scan on measurements_noqaqc (cost=0.00..1571762.84 rows=63343784 width=22) (actual time=0.014..72954.321 rows=93363282 loops=1)"
" -> Hash (cost=92.11..92.11 rows=103 width=70) (actual time=0.745..0.745 rows=309 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 39kB"
" -> Seq Scan on timeseries_feature_deployments_varname table_b (cost=0.00..92.11 rows=103 width=70) (actual time=0.062..0.539 rows=309 loops=1)"
" Filter: ((geom && '0103000020E610000001000000050000003AFFFF3FDBF061401002000092DE3AC03AFFFF3FDBF0614020040000240828C03AFFFF3F8B6C634020040000240828C03AFFFF3F8B6C63401002000092DE3AC03AFFFF3FDBF061401002000092DE3AC0'::ge (...)"
" -> Hash (cost=1584.98..1584.98 rows=35798 width=155) (actual time=51.888..51.888 rows=35682 loops=1)"
" Buckets: 32768 Batches: 2 Memory Usage: 3592kB"
" -> Seq Scan on indexed_file table_a (cost=0.00..1584.98 rows=35798 width=155) (actual time=0.025..24.744 rows=35682 loops=1)"
"Planning time: 1.086 ms"
"Execution time: 2238341.376 ms"
Looks to me like postgres is really, really struggling to join faimms.measurements_qaqc and faimms.measurements_noqaqc for a large number of selected records in real time. Perhaps a materialised view or some other mechanism for generating this combined result set and maintaining it may give better results and reduce the load on the system?
To reproduce
Select faimms and download as csv
What happens
Times out starting download
What I expect to happen
Starts downloading.