3liz / qgis-pgmetadata-plugin

QGIS plugin to manage some metadata for PostgreSQL layers
GNU General Public License v2.0

Enhancement suggestion: search more fields with locator #82

Closed · effjot closed this issue 2 years ago

effjot commented 3 years ago

I have extended the locator search to keywords, categories, and themes (labels expanded according to locale). Maybe you’d like to add this to the main branch.
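
To make the change concrete, here is a rough sketch of the extended query the locator runs (the Python constant is only illustrative wrapping; the exact statement appears in the EXPLAIN ANALYSE output further down, and the locale expansion of the category/theme labels is not shown here):

```python
# Sketch of the extended locator query: the ILIKE haystack now also
# concatenates categories, keywords and themes. '%Nitrat%' stands in for
# the user's search term.
SEARCH_SQL = """
SELECT concat(d.title, ' (', d.table_name, '.', d.schema_name, ')') AS displayString,
       d.schema_name, d.table_name, d.geometry_type, title
FROM pgmetadata.dataset d
INNER JOIN pgmetadata.v_valid_dataset v
    ON concat(v.table_name, '.', v.schema_name) = concat(d.table_name, '.', d.schema_name)
WHERE concat(d.title, ' ', d.abstract, ' ', d.table_name, ' ',
             d.categories, ' ', d.keywords, ' ', d.themes) ILIKE '%Nitrat%'
LIMIT 100;
"""
```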

Gustry commented 3 years ago

Ok for me.

@mdouchin What do you think about the cost of this view? It should be alright.

effjot commented 2 years ago

If the query turns out to be too expensive, could including these extra fields be made a configuration option?

Gustry commented 2 years ago

Maybe yes.

The locator runs the query for every character typed into the search text, so that is a lot of queries. Building the flat table for each request might not be the best approach if you have a few simultaneous users and more "dataset" rows (with contacts and links).
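
For illustration, a minimal sketch (not the plugin's actual code; the class name and the run_search callable are made up) of how a QGIS locator filter ends up issuing one query per keystroke: QGIS calls fetchResults() while the user types, and each call hits the database.

```python
from qgis.core import QgsLocatorFilter, QgsLocatorResult


class DatasetSearchFilter(QgsLocatorFilter):
    """Illustrative locator filter. run_search(term) is assumed to execute the
    ILIKE query against the metadata database and return matching rows of
    (display_string, schema_name, table_name)."""

    def __init__(self, run_search):
        super().__init__()
        self.run_search = run_search

    def clone(self):
        return DatasetSearchFilter(self.run_search)

    def name(self):
        return 'pgmetadata_search_sketch'

    def displayName(self):
        return 'PgMetadata search (sketch)'

    def fetchResults(self, search, context, feedback):
        # QGIS calls this while the user types, so each additional character
        # in the locator bar means another round trip to the database.
        if len(search) < 2:
            return
        for display_string, schema_name, table_name in self.run_search(search):
            result = QgsLocatorResult()
            result.filter = self
            result.displayString = display_string
            self.resultFetched.emit(result)

    def triggerResult(self, result):
        # The real plugin would act on the selected dataset; nothing to do here.
        pass
```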

Are you using it in production? Can you monitor the execution time of this query on your instance? How many dataset features do you have in your table?

CC @mdouchin @drillinP

mdouchin commented 2 years ago

The function export_datasets_as_flat_table uses the view v_export_table, which is not very heavy IMHO. Since the dataset table will be of reasonable size for the average database, we can try it and see how it reacts IRL.

Please answer @Gustry's questions above about the size of the dataset table, and see if you can run an EXPLAIN ANALYSE on the query.

effjot commented 2 years ago

Thanks for your input! I didn’t quite realise that a search is done for every character typed.

I don’t have a production environment yet, but I am preparing our switch to QGIS + PostgreSQL (we’re a team of about 15 GIS users). At the moment, our test database isn’t very big. I can try to fill the DB with more data and run several parallel searches to get a feeling for how it reacts.

At the moment, I’m testing with two DBs (around 20 and around 5 datasets; I’m at home now, so I can’t check the exact numbers). Tomorrow I’ll run the EXPLAIN ANALYSE.

Gustry commented 2 years ago

> Thanks for your input! I didn’t quite realise that a search is done for every character typed.

Yes, the locator sends the SQL query to the database to fetch results.

Since QGIS 3.18, it's possible to add a time delay in the locator: https://qgis.org/api/classQgsLocatorFilter.html#a5f17ae8096cab7a6a4e134d9edf46074
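
If I read the linked member right, it is QgsLocatorFilter.setFetchResultsDelay() (milliseconds, added in QGIS 3.18). Assuming that, the sketch filter from the earlier comment could be throttled like this:

```python
# Hypothetical setup, assuming the linked API member is setFetchResultsDelay():
# with a delay set, the locator waits before fetching results instead of firing
# a query for every single keystroke while the user is still typing.
search_filter = DatasetSearchFilter(run_search)  # sketch class and callable from the earlier comment
search_filter.setFetchResultsDelay(300)          # wait 300 ms before querying
iface.registerLocatorFilter(search_filter)       # iface: QGIS plugin/console interface
```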

effjot commented 2 years ago

I now have 50 rows in the dataset table. This is the EXPLAIN ANALYSE output:

geodaten=> explain analyse SELECT concat(d.title, ' (', d.table_name, '.', d.schema_name, ')') AS displayString, d.schema_name, d.table_name, d.geometry_type, title FROM pgmetadata.dataset d INNER JOIN pgmetadata.v_valid_dataset v ON concat(v.table_name, '.', v.schema_name) = concat(d.table_name, '.', d.schema_name) WHERE concat(d.title, ' ', d.abstract, ' ', d.table_name, ' ',  d.categories, ' ', d.keywords, ' ', d.themes) ILIKE '%Nitrat%' LIMIT 100;
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=64.03..78.87 rows=1 width=125) (actual time=0.737..2.885 rows=5 loops=1)
   ->  Nested Loop  (cost=64.03..78.87 rows=1 width=125) (actual time=0.737..2.883 rows=5 loops=1)
         Join Filter: (concat(d.table_name, '.', d.schema_name) = concat(d_1.table_name, '.', d_1.schema_name))
         Rows Removed by Join Filter: 245
         ->  Seq Scan on dataset d  (cost=0.00..14.78 rows=1 width=93) (actual time=0.134..2.102 rows=5 loops=1)
               Filter: (concat(title, ' ', abstract, ' ', table_name, ' ', categories, ' ', keywords, ' ', themes) ~~* '%Nitrat%'::text)
               Rows Removed by Filter: 45
         ->  Sort  (cost=64.03..64.03 rows=2 width=37) (actual time=0.118..0.120 rows=50 loops=5)
               Sort Key: d_1.schema_name, d_1.table_name
               Sort Method: quicksort  Memory: 29kB
               ->  Hash Join  (cost=16.39..64.02 rows=2 width=37) (actual time=0.167..0.451 rows=50 loops=1)
                     Hash Cond: (((n.nspname)::text = (d_1.schema_name)::text) AND ((c.relname)::text = (d_1.table_name)::text))
                     ->  Hash Join  (cost=1.09..48.41 rows=56 width=128) (actual time=0.032..0.317 rows=181 loops=1)
                           Hash Cond: (c.relnamespace = n.oid)
                           ->  Seq Scan on pg_class c  (cost=0.00..46.69 rows=181 width=72) (actual time=0.011..0.258 rows=181 loops=1)
                                 Filter: ((relname IS NOT NULL) AND (relkind = ANY ('{r,p}'::"char"[])))
                                 Rows Removed by Filter: 834
                           ->  Hash  (cost=1.04..1.04 rows=4 width=68) (actual time=0.013..0.013 rows=14 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 10kB
                                 ->  Seq Scan on pg_namespace n  (cost=0.00..1.04 rows=4 width=68) (actual time=0.005..0.008 rows=14 loops=1)
                     ->  Hash  (cost=14.52..14.52 rows=52 width=37) (actual time=0.055..0.056 rows=50 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 12kB
                           ->  Seq Scan on dataset d_1  (cost=0.00..14.52 rows=52 width=37) (actual time=0.005..0.025 rows=50 loops=1)
 Planning Time: 0.535 ms
 Execution Time: 2.969 ms
(25 rows)

(the planning times are in the range 0.5 – 0.6 ms, execution times 2.8 – 3.1 ms)

In my test project I connect to two databases with metadata, but the second DB's dataset table has only a few rows. Searching with the locator bar is fast; I experienced no delay. The server isn’t under heavy load at the moment, though.

I’m working with QGIS 3.16, but will switch to 3.22 (next LTR) in October.

Gustry commented 2 years ago

Thanks, let's see how it goes.