brocaar / chirpstack-application-server

ChirpStack Application Server is an open-source LoRaWAN application-server.
https://www.chirpstack.io
MIT License
502 stars 326 forks source link

search node #120

Closed gillespilloudkerlink closed 6 years ago

gillespilloudkerlink commented 7 years ago

Hi !

I've loaded with the apis up to 400 devices... and I lost lot of time to search device in the device list :

Will it be possible to imagine have search capabilities, or at least the possibility to change the number of devices displayed at each page, to improve the user experience ?

image

brocaar commented 7 years ago

Sounds like a good feature :-) I guess being able to search on name and / or DevEUI (prefix?) would make sense in this case.

gillespilloudkerlink commented 7 years ago

cool !

iegomez commented 7 years ago

@brocaar Are you working on this or do you need any help? I'm implementing it on my fork, so I wouldn't mind adding it to yours and creating a PR when ready.

iegomez commented 7 years ago

@brocaar @gillespilloudwyres I forked from the latest version and added the node search functionality by either name or DevEUI, and matching any substring given, not the whole field. You can check it at https://github.com/iegomez/lora-app-server. If you think it's a good addition, I'll make a pull request. Cheers!

brocaar commented 7 years ago

@iegomez Cheers! I was out of the country for a conference. Will take a look at it this or next week :-)

iegomez commented 7 years ago

@brocaar I saw your comment on the pagination issue: "Regarding the search solution I need to do some more research around performance (e.g. what changes this would require on the DevEUI column in order to avoid table scans)."

I looked up some ways to deal with LIKE or ILIKE queries avoiding the full scan, and it seems the most common solution is using the pg_trgm extension to create a gin index on the search column, as the query will take the index in consideration (from Potgres 9.1 and above). You can find a great summary at https://niallburkley.com/blog/index-columns-for-like-in-postgres/ More detailed information here: http://0x80.pl/articles/sql-ngram-index.html

That said, I'm not really sure if this would work when applying text() to the dev_eui byte array, I guess we'd have to analyze the query plan matching only dev_eui and not name, with and without the extension, to check if there's any difference.

iegomez commented 7 years ago

I tried defining a gin index on text(dev_eui), but it made no difference on the select plan, in both cases (with and without index) a sequencial scan was made. Any ideas?

siscia commented 7 years ago

How big is your table?

A sequential scan will still be faster up to a lot of row.

iegomez commented 7 years ago

I was just testing with ~2000 to see if there was any difference, as Orne was concerned about scans. With that amount the search is fast, about 12 ms, but the concern still stands. I'm no DB expert, so I'm not sure if there's a way to deal with this.

siscia commented 7 years ago

12 ms doesn't means much :)

You need to consider in what machine are you running, SSD or HD, did your query actually hit the disk, what was the load of the machine at the moment and so on...

There is much to explore in this problem but most will come down to a simple trade-off that includes how many node is necessary to support.

A quick example.

In my database the dimension of a row in the node table is of roughly 300 bytes (some arrive to 900bytes, but are less than 300). A page in postgres is of 8kB so on a single page will fit roughly 25 nodes.

It also means that a sequential scan on 2500 nodes will need to fetch (very worst case scenario) 100 pages (most will be in cache/memory already). How long does it take to fetch a page depends on your hardware, cache-friendlyness of the layout and so on and so forth.

On the other side with an index what will be slower will be the write time, this is extremely messy to compute, but image a simple binary tree (GIN index are actually B+ tree) and image to put a new node in the tree, every time that you hit an intermediate node you do a page access (an uncached one). However when you do a search you don't hit anymore every single node in the table but only the one in tha path between the root and your target.

At the end of the story it depends if we prefer fast insertion (no index) or fast look up (index), however, as long as the table is small it won't make any difference.

What means "small", again depends on your machine, your disks, your load and so on and so forth.

In your query you will need two GIN index, one for devEUI and one for name.

:)

iegomez commented 7 years ago

I'm aware how btrees work and the complexity of select/insertion (didn't know about those page sizes though, nice info), and yes, I think fast look up is what @gillespilloudwyres an @brocaar aim for. What I don't know is if Postgres has a way of using an index over text(dev_eui) for a select with ILIKE and wildcards (my test shows that at least the gin index approach doesn't change the sequencial plan), or if there's another option (does ILIKE or something similar allow to search against bytea instead of text with wildcards or something like that? would an index work in such option?) or a better approach (is it a valid option to create another column that holds a text value for the dev_eui -which should benefit from the gin index, as name does- only for search purposes?.

I think these are the questions that need answer for the search feature to be correctly implemented, specially if it would make it to the default release, because you can't assume the number of nodes any given user will have. In my case the sequencial look up is good enough, but that may not be the case for someone else.

siscia commented 7 years ago

Great :)

Have you try to use an EXPLAIN.

It will tell you whenever the index would be useful. A naive EXPLAIN will clearly show the sequential scan, if you disable the scan it should help understamd if the index is actually useful.

:)

iegomez commented 7 years ago

Sorry for the late response, I've been quite busy lately. I used explain analyze in the test I mentioned early. Now I followed your suggestion and disabled scans, but it doesn't seem to make a significant difference (nor does it mention using the gin index, only the node id index). Of course, ~10000 records aren't much, so maybe the planner just omits the index because a sequential scan is good enough. I'll need to test further.

Thanks for the help!

siscia commented 7 years ago

Hi,

that is weird, 10k rows should be enough to start using indexes.

There are two main thing that I would like to be sure, assuming that the query you are running is the one in your commit history: https://github.com/brocaar/lora-app-server/commit/d149b2241033841e43790ff2a83d325463af5ba3#diff-1b3165fff3fcd132a3b217e13372ee83R342

None of the indexes are used? Neither the one on the column "name"?

How did you create the index on the column "dev_eui"? Did you create it on "dev_eui" or on "text(dev_eui)"?

Speaking about "text", why did you use it? Isn't "encode/decode" more appropriate? Couldn't find much documentation on "text"

If you could share the database I could try to work on it too... Or if you have some kind of script that quickly populate it...

iegomez commented 7 years ago

The query I was testing doesn't consider the name part, as a varchar field is not the issue. So there's no index to be used for the name column in the test, and the only index that got used is the application_id one when disabling seq scans. Here's an example run:

explain analyse select * from node where application_id = 1 and text(dev_eui) ilike '%02%' order by name;
                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=703.99..708.99 rows=2000 width=137) (actual time=15.282..15.325 rows=567 loops=1)
   Sort Key: name
   Sort Method: quicksort  Memory: 175kB
   ->  Index Scan using idx_node_application_id on node  (cost=0.29..594.33 rows=2000 width=137) (actual time=0.124..14.104 rows=567 loops=1)
         Index Cond: (application_id = 1)
         Filter: ((dev_eui)::text ~~* '%02%'::text)
         Rows Removed by Filter: 9435
 Planning time: 0.578 ms
 Execution time: 15.441 ms
(9 rows)

The index was created on text(dev_eui), which I used because the dev_eui field is a bytea one, so I couldn't use a gin index over it (or use ilike fot that matter). Why text instead of encode/decode? It was just my first idea, I'll try using the other option.

For the records I just threw a simple func that created nodes with random DevEUIs and some default options and called it in main. Anyway, if you'd like to have the same specific records, I'll dump the nodes table and send it later.

iegomez commented 7 years ago

I created an index on encode(dev_eui, 'hex'):

create index trgm_encode_idx_node_dev_eui on node using gin (encode(dev_eui, 'hex') gin_trgm_ops);

The result was pretty much the same:

explain analyse select * from node where application_id = 1 and encode(dev_eui, 'hex') ilike '%02%' order by name;
                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=678.99..683.99 rows=2000 width=137) (actual time=17.133..17.176 rows=567 loops=1)
   Sort Key: name
   Sort Method: quicksort  Memory: 175kB
   ->  Index Scan using idx_node_application_id on node  (cost=0.29..569.33 rows=2000 width=137) (actual time=0.214..15.910 rows=567 loops=1)
         Index Cond: (application_id = 1)
         Filter: (encode(dev_eui, 'hex'::text) ~~* '%02%'::text)
         Rows Removed by Filter: 9435
 Planning time: 0.682 ms
 Execution time: 17.333 ms
(9 rows)

To compare, I created a gin index on name and run a similar query but on name, with a pretty different result:

explain analyse select * from node where application_id = 1 and name ilike '%ode%';
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on node  (cost=101.51..444.52 rows=10001 width=137) (actual time=4.037..14.684 rows=10001 loops=1)
   Recheck Cond: ((name)::text ~~* '%ode%'::text)
   Filter: (application_id = 1)
   Heap Blocks: exact=193
   ->  Bitmap Index Scan on idx_gin_node_name  (cost=0.00..99.01 rows=10001 width=0) (actual time=3.993..3.993 rows=10001 loops=1)
         Index Cond: ((name)::text ~~* '%ode%'::text)
 Planning time: 0.793 ms
 Execution time: 15.226 ms
(8 rows)

Any insights on this?

siscia commented 7 years ago

@iegomez , you are in the back of my mind, I am really trying to figure out why it doesn't use the index.

If you could dump the table it would be amazing!

iegomez commented 7 years ago

@siscia I'll send you the dump on wednesday, we are on national holidays right now.

iegomez commented 7 years ago

@siscia You can download the table dump from https://expirebox.com/download/98168bdba97c57e91f5d6abae90df8bf.html. It will expire in 48 hours.

siscia commented 7 years ago

@iegomez Get it :)

I don't make any promises!

siscia commented 7 years ago

@iegomez I believe that the problem is quite "trivial"...

pg_tgrm is based on tri-grams, your query use a bi-gram and cannot use the index (seems a quite educate guess reading online, but honestly I am not so expert)

What I would suggest is to try the exact same query using a bigger (more than 2, so at least 3) char in the query.

Let's compare a query that use ilike with two chars:

postgres=# explain analyse
postgres-# select * from node
postgres-# where application_id = 1 and
postgres-# encode(dev_eui, 'hex') ilike '%10%' order by name;
                                                  QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
 Sort  (cost=477.69..482.69 rows=2000 width=161) (actual time=61.312..62.040 rows=543 loops=1)
   Sort Key: name
   Sort Method: quicksort  Memory: 169kB
   ->  Seq Scan on node  (cost=0.00..368.04 rows=2000 width=161) (actual time=0.137..57.276 rows=543 loops=1)
         Filter: ((application_id = 1) AND (encode(dev_eui, 'hex'::text) ~~* '%10%'::text))
         Rows Removed by Filter: 9459
 Planning time: 0.324 ms
 Execution time: 62.572 ms
(8 rows)

Time: 63,734 ms

And a similar query that use 3 char with ilike

postgres=# explain analyse
postgres-# select * from node
postgres-# where application_id = 1 and
postgres-# encode(dev_eui, 'hex') ilike '%103%' order by name;
                                                                  QUERY PLAN                                                                   
-----------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=232.39..233.39 rows=400 width=161) (actual time=0.757..0.799 rows=43 loops=1)
   Sort Key: name
   Sort Method: quicksort  Memory: 36kB
   ->  Bitmap Heap Scan on node  (cost=15.10..215.10 rows=400 width=161) (actual time=0.084..0.496 rows=43 loops=1)
         Recheck Cond: (encode(dev_eui, 'hex'::text) ~~* '%103%'::text)
         Filter: (application_id = 1)
         Heap Blocks: exact=39
         ->  Bitmap Index Scan on trgm_encode_idx_node_dev_eui  (cost=0.00..15.00 rows=400 width=0) (actual time=0.045..0.045 rows=43 loops=1)
               Index Cond: (encode(dev_eui, 'hex'::text) ~~* '%103%'::text)
 Planning time: 0.273 ms
 Execution time: 0.967 ms
(11 rows)

You can see that with at least 3 chars the query use the index.

If it is possible, and reasonable, it could be worth to make a db call when there are at least 3 chars in query, or after that the user had waited a little bit. (It is a realtime search or press a button and then make the query?)

iegomez commented 7 years ago

Damn I feel dumb, I always assumed that the extension would just manage a shorter gram in some way, what a silly mistake. You're totally right, @siscia, I tested it and it works fine, either on name or on encode(dev_eui, 'hex') (with text(dev_eui) it works fine too).

To address the case in question, I did some more tests using a multi-column index on name and dev_eui using gin: create index multi_idx_node_name on node using gin (name gin_trgm_ops, encode(dev_eui, 'hex') gin_trgm_ops);

When running a query against dev_eui, it uses it:

explain analyse select * from node where application_id = 1 and encode(dev_eui, 'hex') ilike '%0202%';
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on node  (cost=20.62..171.59 rows=80 width=137) (actual time=0.116..0.157 rows=2 loops=1)
   Recheck Cond: (encode(dev_eui, 'hex'::text) ~~* '%0202%'::text)
   Rows Removed by Index Recheck: 3
   Filter: (application_id = 1)
   Heap Blocks: exact=5
   ->  Bitmap Index Scan on multi_idx_node_name  (cost=0.00..20.60 rows=80 width=0) (actual time=0.071..0.071 rows=5 loops=1)
         Index Cond: (encode(dev_eui, 'hex'::text) ~~* '%0202%'::text)
 Planning time: 0.573 ms
 Execution time: 0.257 ms

When running a query against name only it also uses the index:

explain analyse select * from node where application_id = 1 and name ilike '%ode%';
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on node  (cost=101.51..444.52 rows=10001 width=137) (actual time=5.928..37.951 rows=10001 loops=1)
   Recheck Cond: ((name)::text ~~* '%ode%'::text)
   Filter: (application_id = 1)
   Heap Blocks: exact=193
   ->  Bitmap Index Scan on multi_idx_node_name  (cost=0.00..99.01 rows=10001 width=0) (actual time=5.704..5.704 rows=10001 loops=1)
         Index Cond: ((name)::text ~~* '%ode%'::text)
 Planning time: 0.885 ms
 Execution time: 39.491 ms

This seems odd, as I thought that a multi-column index on (A, B) should work for both together and A independently, but not for B independently, rendering a unique index on A useless and one on B necessary. That doesn't appear to be the case here. Maybe the latest versions of Postgres have changed this behaviour, or maybe I was wrong all along. Do you know about this?

Finally, when querying with an OR condition, with seq scan enabled and disabled, the results were the following:

appserver=> SET enable_seqscan = OFF;
appserver=> explain analyse select * from node where application_id = 1 and (encode(dev_eui, 'hex') ilike '%0202%' or name ilike '%ode%');
                                                                QUERY PLAN                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on node  (cost=124.61..517.65 rows=10001 width=137) (actual time=3.571..27.501 rows=10002 loops=1)
   Recheck Cond: ((encode(dev_eui, 'hex'::text) ~~* '%0202%'::text) OR ((name)::text ~~* '%ode%'::text))
   Filter: (application_id = 1)
   Heap Blocks: exact=193
   ->  BitmapOr  (cost=124.61..124.61 rows=10002 width=0) (actual time=3.521..3.521 rows=0 loops=1)
         ->  Bitmap Index Scan on multi_idx_node_name  (cost=0.00..20.60 rows=80 width=0) (actual time=0.095..0.095 rows=5 loops=1)
               Index Cond: (encode(dev_eui, 'hex'::text) ~~* '%0202%'::text)
         ->  Bitmap Index Scan on multi_idx_node_name  (cost=0.00..99.01 rows=10001 width=0) (actual time=3.424..3.424 rows=10001 loops=1)
               Index Cond: ((name)::text ~~* '%ode%'::text)
 Planning time: 1.155 ms
 Execution time: 28.064 ms
(11 rows)
appserver=> SET enable_seqscan = ON;
appserver=> explain analyse select * from node where application_id = 1 and (encode(dev_eui, 'hex') ilike '%0202%' or name ilike '%ode%');
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on node  (cost=0.00..393.04 rows=10001 width=137) (actual time=0.057..20.715 rows=10002 loops=1)
   Filter: ((application_id = 1) AND ((encode(dev_eui, 'hex'::text) ~~* '%0202%'::text) OR ((name)::text ~~* '%ode%'::text)))
 Planning time: 0.784 ms
 Execution time: 21.272 ms
(4 rows)

The query will not use the multi-column index when seq scan is on, but I found that to be related with the specific query string 'ode'. My test records all have 'node' in their name, so the index won't be used, but if I change it to match a small group of nodes (in particular, I have one called 'gw-simulation-node'), it uses the index whether enable_seqscan is on or off and works great. The latter applies to name only queries too:

appserver=> explain analyse select * from node where application_id = 1 and (encode(dev_eui, 'hex') ilike '%0202%' or name ilike '%sim%');
                                                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on node  (cost=32.65..183.84 rows=81 width=137) (actual time=0.132..0.191 rows=2 loops=1)
   Recheck Cond: ((encode(dev_eui, 'hex'::text) ~~* '%0202%'::text) OR ((name)::text ~~* '%sim%'::text))
   Rows Removed by Index Recheck: 3
   Filter: (application_id = 1)
   Heap Blocks: exact=5
   ->  BitmapOr  (cost=32.65..32.65 rows=81 width=0) (actual time=0.087..0.087 rows=0 loops=1)
         ->  Bitmap Index Scan on multi_idx_node_name  (cost=0.00..20.60 rows=80 width=0) (actual time=0.070..0.070 rows=5 loops=1)
               Index Cond: (encode(dev_eui, 'hex'::text) ~~* '%0202%'::text)
         ->  Bitmap Index Scan on multi_idx_node_name  (cost=0.00..12.01 rows=1 width=0) (actual time=0.015..0.015 rows=1 loops=1)
               Index Cond: ((name)::text ~~* '%sim%'::text)
 Planning time: 0.938 ms
 Execution time: 0.303 ms
(12 rows)

So, to sum up, I think a multi-column index on name and encode(dev_eui, 'hex') using the pg_trgm gin is the way to go. I also agree to search only with 3 or more characters by default. Maybe doing auto search with >= 3 chars and explicit search (i.e., pressing a search button) for 1 or 2 chars is a good mix, though removing chars under the 3 threshold may be weird (should it do an immediate search? should it wait to see if the user was erasing the whole search input?). Anyway, @gillespilloudwyres, @brocaar, @siscia, what do you guys think?

siscia commented 7 years ago

@iegomez , don't worry, I felt very dumb too when I couldn't understand why with only 2 chars it didn't use the index :smile:

GIN stands for Generalized Inverted Index, when it is decided what we want to index that part is written along with the column where it appears. Suppose you want to use trigram GIN index on the world lorawan the first step is to create the trigram that are: __l, _lo, lor, ora, raw awa, wan, an_, n__.

At this point at each of these trigrams is associate the tuple the contains the world lorawan.

(Note how GIN and trigram are actually different piece of software, the trigram indexes uses the GIN abstraction)

In this scenario the composite index is even difficult to define, it is not the classical direct B+-Tree index and I could believe that the behaviour we are seeing is justified.

I would try to keep a consistent behaviour everywhere, and I would use a simple "Search" button to avoid query the DB when not really necessary.

iegomez commented 7 years ago

Hi, sorry I haven't commented anymore, I was a bit busy. If I get any time in the next few days, I'll change the auto search behaviour to a explicit search button and then generate a PR. Note that in order for search to work the extension pg_trgm must be created as a pre-step so that the indexes may be created, so the PR would include this on the DB creation part of the docs:

-- connect to db
 \c loraserver_as

 -- create the pg_trgm extension for search capabilities
 create extension pg_trgm;

Sadly, anyone upgrading from a past version could miss this, so I was thinking that maybe there should be a default fallback in the index migrations if the extension isn't found. On the other hand, normally lora-app-server would just crash and log a migration failure involving the index creation, so that could be enough warning to get users to create the extension. Any thoughts?

siscia commented 7 years ago

@iegomez , sorry I got now the notification and I remembered of the issues about the extension.

Are you sure that it need to be create together with the database and it cannot be a simple new migration?

iegomez commented 7 years ago

@siscia You need superuser privileges to create the extension, so unless you grant your appserver user with superuser privileges, it needs to be created earlier.

brocaar commented 7 years ago

Thanks to @iegomez we almost have the search in place :-) 👍

brocaar commented 7 years ago

This feature has been merged into master and will be included into the next release.

brocaar commented 6 years ago

Thanks again @siscia and @iegomez. I might actually consider using the pg_trgm now. I'm currently working on a global search page (e.g. type in a deveui), to search trough all accessible organizations, applications, devices and gateways.

The advantage of the pg_trgm extensions is that it also provides functions to calculate the similarity between two arguments (https://www.postgresql.org/docs/9.6/static/pgtrgm.html), which is useful to order the search results.

screen shot 2018-03-28 at 12 15 26

What do you think?

iegomez commented 6 years ago

I like it, and pg_trgm is really easy to use for that kind of ordering. Have you thought of the best way to introduce this dependency (because of the superuser restriction)?

brocaar commented 6 years ago

I think there is no other option than to document it, next to the create database commands + add a warning in the changelog that the extension needs to be enabled before upgrading.

Unfortunately it is not possible to do it using the migrations as they will fail when the user does not have the needed permissions. Even create extension if not exists pg_trgm fails when already enabled (else it would be possible to include this in the migrations).

gillespilloudkerlink commented 6 years ago

thanks a lot, the deveui search is very helpful and now I can find easily all of my devices !