Bad performance with many hosts in one subnet

indy-independence commented 8 years ago

The performance of NIPAP is bad when there are many hosts (~1100 in my case) in one subnet. The issue appears when you search for the subnet and expand the list of hosts. Towards the end, listing the hosts becomes very slow. I've done a bit of troubleshooting and there seem to be two separate problems that I can find:

1. PREFIX_BATCH_SIZE optimization not working as intended
2. An expensive sort operation in the actual SQL query when searching

PREFIX_BATCH_SIZE

PREFIX_BATCH_SIZE is set to 50 in the GUI. As far as I understand, this is some kind of optimization to avoid querying all the entries at once. With one subnet of 1000 hosts, this results in something like 22 SQL queries, which might not be an issue in itself. However, each additional query is slower than the previous one. By the end, each query takes around 8 seconds, and completing all 22 queries takes something like 2 minutes. For this optimization to work, each additional query would have to take roughly the same time as the previous one instead of getting slower and slower.
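To put rough numbers on the batching behaviour described above, here is a minimal sketch. It assumes, without having confirmed it in the NIPAP code, that each batch is fetched with a growing offset, so the database has to produce and discard all earlier rows again for every batch. The function names are made up for illustration.

```python
import math


def batch_count(total_hosts: int, batch_size: int) -> int:
    """Number of queries needed to fetch every host in fixed-size batches."""
    return math.ceil(total_hosts / batch_size)


def rows_read_with_offset(total_hosts: int, batch_size: int) -> int:
    """Total rows the database produces across all batches.

    Assumption (not verified against NIPAP): batch i is fetched with an
    offset of i * batch_size, so the rows before the offset are generated
    and thrown away on every query.
    """
    total = 0
    for i in range(batch_count(total_hosts, batch_size)):
        end_of_batch = min((i + 1) * batch_size, total_hosts)
        total += end_of_batch
    return total


if __name__ == "__main__":
    hosts, batch = 1100, 50
    print(batch_count(hosts, batch))            # 22 queries
    print(rows_read_with_offset(hosts, batch))  # 12650 rows produced in total
```

Under that assumption, 1100 hosts cost 22 queries but about 12650 rows of work instead of 1100, which would match the pattern of every query being slower than the one before it.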

Search Query

When analyzing the actual search query, there seem to be some issues: https://explain.depesz.com/s/AXwp

For example this one:

Sort (cost=192.40..192.41 rows=7 width=289) (actual time=139.898..144.258 rows=25,138 loops=1)
  Sort Key: (vrf_rt_order(vrf.rt)), p1.prefix, (CASE WHEN ((p1.prefix)::inet = (p2.prefix)::inet) THEN 0 ELSE 1 END)
  Sort Method: quicksort Memory: 10116kB

I don't know much about Postgres but that's a very long time, 25000 rows and 10MB of data to sort I think? And this is with the query limit of 50 results. Maybe this can be optimized somehow?
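For reading the Sort node quoted above, a small sketch that pulls out the planner's estimate and the measured values may help. The parsing helper is hypothetical and only handles this one line format. EXPLAIN ANALYZE reports times in milliseconds, and the thousands separator in the row count comes from the explain.depesz.com rendering.

```python
import re

# The Sort node exactly as quoted in the comment above.
SORT_LINE = (
    "Sort (cost=192.40..192.41 rows=7 width=289) "
    "(actual time=139.898..144.258 rows=25,138 loops=1)"
)


def parse_plan_node(line: str) -> dict:
    """Extract estimated rows, actual rows and total time (ms) from one plan line."""
    est = re.search(r"\(cost=\d+\.\d+\.\.\d+\.\d+ rows=([\d,]+)", line)
    act = re.search(r"\(actual time=\d+\.\d+\.\.(\d+\.\d+) rows=([\d,]+)", line)
    if est is None or act is None:
        raise ValueError("line does not look like an EXPLAIN ANALYZE node")
    return {
        "estimated_rows": int(est.group(1).replace(",", "")),
        "actual_rows": int(act.group(2).replace(",", "")),
        "total_time_ms": float(act.group(1)),
    }


if __name__ == "__main__":
    node = parse_plan_node(SORT_LINE)
    print(node)
    # How far off the planner's row estimate was for this node.
    print(node["actual_rows"] / node["estimated_rows"])
```

For this node the planner expected 7 rows and got 25138, so the estimate is off by a factor of several thousand. A gap that large is often worth looking at when a plan performs badly.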

indy-independence commented 8 years ago

This is one especially bad case of the search query: https://explain.depesz.com/s/gQos

To return a list of 49 entries, Postgres has to sort through over 1.2 million entries using 300MB of space.

indy-independence commented 7 years ago

The query runs much faster when I remove this part:

OR
-- Join in all neighbors (p1) of matching prefixes (p2)
(true = """ + include_neighbors + """ AND iprange(p1.prefix) << iprange(p2.display_prefix::cidr) AND p1.indent = p2.indent)

The sort only has to go through 2000 rows instead of 1.2 million+ in my case. New explain analyze with "neighbors" removed: https://explain.depesz.com/s/B9xd

Maybe searching with neighbors should be optional, since it seems to slow everything down quite a lot? My test went from taking 30s to 1.8s with this change.
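A minimal sketch of what "optional neighbors" could look like when the query string is assembled in Python. Only the neighbor condition is taken from the fragment quoted above. The function name, the `match_condition` parameter and the example condition are hypothetical, and the real NIPAP query has more parts than shown here.

```python
# Neighbor condition copied from the quoted query fragment.
NEIGHBOR_CONDITION = (
    "-- Join in all neighbors (p1) of matching prefixes (p2)\n"
    "(iprange(p1.prefix) << iprange(p2.display_prefix::cidr) "
    "AND p1.indent = p2.indent)"
)


def build_join_condition(match_condition: str, include_neighbors: bool) -> str:
    """Return the join condition, adding the neighbor branch only on request.

    match_condition is a placeholder for whatever the real query already
    uses to match p1 against p2; it is not taken from NIPAP.
    """
    parts = [match_condition]
    if include_neighbors:
        parts.append(NEIGHBOR_CONDITION)
    return "(\n" + "\nOR\n".join(parts) + "\n)"


if __name__ == "__main__":
    # Hypothetical base condition, for illustration only.
    print(build_join_condition("p1.id = p2.id", include_neighbors=False))
    print(build_join_condition("p1.id = p2.id", include_neighbors=True))
```

With `include_neighbors=False` the OR branch is left out of the SQL text entirely, which corresponds to the manual test described above.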

plajjan commented 7 years ago

I think there are other ways of speeding up the query without losing functionality. "neighbors" is quite important to give context to searches. Back in the day we didn't have it and users were confused about the results they got ;)

eoprede commented 7 years ago

I'm running into a similar issue. I have 30 VRFs in total, with about 400 addresses per VRF on average. If I just scroll down and try to expand a subnet in one of the lower VRFs, the request takes over 40 seconds (and keeps getting longer as I add VRFs). The subnets in VRFs at the top of the screen do not seem to be affected and take less than a second to expand. If I filter by VRF, the same expansion takes about 1 second. Searching for the VRF name also seems to fix the issue.

SpriteLink / NIPAP

Bad performance with many hosts in one subnet #1077