Icinga / icingaweb2-module-director

The Director aims to be your new favourite Icinga config deployment tool. Director is designed for those who want to automate their configuration deployment and those who want to grant their “point & click” users easy access to the configuration.
https://icinga.com/docs/director/latest
GNU General Public License v2.0
412 stars 203 forks source link

Hang on Preview of Sync rules #2049

Closed ollytom closed 4 years ago

ollytom commented 4 years ago

Expected Behavior

Automation -> Sync rule -> $rule -> Preview

Page of 25 rows appears after a few moments.

Going to the next page takes another few moments.

Current Behavior

Selecting a preview of either the Import source or the Sync rule results in the view hanging for about 3 minutes. After the preview of 25 rows renders, trying to get to the next page (>> button) results in another hang.

Editing the property of a sync rule results in the view hanging for the same amount of time. Changing the Destination Field under "Properties" of a Sync rule causes the same hang.

Possible Solution

Too many things in imported_property and related tables in the director database for the application? In particular maybe lots of non-unique values for property_name in imported_property?

It feels like we may have imported too much irrelevant information in the Director database. But reading Director documentation and issues, it seems the database can get up to gigabytes and many millions of rows without requiring a cleanup (see #1920).

Steps to Reproduce (for bugs)

We're using the netboximport module. It slurps in a lot of detail into Director. ~9000 devices in Netbox.

Your Environment

Row and byte count of relevant (I guess) tables in the Director database:

imported_row: 34455 rows, 2637824 bytes

imported_property: 140844, 43139072

imported_row_property: 1200976, 169443328

imported_rowset: 92, 16384

imported_rowset_row: 254845, 38961152

EC2 instance, 4GB memory 2CPU

davekempe commented 4 years ago

We are about to have about 100K devices in netbox on this site, so director will be unusable if we can't figure out how to get it performing better.

Thomas-Gelf commented 4 years ago

Hi @ollytom, hi @davekempe!

Your numbers don't scare me, judging by them I wouldn't expect performance problems. You could eventually check whether you're importing some very volatile columns. Things that change all the time (like current_time or free_memory) do not belong into an Import Source and would hurt badly over time.

To dig deeper, please go to the CLI and run some benchmarks. You need to know your Import Source IDs, the command icingacli director importsource list helps to figure them out.

Then you could run:

icingacli director importsource check --id 42 --benchmark

...and/or:

icingacli director importsource run --id 42 --benchmark

Same goes for icingacli director syncrule. Please share the results.

Regarding UI performance:

Importing 100.000 devices will works, but 4GB RAM will not be enough. When using MySQL/MariaDB please set at lease innodb_buffer_pool_size to a reasonable value (default settings are usually ridiculous) and make sure that innodb_file_per_table is on BEFORE creating the DB schema (should be default these days). Having a file per table doesn't help with day by day performance, but with some housekeeping strategies.

Eventually run this Import and Sync a couple of times on real hardware and compare it to your cloud instance. Just in case you need some numbers in a blame game ;-)

Cheers, Thomas

ollytom commented 4 years ago

Hey Thomas thanks mate for the detailed reply! This is really helpful :) I’ll have a look tomorrow Aussie time. Mit freundlichen Grüßen Olly

Thomas-Gelf commented 4 years ago

You're welcome! Would love to have your weather here right now ;-) Let me know your findings once you figured out more.

ollytom commented 4 years ago

Not sure if you want the weather from Sydney at the moment with lots of bushfires and smoke; I live right by the beach and I can't even see the water from my apartment :o !

Anyway...

Thanks for your insight on the numbers and many other tips like volatile columns etc.

icingacli director importsource check --id 4 --benchmark:

+--------------+------------------------------------------------+-------------+---------------+-------------+-------------+
| Time         | Description                                    |    Off (ms) |      Dur (ms) |  Mem (diff) | Mem (total) |
+--------------+------------------------------------------------+-------------+---------------+-------------+-------------+
| 22:45:17.247 | Bootstrap, autoloader registered               |       0.001 |         0.001 |  553.18 KiB |  553.18 KiB |
| 22:45:17.256 | Dispatching CLI command                        |       8.848 |         8.847 |  403.88 KiB |  957.05 KiB |
| 22:45:17.281 | Starting with import netbox.example.com hosts |      33.559 |        24.711 |    2.03 MiB |    2.96 MiB |
| 22:46:32.782 | Found changes for netbox.example.com hosts    |   75534.250 |     75500.691 |  251.33 MiB |  254.29 MiB |
| 22:46:32.834 | All done                                       |   75586.611 |        52.361 | -249.11 MiB |    5.18 MiB |
+--------------+------------------------------------------------+-------------+---------------+-------------+-------------+

icingacli director syncrule run --id 7 --benchmark:

+--------------+------------------------------------------------------------------+-------------+---------------+-------------+-------------+
| Time         | Description                                                      |    Off (ms) |      Dur (ms) |  Mem (diff) | Mem (total) |
+--------------+------------------------------------------------------------------+-------------+---------------+-------------+-------------+
| 23:12:24.640 | Bootstrap, autoloader registered                                 |       0.001 |         0.001 |  553.26 KiB |  553.26 KiB |
| 23:12:24.650 | Dispatching CLI command                                          |       9.448 |         9.447 |  403.88 KiB |  957.13 KiB |
| 23:12:24.680 | Checking sync rule Netbox wireless controllers                   |      39.837 |        30.389 |    2.01 MiB |    2.94 MiB |
| 23:12:24.682 | Starting sync                                                    |      41.600 |         1.763 |  165.37 KiB |    3.10 MiB |
| 23:12:24.705 | Begin loading existing objects                                   |      64.645 |        23.045 |    3.35 MiB |    6.46 MiB |
| 23:12:24.915 | Done loading existing objects                                    |     274.336 |       209.691 |   37.21 MiB |   43.66 MiB |
| 23:12:24.915 | Begin loading imported data                                      |     274.343 |         0.007 |    416.00 B |   43.66 MiB |
| 23:12:24.916 | Done pre-processing columns for source netbox.example.com hosts  |     276.178 |         1.835 |   53.15 KiB |   43.72 MiB |
| 23:12:26.612 | Fetched source netbox.example.com hosts                          |    1971.450 |      1695.272 |   14.56 MiB |   58.28 MiB |
| 23:12:26.636 | Done loading imported data                                       |    1995.628 |        24.178 |  -14.31 MiB |   43.97 MiB |
| 23:12:26.637 | Begin preparing updated objects                                  |    1996.930 |         1.302 |   61.01 KiB |   44.03 MiB |
| 23:12:26.637 | Ready to process objects                                         |    1996.944 |         0.014 |    416.00 B |   44.03 MiB |
| 23:12:26.637 | Modified objects are ready, applying purge strategy              |    1996.945 |         0.001 |    416.00 B |   44.03 MiB |
| 23:12:26.647 | Done marking objects for purge                                   |    2006.559 |         9.614 |   11.84 KiB |   44.04 MiB |
| 23:12:26.659 | Done preparing objects                                           |    2018.814 |        12.255 |  516.46 KiB |   44.55 MiB |
| 23:12:26.659 | No modifications for sync rule Netbox wireless controllers       |    2018.899 |         0.085 | -515.55 KiB |   44.04 MiB |
| 23:12:26.664 | All done                                                         |    2023.491 |         4.592 |   -656.00 B |   44.04 MiB |
+--------------+------------------------------------------------------------------+-------------+---------------+-------------+-------------+

innodb_buffer_pool_size was still set to default - oops! But raising it didn't change UI performance. Though of course we're looking to upgrade the box and start tuning stuff when we start getting more data in.

What I didn't expect is changing the Destination Field in Sync Rule properties would hit the import source directly again. And this is the really slow bit. So yes as you advise looks like we should implement listColumns() in the import source module. Maybe caching too? We'll study some other Import Sources to see how it's done.

ollytom commented 4 years ago

Quick update. Reading through the code of the Import Source module, the listColumns() implementation is inefficient - it asks an API for everything, every time, and returns anything it finds. The keys on the other end are very static, so we can be a bit more clever and ask for one item and returning the array of keys in the returned object.

From here on we'll work on the Import Source module and make sure it is being reasonable with the API it is hitting on the other side. Feel free to close this issue.

Thomas you're a legend thanks for your help. Hopefully I can grab you a big German beer sometime in 2020.

Thomas-Gelf commented 4 years ago

Not sure if you want the weather from Sydney at the moment with lots of bushfires and smoke;

Honestly, I didn't know this and learned about it shortly after writing my former comment. Sorry, for that - and no, that's not the situation I'd prefer to have right now.

icingacli director importsource check --id 4 --benchmark:

These 75 seconds are spent on:

To get a distinction of the above, please apply these changes and try again:

--- a/library/Director/Import/Import.php
+++ b/library/Director/Import/Import.php
@@ -3,6 +3,7 @@
 namespace Icinga\Module\Director\Import;

 use Exception;
+use Icinga\Application\Benchmark;
 use Icinga\Exception\IcingaException;
 use Icinga\Module\Director\Db;
 use Icinga\Module\Director\Hook\ImportSourceHook;
@@ -164,7 +165,9 @@ class Import
             $this->data = ImportSourceHook::forImportSource(
                 $this->source
             )->fetchData();
+            Benchmark::measure('Fetched all data from Import Source');
             $this->source->applyModifiers($this->data);
+            Benchmark::measure('Applied Property Modifiers to imported data');
         }

         return $this->data;

Fetching the data will be the most expensive part of this. Still, I'd be curious to see those numbers. This would give me a better understanding of how expensive the check-summing part is in an EC2 instance.

icingacli director syncrule run --id 7 --benchmark:

It then takes Director about two seconds to load all imported data, generate all related objects in memory, load the same objects from it's DB, check for eventual changes - and to persist them to the DB. Not super fast, but if this is for 9000 hosts I would consider it being fast enough.

innodb_buffer_pool_size was still set to default - oops!

Depending on available resources, I would stay with at least 256MB. For your planned 100k Hosts having 4GB wouldn't hurt. Consider monitoring buffer usage.

[..] raising it didn't change UI performance.

Especially for the "Changing the Destination Field" part this was not expected:

What I didn't expect is changing the Destination Field in Sync Rule properties would hit the import source directly again. And this is the really slow bit. So yes as you advise looks like we should implement listColumns() in the import source module. Maybe caching too? We'll study some other Import Sources to see how it's done.

As a quick fix for your environment you could place something like:

return ['name', 'of', 'all', 'your', 'even.nested', 'fields'];

...into the listColumns method of the netboximport module. An Import Source with lots of dynamic properties, and differing properties in every row might still be forced to fetch all of them to get an idea of what is there.

And don't worry, even if you miss some columns in that list: they will be imported anyway. And you'll still be able to use them in the Sync Rule by switching to Custom Expression.

Got your second reply right now:

[...] code of the Import Source module, the listColumns() implementation is inefficient [...] [...] so we can [...] ask for one item and returning the array of keys in the returned object.

If the API provides such a method (and given that all rows have the same keys), this is the way to go. Please use the return ['something', 'hard.coded']; workaround in the meantime in case for any reason this approach doesn't fit your Import Source.

The module could then either add a "Tell me which columns you want to fetch" setting (like ImportSourceLdap does) or implement some caching, like ImportSourceSql. However, I would prefer modules not to do what the SQL Import does. Part of it is based on a Form Element Filter, as this is the only chance the module has to do this right after the user stores the form. And it's not perfect, as it doesn't play nice in case the user commits a configuration error (like an invalid SQL Query).

Let me know in case there is a need, we could try to provide related caching in a generic way.

Thomas you're a legend thanks for your help.

Glad I could help, you're welcome!

Hopefully I can grab you a big German beer sometime in 2020.

I think you have other things to worry about. Stay safe down there!

Cheers, Thomas

NB: I'll close the issue, but I'm still interested in your numbers after adding those two additional benchmark lines