aws / aws-cli

Universal Command Line Interface for Amazon Web Services

Autocompletion index generation could be much faster #8889

Open nairb774 opened 2 weeks ago

nairb774 commented 2 weeks ago

Describe the feature

Currently when generating the autocomplete index, the underlying code populates a sqlite3 database with the needed completion information. The underlying generation code doesn't perform explicit transactions, and as a result each DML statement issued is run through the full transaction cycle.

This can be vastly improved by increasing the size of the transactions used when populating the database. As a proof of concept, I added two explicit transaction blocks in awscli/autocomplete/local/indexer.py and awscli/autocomplete/serverside/indexer.py on top of 2.17.40. Running gen-ac-index I get the following:

$ time ./scripts/gen-ac-index --include-builtin-index --index-location $(mktemp)

real    0m10.893s
user    0m10.412s
sys     0m0.351s

Compare this to the current implementation with a transaction per insert:

$ time ./scripts/gen-ac-index --include-builtin-index --index-location $(mktemp)

real    7m33.245s
user    0m27.051s
sys     0m12.241s

That is roughly a 97% reduction in wall-clock time, from about 7.5 minutes to under 11 seconds.
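The same effect can be reproduced outside the CLI. The following is a minimal sketch, not the indexer's code: the table name demo_index, the populate helper, and the row count are made up for illustration, and the connection is opened with isolation_level=None on the assumption that this mirrors the per-statement commit behaviour described above.

```python
import os
import sqlite3
import tempfile
import time


def populate(path, rows, batch):
    """Insert `rows` rows into a throwaway table and return (seconds, row_count).

    batch=False: every INSERT is committed on its own (autocommit).
    batch=True:  all INSERTs share one explicit transaction.
    """
    # isolation_level=None puts the sqlite3 module in autocommit mode, so
    # transactions only exist where we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE demo_index (command TEXT, argname TEXT)")
    start = time.perf_counter()
    if batch:
        conn.execute("BEGIN")
        try:
            for i in range(rows):
                conn.execute(
                    "INSERT INTO demo_index VALUES (?, ?)",
                    (f"cmd{i}", f"arg{i}"),
                )
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
    else:
        for i in range(rows):
            # No surrounding transaction: each statement commits by itself.
            conn.execute(
                "INSERT INTO demo_index VALUES (?, ?)",
                (f"cmd{i}", f"arg{i}"),
            )
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM demo_index").fetchone()[0]
    conn.close()
    return elapsed, count


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    for batch in (False, True):
        db_path = os.path.join(workdir, f"demo-{batch}.db")
        elapsed, count = populate(db_path, rows=2000, batch=batch)
        print(f"batch={batch}: {count} rows in {elapsed:.3f}s")
```

The absolute numbers depend heavily on the disk and filesystem, so this only shows the shape of the difference, not the figures measured with gen-ac-index.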

Use Case

Each time the aws-cli-v2 package is updated in Arch it results in a from-source build of the cli. This is one of the longest steps in upgrading the software on my machine - often longer than rebuilding kernel modules and the initrd for half a dozen kernels. Improving the build and install time for the aws-cli would make it less likely for me to skip out on upgrading the cli when an update becomes available.

Proposed Solution

This is a hack rather than a proper solution (it reaches into the private _connection attribute), but this patch against 2.17.40 was enough to realize the 97% improvement:

$ git diff 2.17.40
diff --git a/awscli/autocomplete/local/indexer.py b/awscli/autocomplete/local/indexer.py
index 6691e7abd..e69739a8a 100644
--- a/awscli/autocomplete/local/indexer.py
+++ b/awscli/autocomplete/local/indexer.py
@@ -74,10 +74,12 @@ class ModelIndexer(object):
         )
         help_command_table = clidriver.create_help_command().command_table
         command_table = clidriver.subcommand_table
-        self._generate_arg_index(command=parent, parent='',
-                                 arg_table=clidriver.arg_table)
-        self._generate_command_index(command_table, parent=parent,
-                                     help_command_table=help_command_table)
+        with self._db_connection._connection as conn:
+            conn.execute("BEGIN")
+            self._generate_arg_index(command=parent, parent='',
+                                     arg_table=clidriver.arg_table)
+            self._generate_command_index(command_table, parent=parent,
+                                         help_command_table=help_command_table)

         self._generate_table_indexes()

diff --git a/awscli/autocomplete/serverside/indexer.py b/awscli/autocomplete/serverside/indexer.py
index 9a6e240b7..f3e152fb4 100644
--- a/awscli/autocomplete/serverside/indexer.py
+++ b/awscli/autocomplete/serverside/indexer.py
@@ -51,8 +51,10 @@ class APICallIndexer(object):
         self._create_tables()
         session = clidriver.session
         loader = session.get_component('data_loader')
-        for key, command in self._iter_all_commands(clidriver):
-            self._construct_completion_data(loader, command)
+        with self._db_connection._connection as conn:
+            conn.execute("BEGIN")
+            for key, command in self._iter_all_commands(clidriver):
+                self._construct_completion_data(loader, command)

     def _iter_all_commands(self, clidriver):
         stack = sorted(clidriver.subcommand_table.items())
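A less hacky shape for the same idea might be a small transaction helper on the connection wrapper, so the indexers never touch the private attribute. This is only a sketch under assumptions: DemoConnection and its transaction() method are hypothetical names invented here, not part of aws-cli, and the wrapper is assumed to run sqlite3 in autocommit mode.

```python
import contextlib
import sqlite3


class DemoConnection:
    """Hypothetical stand-in for a connection wrapper (not the aws-cli class)."""

    def __init__(self, path=":memory:"):
        # Assumption: autocommit mode, so transactions are fully explicit.
        self._connection = sqlite3.connect(path, isolation_level=None)

    def execute(self, query, params=()):
        return self._connection.execute(query, params)

    @contextlib.contextmanager
    def transaction(self):
        """Group every statement issued in the block into one transaction."""
        self._connection.execute("BEGIN")
        try:
            yield self
        except BaseException:
            # Undo the partial batch, then let the error propagate.
            self._connection.execute("ROLLBACK")
            raise
        else:
            self._connection.execute("COMMIT")


if __name__ == "__main__":
    db = DemoConnection()
    db.execute("CREATE TABLE demo_index (command TEXT)")
    with db.transaction():
        for name in ("s3", "ec2", "iam"):
            db.execute("INSERT INTO demo_index VALUES (?)", (name,))
    print(db.execute("SELECT COUNT(*) FROM demo_index").fetchone()[0])
```

With something like this, the two indexers would wrap their generation loops in a with block on the wrapper instead of calling conn.execute("BEGIN") on the raw connection.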

Other Information

Thanks for considering this massive quality-of-life improvement.

Acknowledgements

CLI version used

2.17.40

Environment details (OS name and version, etc.)

Arch Linux

tim-finnigan commented 1 week ago

Thanks for the feature request, we can track this for more discussion and input.