Currently when generating the autocomplete index, the underlying code populates a sqlite3 database with the needed completion information. The underlying generation code doesn't perform explicit transactions, and as a result each DML statement issued is run through the full transaction cycle.
This can be vastly improved by increasing the size of the transactions used when populating the database. As a proof of concept, I added two explicit transaction blocks in awscli/autocomplete/local/indexer.py and awscli/autocomplete/serverside/indexer.py on top of 2.17.40. Running gen-ac-index I get the following:
$ time ./scripts/gen-ac-index --include-builtin-index --index-location $(mktemp)
real 0m10.893s
user 0m10.412s
sys 0m0.351s
Compare this to the current implementation with a transaction per insert:
$ time ./scripts/gen-ac-index --include-builtin-index --index-location $(mktemp)
real 7m33.245s
user 0m27.051s
sys 0m12.241s
Which is a fairly significant (97%) reduction in time.
Use Case
Each time the aws-cli-v2 package is updated in Arch it results in a from-source build of the cli. This is one of the longest steps in upgrading the software on my machine - often longer than rebuilding kernel modules and the initrd for half a dozen kernels. Improving the build and install time for the aws-cli would make it less likely for me to skip out on upgrading the cli when an update becomes available.
Proposed Solution
This is not a proper solution given the hacky nature, but this patch off of 2.17.40 was enough to realize the 97% performance improvement:
$ git diff 2.17.40
diff --git a/awscli/autocomplete/local/indexer.py b/awscli/autocomplete/local/indexer.py
index 6691e7abd..e69739a8a 100644
--- a/awscli/autocomplete/local/indexer.py
+++ b/awscli/autocomplete/local/indexer.py
@@ -74,10 +74,12 @@ class ModelIndexer(object):
)
help_command_table = clidriver.create_help_command().command_table
command_table = clidriver.subcommand_table
- self._generate_arg_index(command=parent, parent='',
- arg_table=clidriver.arg_table)
- self._generate_command_index(command_table, parent=parent,
- help_command_table=help_command_table)
+ with self._db_connection._connection as conn:
+ conn.execute("BEGIN")
+ self._generate_arg_index(command=parent, parent='',
+ arg_table=clidriver.arg_table)
+ self._generate_command_index(command_table, parent=parent,
+ help_command_table=help_command_table)
self._generate_table_indexes()
diff --git a/awscli/autocomplete/serverside/indexer.py b/awscli/autocomplete/serverside/indexer.py
index 9a6e240b7..f3e152fb4 100644
--- a/awscli/autocomplete/serverside/indexer.py
+++ b/awscli/autocomplete/serverside/indexer.py
@@ -51,8 +51,10 @@ class APICallIndexer(object):
self._create_tables()
session = clidriver.session
loader = session.get_component('data_loader')
- for key, command in self._iter_all_commands(clidriver):
- self._construct_completion_data(loader, command)
+ with self._db_connection._connection as conn:
+ conn.execute("BEGIN")
+ for key, command in self._iter_all_commands(clidriver):
+ self._construct_completion_data(loader, command)
def _iter_all_commands(self, clidriver):
stack = sorted(clidriver.subcommand_table.items())
Other Information
Thanks for considering this massive quality of live improvement.
Acknowledgements
[ ] I may be able to implement this feature request
Describe the feature
Currently when generating the autocomplete index, the underlying code populates a sqlite3 database with the needed completion information. The underlying generation code doesn't perform explicit transactions, and as a result each DML statement issued is run through the full transaction cycle.
This can be vastly improved by increasing the size of the transactions used when populating the database. As a proof of concept, I added two explicit transaction blocks in
awscli/autocomplete/local/indexer.py
andawscli/autocomplete/serverside/indexer.py
on top of 2.17.40. Running gen-ac-index I get the following:Compare this to the current implementation with a transaction per insert:
Which is a fairly significant (97%) reduction in time.
Use Case
Each time the aws-cli-v2 package is updated in Arch it results in a from-source build of the cli. This is one of the longest steps in upgrading the software on my machine - often longer than rebuilding kernel modules and the initrd for half a dozen kernels. Improving the build and install time for the aws-cli would make it less likely for me to skip out on upgrading the cli when an update becomes available.
Proposed Solution
This is not a proper solution given the hacky nature, but this patch off of 2.17.40 was enough to realize the 97% performance improvement:
Other Information
Thanks for considering this massive quality of live improvement.
Acknowledgements
CLI version used
2.17.40
Environment details (OS name and version, etc.)
Arch Linux