Database performance - Githubissues

serpent213 commented 1 year ago

The startup time can be greatly improved by creating an index on the LCSC part number:

CREATE UNIQUE INDEX parts_lcsc_part_index ON parts ('LCSC Part');

Before:

After:

(populate_footprint_list() from 2552ms to 137ms)

Another nice touch would be to open the database only once when the plugin starts up instead of whenever needed.

P.S.: I used VizTracer for the profiling:

--- a/plugin.py
+++ b/plugin.py
@@ -2,6 +2,7 @@

 import os

+from viztracer import VizTracer
 from pcbnew import ActionPlugin  # pylint: disable=import-error

 from .mainwindow import JLCPCBTools
@@ -25,6 +26,12 @@ class JLCPCBPlugin(ActionPlugin):

     def Run(self):
         """Overwrite Run"""
+        tracer = VizTracer()
+        tracer.start()
+
         dialog = JLCPCBTools(None)
         dialog.Center()
         dialog.Show()
+
+        tracer.stop()
+        tracer.save("/tmp/jlcpcb_profile.html")

Bouni commented 1 year ago

As always, PRs are highly appreciated 😁

I simply lack the knowledge to improve stuff like that 😅

serpent213 commented 1 year ago

P.P.S.: Easy to apply without touching the source:

% cd $PLUGIN_DIR/jlcpcb
% sqlite3 parts.db
SQLite version 3.39.5 2022-10-14 20:58:05
Enter ".help" for usage hints.
sqlite> CREATE UNIQUE INDEX parts_lcsc_part_index ON parts ('LCSC Part');
sqlite> .quit

As always, PRs are highly appreciated 😁

I was about to patch create_parts_table() but would need to dig deeper into your column naming and database migration mechanisms... Maybe one day. 😉

serpent213 commented 1 year ago

Some inspiration:

You do a lot of string concatenation for query building, like
```
query += f' ORDER BY "{self.order_by}" COLLATE naturalsort {self.order_dir}'
```
That might be bearable in this case (plugin for desktop use), but introduces SQL injection vulnerabilities.

For proper escaping always use the SQL library's parameter handling (cursor.execute(sql, [param_1, param_2])).
Is there a reason to not merge all plugin databases (mappings, parts, rotations) into a single one? “Parts” might make some sense to keep separate for easier backups, if that is a requirement.
I couldn't find any migration handling to update the database(s) for existing users. A common solution is to store the version number of the database structure inside of the database, typically in a table named “schema”. A detailed description of the concept can be found here. There are sophisticated libraries, but it can also be kept pretty simple.

Otherwise, how would you know when to call CREATE INDEX to upgrade existing databases? You could call CREATE INDEX IF NOT EXISTS on every startup, but that's cheating. 😉

HTH

serpent213 commented 1 year ago

As an example – and just for fun –, I gave ChatGPT a little spin:

Write a Python class to handle versioned migrations in a SQLite database. The class should be given a database connection and a list of versioned migrations (like [{'version': 1, 'sql': 'CREATE TABLE...'}, {'version': 2, 'sql': 'ALTER TABLE...'}]) on initialisation and have a method to do the actual migration. That migation method should accept a version identifier to migrate to as parameter. Also include a method to setup a database table for schema tracking when necessary, make it include timestamps. Rollback is not required, only forward migration.

Here's an updated version of the Python class that handles versioned migrations in a SQLite database, considering the updated requirements:

 import sqlite3                                                                                                                                                                   
 from datetime import datetime                                                                                                                                                    

 class DatabaseMigrator:                                                                                                                                                          
     def __init__(self, conn, migrations):                                                                                                                                        
         self.conn = conn                                                                                                                                                         
         self.migrations = migrations                                                                                                                                             

     def create_schema_table(self):                                                                                                                                               
         cursor = self.conn.cursor()                                                                                                                                              
         cursor.execute("CREATE TABLE IF NOT EXISTS schema_versions (version INTEGER PRIMARY KEY, timestamp TIMESTAMP)")                                                          
         self.conn.commit()                                                                                                                                                       

     def migrate(self, target_version):                                                                                                                                           
         cursor = self.conn.cursor()                                                                                                                                              

         current_version = self.get_current_version()                                                                                                                             
         if current_version >= target_version:                                                                                                                                    
             print(f"Database already at version {target_version}")                                                                                                               
             return                                                                                                                                                               

         for migration in self.migrations[current_version:target_version]:                                                                                                        
             try:                                                                                                                                                                 
                 cursor.execute(migration["sql"])                                                                                                                                 
                 current_version += 1                                                                                                                                             
             except Exception as e:                                                                                                                                               
                 print(f"Error migrating to version {current_version+1}: {str(e)}")                                                                                               
                 return                                                                                                                                                           

         timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")                                                                                                                 
         cursor.execute("INSERT OR REPLACE INTO schema_versions (version, timestamp) VALUES (?, ?)", (current_version, timestamp))                                                
         self.conn.commit()                                                                                                                                                       

         print(f"Database migrated to version {current_version}")                                                                                                                 

     def get_current_version(self):                                                                                                                                               
         cursor = self.conn.cursor()                                                                                                                                              
         cursor.execute("SELECT MAX(version) FROM schema_versions")                                                                                                               
         result = cursor.fetchone()                                                                                                                                               
         if result[0] is None:                                                                                                                                                    
             return 0                                                                                                                                                             
         return result[0]

Usage Example:

 # Assume the SQLite database file 'mydatabase.db' already exists.                                                                                                                
 conn = sqlite3.connect('mydatabase.db')                                                                                                                                          

 migrations = [                                                                                                                                                                   
     {'version': 1, 'sql': '''                                                                                                                                                    
         CREATE TABLE IF NOT EXISTS users (                                                                                                                                       
             id INTEGER PRIMARY KEY AUTOINCREMENT,                                                                                                                                
             name TEXT,                                                                                                                                                           
             email TEXT                                                                                                                                                           
         )                                                                                                                                                                        
     '''},                                                                                                                                                                        
     {'version': 2, 'sql': '''                                                                                                                                                    
         CREATE TABLE IF NOT EXISTS products (                                                                                                                                    
             id INTEGER PRIMARY KEY AUTOINCREMENT,                                                                                                                                
             name TEXT,                                                                                                                                                           
             price REAL                                                                                                                                                           
         )                                                                                                                                                                        
     '''}                                                                                                                                                                         
 ]                                                                                                                                                                                

 migrator = DatabaseMigrator(conn, migrations)                                                                                                                                    

 # Create the schema_versions table if it doesn't exist                                                                                                                           
 migrator.create_schema_table()                                                                                                                                                   

 # Migrate to the latest version                                                                                                                                                  
 target_version = len(migrations)                                                                                                                                                 
 migrator.migrate(target_version)

Note: In this updated implementation, each migration is now represented as a dictionary with keys "version" and "sql". The "version" key represents the version identifier, and "sql" key holds the SQL statement for that migration. The migrate() method has been updated to use these keys to execute the SQL statements in the migrations. Additionally, a timestamp is recorded in the schema_versions table for each migration for tracking purposes.

Good, but use the "version" id stored in the list of migrations to find the migrations to apply.

Apologies for the oversight. Here's the updated implementation of the migrate() method that uses the "version" id stored in the list of migrations:

 def migrate(self, target_version):                                                                                                                                               
     cursor = self.conn.cursor()                                                                                                                                                  

     current_version = self.get_current_version()                                                                                                                                 
     if current_version >= target_version:                                                                                                                                        
         print(f"Database already at version {target_version}")                                                                                                                   
         return                                                                                                                                                                   

     for migration in self.migrations:                                                                                                                                            
         version = migration["version"]                                                                                                                                           
         if current_version < version <= target_version:                                                                                                                          
             try:                                                                                                                                                                 
                 cursor.execute(migration["sql"])                                                                                                                                 
                 current_version = version                                                                                                                                        
             except Exception as e:                                                                                                                                               
                 print(f"Error migrating to version {version}: {str(e)}")                                                                                                         
                 return                                                                                                                                                           

     timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")                                                                                                                     
     cursor.execute("INSERT OR REPLACE INTO schema_versions (version, timestamp) VALUES (?, ?)", (current_version, timestamp))                                                    
     self.conn.commit()                                                                                                                                                           

     print(f"Database migrated to version {current_version}")

Now, the migrate() method iterates over the list of migrations and checks if the version falls within the range of current version (exclusive) to target version (inclusive). If so, it executes the corresponding SQL statement and updates the current version accordingly.

The rest of the class and usage remain the same.

Please let me know if you have any further questions!

chmorgan commented 1 year ago

@serpent213 I've got a PR that uses fts5 to improve search performance across all fields. I was also thinking, as you mentioned, that it made sense to keep the db open once its opened but haven't looked at that yet.

Bouni commented 1 year ago

@serpent213

P.P.S.: Easy to apply without touching the source:

% cd $PLUGIN_DIR/jlcpcb
% sqlite3 parts.db
SQLite version 3.39.5 2022-10-14 20:58:05
Enter ".help" for usage hints.
sqlite> CREATE UNIQUE INDEX parts_lcsc_part_index ON parts ('LCSC Part');
sqlite> .quit

So this could be integrated into the DB creation in the GH action, right?

https://github.com/Bouni/kicad-jlcpcb-tools/blob/main/jlcparts_db_convert.py

serpent213 commented 1 year ago

@Bouni

So this could be integrated into the DB creation in the GH action, right?

Not sure what GH means. But yes, a first step would be to just execute CREATE INDEX once after CREATE TABLE. Or even call CREATE INDEX IF NOT EXISTS during startup to update existing DBs (will take a few seconds), being a no-op during subsequent runs.

I'm pretty busy right now, maybe I can integrate the migration mechanism one day. 🙂

Edit: Note that with UNIQUE the index will change the behaviour of the DB: it's no longer possible to have the same LCSC Part twice. Should be the right thing to do, I guess.

Bouni commented 1 year ago

@serpent213 I've just tested CREATE UNIQUE INDEX parts_lcsc_part_index ON parts ('LCSC Part'); and the plugin startup ist almost instantly compared to several seconds without.

The only downside is that @chmorgan proposal for virtual tables does not allow the creation of indexes. at least not like @serpent213 suggested.

I'm not experienced in DB tech at all, so I have to rely on you guys 😅

At the moment I cannot judge if its better to stay with a normal table and use indexes or use virtual tables to improve search speed.

Ideally we would benchmark this with VisTracer like @serpent213 showed in a previous post!

chmorgan commented 1 year ago

I did notice the startup can take a bit and correct that the index on the virtual table isn't the case.

Is that the only index helping startup time? I've seen that library.categories() can take several seconds.

We could make a table of categories and subcategories as a part of creating the database. That would avoid having to select them each time and resolve the issue of being unable to add indexes to the virtual table. It's maybe not as clean but imo once you switch to the fts5 changes it's hard to go back in terms of search capability and performance.

@serpent213 thoughts? And have you seen the fts5 PR?

serpent213 commented 1 year ago

@serpent213 thoughts? And have you seen the fts5 PR?

Read the PR, but didn't play with it, yet. Will come back to you, guys. 🙂

serpent213 commented 1 year ago

Had some deeper insight into the code and the GH action thingy. 😎

With #348 I had mixed feelings. When unpacking the download it crashed once, worked then on second try. Startup time is extremely high on my system, more than 34s in comparison to the viztrace numbers above. Search was very glitchy, regularly it would hang and show the beachball of death with no high CPU load or anything. 🤷🏼‍♂️

OTOH, we had our good moments. For some periods of time I was able feel the benefits of its real-time search abilities. 🙂

I was testing with KiCAD 7.0.6-0 including Python 3.9.13.

Also the fts5-DB is 5.8 GiB here, that might be difficult to handle for some users.

To make it work, @chmorgan, I can imagine the following:

Make it optional, a checkbox in the settings.
Create fts5-DB on demand from original parts-DB.
Maybe optimise fts5-DB schema for just full text search.
Profit!

In general, I think, the next steps should be to implement a migration mechanism like sketched out above and clean up the code. That would also bring some more little speed improvements. Is there a reason for the separate rotations and mappings DBs? Why does jlcparts_db_convert.py create another set of mapping and rotation tables? Maybe you explain a little about the history, @Bouni?

chmorgan commented 1 year ago

Hey @serpent213!

Have you had a chance to use this plugin for a large number of parts mappings? I put the fts5 search in place for a single board and it saved me a ton of time, of course its perfect and I can't believe you found any criticisms... ;-)

This is good feedback. Let me see if I can't address at least a few of these over the next few days, in particular the startup time. I'll also look at putting the search into a background thread, that would avoid the UI hanging up when the search is a complex one.

Would you know what you were searching for that was causing the UI to hang up? It was previously pretty glitchy but after the tokenizer switch to trigram and not searching on empty keywords I haven't seen many, or any cases of it hanging up here.

serpent213 commented 1 year ago

Well, it is really a showstopper on my 2017 MacBook 4 x 2,8 GHz – not the hottest gear, but should still have enough oomph to run a parts database like this. 😉

The main problem is, my guess, that the new fts table performs very badly for general purpose (non-fts) access schemes like on startup, therefore it should only be used to index the text that is to be searched and contain a reference back to the traditional parts table as I described in #348.

chmorgan commented 1 year ago

I’m guessing it can do better. I’ll be profiling this weekend and see how much I can squeeze out of it. On the size topic if there are columns we don’t ever need or use it would help to drop them from the import even if not in the fts5 case.

Bouni commented 1 year ago

@serpent213 jlcparts_db_convert.py is the script run by the Github Action which converts the data provided by jlcparts and converts that into this plugins format in a sqlite database. This database file is stored as chunks of < 100MB on Github (no larger files allowed). When a user starts the plugin for the first time or updates the database these chunks are downloaded and merged into a normal db file again.

rotations and mappings where back in the day part of the project db, which turned out to be a bad idea because for every project the user had to do the rotations an mappings again. But the could be absolutely two tables in one db file ...

This plugin has huge room for improvements as you see 😅

Bouni / kicad-jlcpcb-tools

Database performance #360