Bouni / kicad-jlcpcb-tools

Plugin to generate BOM + CPL files for JLCPCB, assigning LCSC part numbers directly from the plugin, query the JLCPCB parts database, lookup datasheets and much more.
MIT License
1.16k stars 105 forks source link

Create and use a separate table (parts_by_lcsc) to index FTS5 parts table by rowid to speed up. #444

Closed gyohng closed 1 month ago

gyohng commented 5 months ago

The purpose of this PR is to increase the speed of any operation that involves list rebuilding, which is basically any operation, such as part selection, sorting, hiding/showing BOM parts, or even invoking the plugin window.

The database increase due to this is +3.3%, which I think is a reasonable trade-off until a better way is found to integrate it with FTS5 directly while maintaining the speed.

The problem is illustrated in this YouTube video (the principal part of the commit is initially commented out and then uncommented to show the difference in performance impact).

Youtube video with demonstrating the problem

The commit includes the following changes:

  1. Creation of a New Table (parts_by_lcsc): A new table is created in the database to hold mappings between part identifiers (partsId) and the LCSC Part numbers. This table is designed to optimise lookup times by indexing parts based on their LCSC Part number rather than through direct queries on the parts table, which seems to be slow.

  2. Modification in get_part_details Function:

    • The method has been updated to first attempt fetching partsId values from the new parts_by_lcsc table using the provided LCSC Part numbers.
    • If the new method finds all required partsId, it then fetches the parts details directly using these IDs, which is much faster as it leverages the row IDs for direct access instead of scanning through text fields.
    • In case of any issues (e.g., the new table not having all the entries), it falls back to the original method of fetching data directly using the LCSC Part numbers from the parts table.
  3. Database Indexing in the download Method:

    • A significant process added in the download method is the population of the parts_by_lcsc table and the removal of an old index if it exists.
    • It iterates through every row in the parts table, inserts or updates the mapping in the parts_by_lcsc table, and updates a progress indicator, which communicates the operation's progress to the user interface.
    • After filling up the new table, it creates an index on the lcsc column of the parts_by_lcsc table to speed up the lookups.
    • This indexing is expected to significantly reduce the query time by utilising a quicker, more efficient index.
  4. User Interface Updates: The code updates a gauge control that reflects the progress of the database indexing operation. On a modern M1 MacBook Pro it finishes under 5 seconds and is significantly faster than the downloading.

chmorgan commented 5 months ago

Can we avoid rebuilding the list to reduce the impact of list rebuilding being expensive?

I get having a separate index table and considered such things. Imo it isn't worth the complexity if the cost of a few seconds is only when users alter the sort list of a component list with 100+ components. But again, I'm likely biased.

gyohng commented 5 months ago

Can we avoid rebuilding the list to reduce the impact of list rebuilding being expensive?

I don't think it's possible if we want to keep the component list synchronised to the database. Even at a cost of a major refactor, you'll still need to do it every time the plugin starts, which is quite annoyingly slow right now.

In the view of the FTS5 change, which expanded the database four-fold (and if you download the second time, watch for the multi-gigabyte bak file), I think a db size increase of 3.3% and an extra SELECT is not such a big thing.

whmountains commented 5 months ago

I second the observation that plugin startup has been very slow since the fts5 change. The plugin also freezes for about 20 seconds whenever I select a part which sort of defeats the purpose of fts5 in the first place. Search as you type is no fun if you then have to wait for the changes to commit.

I was just in the process of doing some profiling to troubleshoot the issue but it looks like @gyohng may have found it already.

(I'm using MacOS and KiCAD 8 BTW)

chmorgan commented 5 months ago

Agree any freezes should be eliminated or reduced.

For the freezing, what is the action that results in the freezing for you? I’m assuming it’s something other than changing the parts sort order.

On Wed, Apr 24, 2024 at 1:02 PM Caleb Whiting @.***> wrote:

I second the observation that plugin startup has been very slow since the fts5 change. The plugin also freezes for about 20 seconds whenever I select a part which sort of defeats the purpose of fts5 in the first place. (I'm using MacOS and KiCAD 8 BTW)

I was just in the process of doing some profiling to troubleshoot the issue but it looks like @gyohng https://github.com/gyohng may have found it already.

— Reply to this email directly, view it on GitHub https://github.com/Bouni/kicad-jlcpcb-tools/pull/444#issuecomment-2075428744, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJH4AFQEC3PGHRBNEN3JD3Y67QQ5AVCNFSM6AAAAABGRLPCMWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZVGQZDQNZUGQ . You are receiving this because you commented.Message ID: @.***>

whmountains commented 5 months ago

The slowness seems to come from the populate_footprint_list function. I added some log statements to print the execution time whenever the function was called. Here are some measurements I got in standalone mode:

Time taken in the populate_footprint_list function:

chmorgan commented 5 months ago

That helps. I'll take a look to see what might be going on there. We know the fts5 stuff is a lot of overhead (in exchange for the free text searching), there may be some straightforward wins to cut that time down on those steps you mention taking a long time.

In terms of performance, I wrote a rust program that does profiling against the old and new databases, and various versions in between, relative to search queries, and it was helpful. I'll do the same here for these paths.

On Wed, Apr 24, 2024 at 2:21 PM Caleb Whiting @.***> wrote:

The slowness seems to come from the populate_footprint_list function. I added some log statements to print the execution time whenever the function was called. Here are some measurements I got in standalone mode:

Time taken in the populate_footprint_list function:

  • Startup - 15.94s
  • Click "Select part" from the part picker dialog - 13.74s
  • Sort by stock in the main window - 13.40s.

— Reply to this email directly, view it on GitHub https://github.com/Bouni/kicad-jlcpcb-tools/pull/444#issuecomment-2075562076, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJH4ACEQFFZLQ4EUYPEBITY67ZZ5AVCNFSM6AAAAABGRLPCMWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZVGU3DEMBXGY . You are receiving this because you commented.Message ID: @.***>

whmountains commented 5 months ago

I think #360 is related. The issue was actually fixed in 5eb59e2, with the comment "massive startup time improvement" but it never got closed. We lost this index with the move to FTS5, which caused startup time to once again take a lot longer.

gyohng commented 5 months ago

I think #360 is related. The issue was actually fixed in 5eb59e2, with the comment "massive startup time improvement" but it never got closed. We lost this index with the move to FTS5, which caused startup time to once again take a lot longer.

Hi! :) I am actually a fan of your previously submitted PR with LIKE optimisation, but now FTS5 took over...

This PR is essentially a replica of 5eb59e2. The main problem with 5eb59e2 (indexing by LCSC Part) now is that there's no way to reintroduce it with FTS5 - the FTS5's own index isn't quite as efficient, and sqlite won't accept index creation for an FTS table.

I went even as far as creating an index on the internal parts_content table for the FTS structure (it didn't help). If I had to choose, I'd probably keep the FTS5 part separately just for the part search and otherwise resort to the old table with index approach for everything else, even despite the data growth.

whmountains commented 5 months ago

Hi @gyohng! I thought I already replied but I can't find the comment so I'll reply again.

I think this comes down to workflow. You and I make heavy use of parametric search, so #439 was really all we needed and FTS5 has a high bar to meet. @chmorgan and probably a lot of other people prefer a single search-box interface hence the project going in that direction.

But I think we all agree that the UI freezing for 15 seconds or more every time a part is selected is not acceptable to anyone's workflow. You may be interested in the discussion under #451, where we are trying to figure out where the slowdown is coming from. And #450 which is a totally diffferent idea for a parts database.

Bouni commented 3 months ago

I just tried this PR but it always stops at "DEBUG - download - Indexing parts table..." The progress bar is not filling up so I guess something is not working as expected. @gyohng Any ideas why?

gyohng commented 3 months ago

I just tried this PR but it always stops at "DEBUG - download - Indexing parts table..." The progress bar is not filling up so I guess something is not working as expected. @gyohng Any ideas why?

It used to work fine, I will check with the latest pull. By the way, I noted that your logging is buffered and not flushed, so you normally won't see things until several lines (sometimes minutes) afterwards.

gyohng commented 3 months ago

Please check again. 'IF EXISTS' clause was missing from DROP TABLE. I could never run in to it at my end, because this table was already indexed with a previous version. Now it should work fine.

gyohng commented 3 months ago

I added another commit to make linter happy with this PR.

Oscilllator commented 2 months ago

I didn't see this pull request prior to taking a look at the performance myself, but I was able to make get_part_details 100x faster without too much trouble in https://github.com/Bouni/kicad-jlcpcb-tools/pull/503. From talking with my good friend ChatGPT it seems that many of the SQL queries seem to not take into account the FTS5 nature of the table.

chmorgan commented 2 months ago

@Oscilllator speeding the query without the additional table is a simpler approach.

gyohng commented 2 months ago

@chmorgan if the other one works the same fast, let's close then?

chmorgan commented 2 months ago

@gyohng it might not be as fast but it does appear like its faster (note that I'm back using the plugin to release some boards but I haven't tested it). Your call on closing this one if you think its been solved, or keeping it open if you think this is a better approach.

Bouni commented 1 month ago

I'll close this as its no longer an issue