bsiegert / BulkTracker

Track bulk build status in pkgsrc

Index results by build id and pkg id #55

Open riastradh opened 2 months ago

riastradh commented 2 months ago

The following index dramatically speeds up common queries:

CREATE INDEX results_i_build_pkg ON results (build_id, pkg_id);

It increases the size of the database by about 1/3, but compare, e.g., the GetResultsInCategory query (bottleneck of https://releng.netbsd.org/bulktracker/build/645/meta-pkgs):

GetSingleResultByPkgName (bottleneck of https://releng.netbsd.org/bulktracker/pkg/17227701):

Given the amount of CPU time mollari is spending in bulktracker, I think this couple hundred megabytes of space is worth it.
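As a rough illustration of why a composite index on (build_id, pkg_id) helps lookups that filter on both columns, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the sample data, and the choice of SQLite are assumptions made only to keep the example self-contained; the real schema and database engine may differ, and other planners will report their plans differently.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[3] for row in rows)  # column 3 holds the detail text

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the results table; only the column names
# build_id, pkg_id, pkg_name and build_status come from the thread.
conn.execute("""CREATE TABLE results (
    result_id INTEGER PRIMARY KEY,
    build_id INTEGER, pkg_id INTEGER, pkg_name TEXT, build_status INTEGER)""")
conn.executemany(
    "INSERT INTO results (build_id, pkg_id, pkg_name, build_status)"
    " VALUES (?, ?, ?, ?)",
    [(b, p, f"pkg-{p}", p % 4) for b in range(50) for p in range(200)])

query = "SELECT build_status FROM results WHERE build_id = ? AND pkg_id = ?"
before = plan(conn, query, (7, 42))   # no index yet: full table scan
conn.execute("CREATE INDEX results_i_build_pkg ON results (build_id, pkg_id)")
after = plan(conn, query, (7, 42))    # index present: indexed search

print("before:", before)
print("after: ", after)
```

The exact wording of the plan text varies between SQLite versions, so only the scan-versus-search distinction and the index name should be relied on.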

riastradh commented 2 months ago

Another index that may be worthwhile to speed up GetSingleResultByPkgName, to look up failed dependencies on a package results detail page of an indirect-failed package like https://releng.netbsd.org/bulktracker/pkg/17645166 (especially once https://github.com/bsiegert/BulkTracker/issues/56 is fixed so it can display more than one failed dependency at a time):

CREATE INDEX results_i_build_pkgname ON results (build_id, pkg_name);

I haven't measured how much space it takes or how much it speeds up queries, though -- just guessing by code inspection. (Should measure these before implementing it.)
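One way such a measurement could be done, sketched with Python's sqlite3 module on a throwaway database: compare the file's page count and a repeated query's wall-clock time before and after creating the index. The schema, the row counts, and the use of SQLite are assumptions for illustration, and the numbers it prints say nothing about the real database.

```python
import os
import sqlite3
import tempfile
import time

def db_bytes(conn):
    """Database size in bytes, computed from page count and page size."""
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    return pages * page_size

def time_query(conn, sql, params, repeat=200):
    """Total seconds spent running the query `repeat` times."""
    start = time.perf_counter()
    for _ in range(repeat):
        conn.execute(sql, params).fetchall()
    return time.perf_counter() - start

tmpdir = tempfile.TemporaryDirectory()
conn = sqlite3.connect(os.path.join(tmpdir.name, "sketch.db"))
# Hypothetical stand-in for the results table.
conn.execute("""CREATE TABLE results (
    result_id INTEGER PRIMARY KEY,
    build_id INTEGER, pkg_id INTEGER, pkg_name TEXT, build_status INTEGER)""")
conn.executemany(
    "INSERT INTO results (build_id, pkg_id, pkg_name, build_status)"
    " VALUES (?, ?, ?, ?)",
    [(b, p, f"pkg-{p}", p % 4) for b in range(50) for p in range(400)])
conn.commit()

query = "SELECT build_status FROM results WHERE build_id = ? AND pkg_name = ?"
params = (7, "pkg-42")

size_before = db_bytes(conn)
time_before = time_query(conn, query, params)

conn.execute(
    "CREATE INDEX results_i_build_pkgname ON results (build_id, pkg_name)")
conn.commit()

size_after = db_bytes(conn)
time_after = time_query(conn, query, params)

print(f"size: {size_before} -> {size_after} bytes")
print(f"time: {time_before:.4f}s -> {time_after:.4f}s for 200 runs")

conn.close()
tmpdir.cleanup()
```

Timings on a small synthetic table are noisy, so a real measurement would want a copy of the production data and the actual query text.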

riastradh commented 2 months ago

Another index that may be worthwhile to speed up GetPkgsBreakingMostOthers, to show on a bulk build details page like https://releng.netbsd.org/bulktracker/build/658 which packages break most others, by narrowing the search down in advance to the results that are broken:

CREATE INDEX results_i_build_pkg_broken ON results (build_id, pkg_id)
    WHERE build_status > 0;
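
A partial index like this only helps queries whose WHERE clause implies the index's own condition. The sketch below checks that in SQLite via Python's sqlite3 module; the table layout is hypothetical, reading build_status > 0 as "broken" is an assumption taken from the index name, and the queries are illustrative rather than the real GetPkgsBreakingMostOthers.

```python
import sqlite3

def plan(conn, sql):
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the results table.
conn.execute("""CREATE TABLE results (
    result_id INTEGER PRIMARY KEY,
    build_id INTEGER, pkg_id INTEGER, pkg_name TEXT, build_status INTEGER)""")
conn.execute("""CREATE INDEX results_i_build_pkg_broken
    ON results (build_id, pkg_id)
    WHERE build_status > 0""")

# The filter repeats the index condition literally, so the planner can
# use the partial index.
broken_plan = plan(
    conn,
    "SELECT pkg_id FROM results WHERE build_id = 658 AND build_status > 0")

# Without the status filter the partial index cannot be used, because it
# does not contain the rows with build_status <= 0.
all_plan = plan(conn, "SELECT pkg_id FROM results WHERE build_id = 658")

print("broken only:", broken_plan)
print("all results:", all_plan)
```

In SQLite the condition is written as a literal (build_status > 0) here because a bound parameter in that position would not let the planner prove the index applies; other engines have their own rules.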

bsiegert commented 2 months ago

> The following index dramatically speeds up common queries:
>
> CREATE INDEX results_i_build_pkg ON results (build_id, pkg_id);
>
> It increases the size of the database by about 1/3, but compare, e.g., the GetResultsInCategory query (bottleneck of https://releng.netbsd.org/bulktracker/build/645/meta-pkgs):

I added this index just now. Will look into the others. Thank you!

riastradh commented 2 months ago

The list of results for a particular package like https://releng.netbsd.org/bulktracker/lang/rust is bottlenecked on GetAllPkgResults, and the results_i_build_pkg index doesn't help because it wants to look up the pkg id first, not the build id first. Could add an opposite index:

CREATE INDEX results_i_pkg_build ON results (pkg_id, build_id);

But maybe it would be better to just use separate single-column indices on each of the two or three relevant columns -- a cursory glance suggests that will work just as well for all the queries I checked, and cost less space than results_i_build_pkg and results_i_pkg_build combined (about 1.1 GB vs 1.2 GB for the whole database):

CREATE INDEX results_i_build ON results (build_id);
CREATE INDEX results_i_pkg ON results (pkg_id);
-- plus maybe:
CREATE INDEX results_i_pkgname ON results (pkg_name);

Might be worthwhile to systematically examine all the queries to see which ones are improved by indices -- I have been spot-checking by mousing around the web site and noticing when things are slow, and I may have missed this slowness last time around because of the caching layer. All of the queries I spot-checked (GetResultsInCategory, GetAllPkgResults, GetSingleResultByPkgName, getPkgsBrokenBy) were quick with results_i_build + results_i_pkg, and some were slower with just results_i_build_pkg or with just results_i_pkg_build.
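
A systematic check along these lines could be sketched as a loop that prints the query plan of each query under each candidate index set. The version below uses SQLite through Python's sqlite3 module with a hypothetical schema and simplified stand-in queries (not the real GetAllPkgResults and friends), and it inspects plans only, not timings or space.

```python
import sqlite3

# Simplified stand-ins for the real queries; assumptions for illustration.
QUERIES = {
    "by build": "SELECT build_status FROM results WHERE build_id = 7",
    "by pkg": "SELECT build_status FROM results WHERE pkg_id = 42",
    "by both": "SELECT build_status FROM results"
               " WHERE build_id = 7 AND pkg_id = 42",
}

def plans_for(index_ddl):
    """Create a fresh database with the given indices; return plan per query."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the results table.
    conn.execute("""CREATE TABLE results (
        result_id INTEGER PRIMARY KEY,
        build_id INTEGER, pkg_id INTEGER, pkg_name TEXT, build_status INTEGER)""")
    for ddl in index_ddl:
        conn.execute(ddl)
    plans = {}
    for label, sql in QUERIES.items():
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        plans[label] = "; ".join(row[3] for row in rows)
    conn.close()
    return plans

composite = plans_for([
    "CREATE INDEX results_i_build_pkg ON results (build_id, pkg_id)",
])
separate = plans_for([
    "CREATE INDEX results_i_build ON results (build_id)",
    "CREATE INDEX results_i_pkg ON results (pkg_id)",
])

for name, plans in (("composite", composite), ("separate", separate)):
    for label, text in plans.items():
        print(f"{name:9} {label:8} {text}")
```

With only the composite index, the lookup by pkg_id alone falls back to a table scan because pkg_id is not the leading column, while the two single-column indices give every query an indexed search. Whether that also matches the timings observed on the real database depends on the engine and the data distribution.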