biona001 closed this issue 8 months ago
After a bit more digging, I'm now inclined to think there is no performance regression, because the original problem was likely run on a subset of the variables.
p = 1241 # this includes group 263, which is the largest group with 192 members
groups = groups[1:p]
Σ = Σ[1:p, 1:p]
# Knockoffs.jl version 1.1.7
@time solve_s_group(
Symmetric(Σ), groups, :maxent,
m = 1, # number of knockoffs per variable to generate
tol = 0.00001, # convergence tolerance
outer_iter = 10, # max number of coordinate descent iterations
inner_ccd_iter = 1,
inner_pca_iter = 0,
robust = false, # whether to use robust cholesky updates
verbose = true # whether to print informative intermediate results
);
┌ Warning: Maximum group size is 192, optimization may be slow. Consider running `modelX_gaussian_rep_group_knockoffs` to speed up convergence.
└ @ Knockoffs /home/groups/sabatti/.julia/packages/Knockoffs/Pvo6U/src/group.jl:360
Maxent initial obj = -13827.762715952922
Iter 1 (CCD): obj = -8578.305552411875, δ = 0.9025475541472799, t1 = 11.23, t2 = 23.98, t3 = 0.06
Iter 2 (CCD): obj = -7555.017497751844, δ = 0.7478481743005219, t1 = 23.57, t2 = 47.4, t3 = 0.13
Iter 3 (CCD): obj = -7550.483748831955, δ = 0.018305715936092613, t1 = 35.33, t2 = 70.84, t3 = 0.21
Iter 4 (CCD): obj = -7548.862400282033, δ = 0.016699603678850573, t1 = 46.44, t2 = 94.33, t3 = 0.28
Iter 5 (CCD): obj = -7548.3273548868, δ = 0.008661606230114097, t1 = 56.82, t2 = 117.76, t3 = 0.35
Iter 6 (CCD): obj = -7548.015458746686, δ = 0.00498206737500809, t1 = 66.43, t2 = 141.06, t3 = 0.42
Iter 7 (CCD): obj = -7547.793581622985, δ = 0.0026061168087445695, t1 = 75.48, t2 = 164.59, t3 = 0.49
Iter 8 (CCD): obj = -7547.619776039311, δ = 0.0014028998110954276, t1 = 83.86, t2 = 188.1, t3 = 0.56
Iter 9 (CCD): obj = -7547.4770852558795, δ = 0.0007153736353831524, t1 = 91.84, t2 = 211.48, t3 = 0.64
Iter 10 (CCD): obj = -7547.358121710283, δ = 0.0002760958077924706, t1 = 99.6, t2 = 234.89, t3 = 0.71
336.402501 seconds (595.27 k allocations: 357.114 MiB, 0.01% gc time)
With p = 1241, the overall time and objective value are much closer to the values attained 2 years ago.
A few more hours later, I'm starting to think PCA updates are slower than they should be. Here is a test on a 9000 × 9000 input matrix:
file = "/oak/stanford/groups/zihuai/HighD_Sigma/UCB_cor.csv"
Σ = readdlm(file, ',', skipstart=1)
groups = hc_partition_groups(Symmetric(Σ), cutoff=0.5, linkage=:average)
group_reps = choose_group_reps(Symmetric(Σ), groups, threshold=0.5);
On Julia v1.10 & Knockoffs.jl version 1.1.7:
solve_s_graphical_group(
Symmetric(Σ), groups, group_reps, :maxent;
m = 5, verbose = true, inner_pca_iter=1,
inner_ccd_iter=1, outer_iter = 2
)
8999 representatives for 9132 variables, 9033 optimization variables
PCA optimization: 8999 variables
CCD optimization: 8999 diagonal variables
CCD optimization: 17 off-diagonal variables
Maxent initial obj = -130896.16543115964
Iter 1 (PCA): obj = -124299.86980415294, δ = 0.5286405884857434, t1 = 1279.42, t2 = 278.81
Iter 2 (CCD): obj = -123656.14555594444, δ = 0.3698519162684533, t1 = 43.83, t2 = 274.62, t3 = 0.0
Iter 3 (PCA): obj = -118778.12068774736, δ = 0.42098723933166904, t1 = 2481.3, t2 = 524.16
Iter 4 (CCD): obj = -118445.11098235719, δ = 0.2330629376150859, t1 = 86.36, t2 = 513.34, t3 = 0.0
PCA timings: t1_pca = 2481.29556, t2_pca = 524.15894, t3_pca = 0.0
CCD timings: t1_ccd = 86.36043, t2_ccd = 513.33717, t3_ccd = 0.00045
It's clear that t1 (the time to update Cholesky factors) is a lot slower for PCA updates than for CCD updates, which doesn't make much sense to me, since CCD requires more, not fewer, Cholesky updates. Thus I added some temporary code to count how many "early Cholesky terminations" happen for PCA vs. CCD:
solve_s_graphical_group(
Symmetric(Σ), groups, group_reps, :maxent;
m = 5, verbose = true, inner_pca_iter=1,
inner_ccd_iter=1, outer_iter = 10
)
8999 representatives for 9132 variables, 9033 optimization variables
PCA optimization: 8999 variables
CCD optimization: 8999 diagonal variables
CCD optimization: 17 off-diagonal variables
Maxent initial obj = -130896.16543115964
Iter 1 (PCA): obj = -124299.86980415294, δ = 0.5286405884857434, t1 = 1279.98, t2 = 287.63
pca_early_terminated = 9019
Iter 2 (CCD): obj = -123656.14555594444, δ = 0.3698519162684533, t1 = 42.88, t2 = 273.52, t3 = 0.0
ccd_early_terminated = 207
Iter 3 (PCA): obj = -118778.12068774736, δ = 0.42098723933166904, t1 = 2526.84, t2 = 557.25
pca_early_terminated = 9019
Iter 4 (CCD): obj = -118445.11098235719, δ = 0.2330629376150859, t1 = 85.95, t2 = 531.38, t3 = 0.0
ccd_early_terminated = 231
Iter 5 (PCA): obj = -114810.46063950789, δ = 0.3855781593833003, t1 = 3816.9, t2 = 850.46
pca_early_terminated = 9019
Iter 6 (CCD): obj = -114638.32618042963, δ = 0.09078804983710237, t1 = 118.88, t2 = 821.56, t3 = 0.0
ccd_early_terminated = 192
Iter 7 (PCA): obj = -112127.17150059695, δ = 0.3573975408848783, t1 = 5112.73, t2 = 1145.55
pca_early_terminated = 9019
Iter 8 (CCD): obj = -112033.49440467537, δ = 0.06394698256688378, t1 = 141.36, t2 = 1113.82, t3 = 0.0
ccd_early_terminated = 155
Iter 9 (PCA): obj = -110229.42080921704, δ = 0.2913788676608088, t1 = 6407.41, t2 = 1440.61
pca_early_terminated = 9019
Iter 10 (CCD): obj = -110175.8992055564, δ = 0.04113289361230646, t1 = 155.91, t2 = 1404.21, t3 = 0.0
ccd_early_terminated = 120
Iter 11 (PCA): obj = -108676.19894636273, δ = 0.19573378930103782, t1 = 7700.19, t2 = 1734.3
pca_early_terminated = 9019
Iter 12 (CCD): obj = -108657.18365041119, δ = 0.01959907766848471, t1 = 165.56, t2 = 1693.48, t3 = 0.0
ccd_early_terminated = 80
Iter 13 (PCA): obj = -107312.02842853116, δ = 0.18208840012112923, t1 = 8989.03, t2 = 2027.97
pca_early_terminated = 9019
Iter 14 (CCD): obj = -107302.33547962835, δ = 0.025094298874137874, t1 = 173.6, t2 = 1982.7, t3 = 0.0
ccd_early_terminated = 54
Iter 15 (PCA): obj = -106186.3408574712, δ = 0.22998709100773107, t1 = 10282.46, t2 = 2322.45
pca_early_terminated = 9019
Iter 16 (CCD): obj = -106182.00466931309, δ = 0.013124608119177414, t1 = 181.32, t2 = 2272.57, t3 = 0.0
ccd_early_terminated = 44
Iter 17 (PCA): obj = -105226.74312417053, δ = 0.18798423882462398, t1 = 11585.63, t2 = 2618.13
pca_early_terminated = 9019
Iter 18 (CCD): obj = -105225.00889733771, δ = 0.016869569663403902, t1 = 188.68, t2 = 2565.28, t3 = 0.0
ccd_early_terminated = 36
Iter 19 (PCA): obj = -104351.73259806576, δ = 0.19649363409471488, t1 = 12893.27, t2 = 2915.27
pca_early_terminated = 9019
Iter 20 (CCD): obj = -104351.0770831528, δ = 0.005042151160751619, t1 = 195.93, t2 = 2855.19, t3 = 0.0
ccd_early_terminated = 34
PCA timings: t1_pca = 12893.26639, t2_pca = 2915.26801, t3_pca = 0.0
CCD timings: t1_ccd = 195.93278, t2_ccd = 2855.18508, t3_ccd = 0.0021
This is weird: t1_pca takes >100x longer than t1_ccd, even though PCA early-terminates much more often.
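For reference, t1 here measures rank-1 Cholesky up/downdates. Below is a minimal standalone sketch of that operation using only `LinearAlgebra` (this is not the Knockoffs.jl code path, just an illustration of why these updates are cheap relative to refactorizing):

```julia
using LinearAlgebra

# A rank-1 Cholesky update modifies an existing factorization of A in
# O(p^2) flops, instead of refactorizing A + v*v' from scratch in O(p^3).
p = 200
A = Symmetric(rand(p, p)) + p * I   # well-conditioned SPD matrix
C = cholesky(A)                     # A = L * L'
v = rand(p)

lowrankupdate!(C, copy(v))          # C now factors A + v*v' (v is used as workspace)

# the updated factor matches a fresh factorization of A + v*v' up to roundoff
err = maximum(abs, C.L * C.L' - (Matrix(A) + v * v'))
```

The solver does many such updates per sweep, so any per-update overhead (or updates that are silently skipped) dominates t1.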
It turns out t1_ccd was much faster than t1_pca because CCD updates were very often being skipped (the proposed update didn't improve the objective). This was caused by a small bug in the update of the diagonal entries in the CCD step. After fixing it, CCD updates take as long as PCA updates, but the objective improves much faster.
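The failure mode can be illustrated with a generic guard pattern (a hypothetical sketch, not the actual Knockoffs.jl code): a per-coordinate update is reverted whenever it fails to improve the objective, so a bug in the proposed step makes the sweep finish quickly while accomplishing nothing.

```julia
# Hypothetical sketch: coordinate ascent where each 1-D update is kept only if
# it improves the objective f. If `step` is buggy and proposes bad updates,
# nearly everything is reverted: the sweep is cheap but makes no progress.
function guarded_sweep!(s, f, step)
    skipped = 0
    for j in eachindex(s)
        before = f(s)
        old = s[j]
        s[j] += step(s, j)      # candidate 1-D update for coordinate j
        if f(s) <= before       # guard: keep only strict improvements
            s[j] = old          # revert -> this update is "skipped"
            skipped += 1
        end
    end
    return skipped
end

# Toy objective maximized at s .== 1; each step moves s[j] halfway toward 1,
# so every update improves f and nothing is skipped.
f(s) = -sum(abs2, s .- 1)
s = zeros(5)
nskipped = guarded_sweep!(s, f, (s, j) -> 0.5 * (1 - s[j]))
```

With a correct `step`, `nskipped` stays near zero; a sweep where almost every coordinate is skipped is a red flag that the proposed updates themselves are wrong.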
On Julia v1.10 and the soon-to-be-released Knockoffs.jl v1.1.8:
solve_s_graphical_group(
Symmetric(Σ), groups, group_reps, :maxent;
m = 5, verbose = true, inner_pca_iter=1,
inner_ccd_iter=1, outer_iter = 10
)
8999 representatives for 9132 variables, 9033 optimization variables
PCA optimization: 8999 variables
CCD optimization: 8999 diagonal variables
CCD optimization: 17 off-diagonal variables
Maxent initial obj = -130896.16543115964
Iter 1 (PCA): obj = -124299.86980415294, δ = 0.5286405884857434, t1 = 909.06, t2 = 264.1
pca_early_terminated = 9019
Iter 2 (CCD): obj = -118639.79956687437, δ = 0.495211713131239, t1 = 891.52, t2 = 260.2, t3 = 0.0
ccd_early_terminated = 9022
Iter 3 (PCA): obj = -114686.69186315064, δ = 0.45275292664408795, t1 = 1833.05, t2 = 546.13
pca_early_terminated = 9019
Iter 4 (CCD): obj = -112096.68739662363, δ = 0.32684850310884456, t1 = 1840.3, t2 = 546.12, t3 = 0.0
As reported by Jiaqi, there seems to be a significant performance regression in the group knockoff solver.
MWE (Knockoffs.jl v1.1.7, Julia 1.10):
For Knockoffs.jl v0.3.0 (Julia 1.6.7), see the Fast Cholesky update notebook.
The initial objective values differ because of different initializations of S. However, the timing seems much slower, even accounting for the runs being on different computers.