Closed Mamba413 closed 2 years ago
In most of case, the result is correct. For example, when p is changed to 20, 30, 40. The result are match to the expectation. I also print the result in Cpp and R. I seems that the difference is very subtle. The results printed in Cpp:
15520.7 3857.04 1832.87 1653.66 1220 1017.25 983.913 1148.9 1164.97 1739.28 1833.77
17938.3 5877.06 2929.43 2560.34 2416.98 2153.38 2565.76 2296.01 1999.35 2517.26 2355.97
13312.6 4048.81 2121.22 1885.49 1777.77 2327.84 2094.01 2507.27 2191.26 2632.55 2946.96
18170.5 4432.05 1575.29 1138.43 1291.11 1334.51 1109.72 1522.59 1977.38 1982.05 2043.54
13684.7 2128.02 1450.65 2080.38 2996.25 2554.03 3316.02 3065.15 2749.93 2768.66 2832.1
The results printed in R:
15520.68 3857.04 1832.87 1653.66 1220.00 1017.25 983.91 1148.90 1164.97 1739.28 1833.77
17938.29 5877.06 2929.43 2560.34 2416.98 1991.74 2565.76 2296.01 1999.35 2517.26 2355.97
13312.65 4048.81 2121.22 1885.49 1777.77 2327.84 2094.01 2507.27 2191.26 2632.55 2946.96
18170.55 4432.05 1575.29 1138.43 1291.11 1334.51 1109.72 1522.59 1977.38 1982.05 2043.54
13684.72 2128.02 1450.65 2080.38 2996.25 2554.03 3316.02 3065.15 2749.93 2768.66 2832.10
At present, I have no idea about why this happens. @Jiang-Kangkang @oooo26 @bbayukari
I found it caused by the rounding error in cout
.
Actually, I got the same Cpp results by simply std::cout << {TEST LOSS};
, but with std::cout<<std::fixed<<std::setprecision(2)<< {TEST LOSS};
(it supported by #include<iomanip>
), I got:
15520.68 3857.04 1832.87 1653.66 1220.00 1017.25 983.91 1148.90 1164.97 1739.28 1833.77
17938.29 5877.06 2929.43 2560.34 2416.98 2153.38 2565.76 2296.01 1999.35 2517.26 2355.97
13312.65 4048.81 2121.22 1885.49 1777.77 2327.84 2094.01 2507.27 2191.26 2632.55 2946.96
18170.55 4432.05 1575.29 1138.43 1291.11 1334.51 1109.72 1522.59 1977.38 1982.05 2043.54
13684.72 2128.02 1450.65 2080.38 2996.25 2554.03 3316.02 3065.15 2749.93 2768.66 2832.10
That is the same as R output.
I find a very subtle error. In the second row and the sixth column, the output of cout
is 2153.38
while R
's output is 1991.74
. The remaining values seem right.
How does it happens? The only possible reason (I believe) is that the coefficients used for computing test error are not the same as the beta passed to R
.
I find out that the two (abess and manual) workflows are actually unequal because of normalization, which is enabled by default:
Since "full data" and "data in fold k" have different means and variances, the normalized data will also be different. And then it is no surprise that their betas are inequality, which causes inequality in test loss.
In most instances, it would not lead to different active sets selected by abess, so the betas (and test losses) are close. But unfortunately, it does in sparsity 5 with data in fold 2 above. That is why the "error" happens.
Btw, in other situations, the test losses are actually unequal. You would find the very small difference in more (e.g. 10) decimal places.
With this code, the output would be the same as Cpp's:
library(abess)
n <- 100
p <- 200
support.size <- 3
dataset <- generate.data(n, p, support.size, seed = 1)
ss <- 0:5
nfolds <- 5
foldid <- rep(1:nfolds, ceiling(n / nfolds))[1:n]
abess_fit <- abess(dataset[["x"]], dataset[["y"]],
tune.type = "cv", nfolds = nfolds,
foldid = foldid, support.size = ss, num.threads = 1
)
x <- dataset[["x"]]
x <- apply(x, 2, function(c){c1 <- c - mean(c); c1 / norm(c1, type="2") * sqrt(n)})
y <- dataset[["y"]]
y <- y - mean(y)
cv <- rep(0, length(ss))
for (k in 1:nfolds) {
abess_fit_k <- abess(x[foldid != k, ],
y[foldid != k],
support.size = ss,
normalize=0
)
y_hat_k <- predict(abess_fit_k, x[foldid == k, ],
support.size = ss
)
fold_cv <- apply(y_hat_k, 2, function(yh) {
mean((y[foldid == k] - yh)^2)
})
fold_cv <- round(fold_cv, digits = 2)
print(fold_cv)
cv <- cv + fold_cv
}
cv <- cv / nfolds
names(cv) <- NULL
all.equal(cv, abess_fit$tune.value, digits = 2)
It simulates what the normalization does in abess.
Describe the bug The cross validation result is not the same as the result written in R.
The code to reproduce
Expected behavior The output of
all.equal(cv, abess_fit$tune.value, digits = 2)
is TRUE. However, the output is"Mean relative difference: 0.0008444762"
.System info