Status: Open. guilhemchalancon opened this issue 10 years ago.
Hi Guillaume
Thanks a lot for your emails and bug reports,
I'll forward the information to Giorgio Russolillo, and hopefully we'll add the necessary modifications to the plspm package soon.
All the best,
Gaston
On Mon, Mar 17, 2014 at 11:25 AM, guilhemchalancon <notifications@github.com> wrote:
Dear Gaston,
I ran into a bug when using NAs in non-reflective blocks (in commit 30aaec0, https://github.com/gastonstat/plspm/commit/30aaec0a44b5c8be7bc24ef48dcf8283ce6b80f2, as well as in the 0.4.1 stable version).
For instance, after introducing NAs in data(russa) as you show here https://github.com/gastonstat/plspm, plspm() works fine with modes A or NewA, but fails with B, PLSCORE or PLSCOW.
To verify this, I loaded the toy dataset and introduced 3 NAs:
data(russa)
russNA = russa
russNA[1,1] = NA
russNA[4,4] = NA
russNA[6,6] = NA

rus_path = rbind(c(0, 0, 0), c(0, 0, 0), c(1, 1, 0))
rownames(rus_path) = c("AGRI", "IND", "POLINS")
colnames(rus_path) = c("AGRI", "IND", "POLINS")
rus_blocks = list(1:3, 4:5, 6:9)
rus_scaling = list(c("NUM", "NUM", "NUM"),
                   c("NUM", "NUM"),
                   c("NUM", "NUM", "NUM", "NUM"))
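The NA positions above fall in column 1 (block AGRI), column 4 (block IND) and column 6 (block POLINS), so every block has missing data. The sketch below checks this per block; it uses a random 10 x 9 stand-in matrix instead of russa (an assumption, so it runs without plspm installed) with NAs at the same positions as russNA:

```r
# Stand-in for the first 9 columns of russa (assumption: random values, 10 rows)
set.seed(1)
stand_in = matrix(rnorm(10 * 9), nrow = 10, ncol = 9)

# Same NA positions as in russNA
stand_in[1, 1] = NA
stand_in[4, 4] = NA
stand_in[6, 6] = NA

rus_blocks = list(1:3, 4:5, 6:9)

# TRUE for every block that contains at least one missing value
block_has_na = sapply(rus_blocks, function(cols) anyNA(stand_in[, cols]))
print(block_has_na)
# [1] TRUE TRUE TRUE
```

Since all three blocks contain a missing value, with modes B, PLSCORE or PLSCOW each of them goes through the missing-data branch described in this report.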
Then running plspm() with non-reflective modes:
plspm(russNA, rus_path, rus_blocks, scaling = rus_scaling, modes = rep("PLSCOW",3), scheme = "centroid", plscomp = c(1,1,1), tol = 0.0000001)
OR
plspm(russNA, rus_path, rus_blocks, scaling = rus_scaling, modes = rep("PLSCORE",3), scheme = "centroid", plscomp = c(1,1,1), tol = 0.0000001)
OR
plspm(russNA, rus_path, rus_blocks, scaling = rus_scaling, modes = rep("B",3), scheme = "centroid", plscomp = c(1,1,1), tol = 0.0000001)
...which in all cases leads to an error about non-conformable elements in get_weights_nonmetric().
After digging into the issue, I found that the cause lies in the format of the output of get_PLSR_NA().
Specifically, if we look at what happens for non-reflective modes when NAs are detected, we can see that w[[q]] is obtained from get_PLSR_NA():
if (specs$modes[q] == "PLSCORE") {
  if (missing_data[q]) {
    w[[q]] = get_PLSR_NA(Y = Z[,q], X = QQ[[q]], ncomp = PLScomp[q])$B
    # compute Y[i,q] as the regr. coeff. of QQ[[q]][i,] on w[[q]]
    # considering only the columns where QQ[[q]][i,l] exist
    Y[,q] = colSums(t(QQ[[q]]) * w[[q]], na.rm = TRUE)
    Y[,q] = Y[,q] / colSums((t(X_avail[[q]]) * w[[q]])^2)
    # normalize Y[,q] to unitary variance
    Y[,q] = scale(Y[,q]) * correction
  } else { # complete data in block q
    w[[q]] = get_PLSR(Y = Z[,q], X = QQ[[q]], ncomp = PLScomp[q])$B
    Y[,q] = QQ[[q]] %*% w[[q]]
    Y[,q] = scale(Y[,q]) * correction
  }
}
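The two colSums() lines in the missing-data branch compute, for each observation i, the no-intercept least-squares coefficient of the observed entries of QQ[[q]][i,] on the matching entries of w[[q]], i.e. the sum of x_il * w_l over observed l divided by the sum of w_l^2 over observed l. A minimal sketch on a toy block, assuming X_avail[[q]] is a 0/1 indicator of observed entries (an assumption; its exact definition is not shown in this report):

```r
# Toy block: 4 observations (rows) x 3 indicators (columns), one NA in row 3
QQ_q = matrix(c(1, 2, NA, 4,
                2, 1, 3, 0,
                0, 1, 1, 2), nrow = 4, ncol = 3)
w_q = c(0.5, 1, -1)  # weights as a plain numeric vector

# Assumption: X_avail is 1 where QQ_q is observed and 0 where it is NA
X_avail_q = ifelse(is.na(QQ_q), 0, 1)

# Numerator: sum over observed l of x_il * w_l (NAs dropped by na.rm)
num = colSums(t(QQ_q) * w_q, na.rm = TRUE)
# Denominator: sum over observed l of w_l^2
den = colSums((t(X_avail_q) * w_q)^2)
score = num / den

# Cross-check row by row with a no-intercept regression on the observed entries
check = sapply(seq_len(nrow(QQ_q)), function(i) {
  obs = !is.na(QQ_q[i, ])
  coef(lm(QQ_q[i, obs] ~ 0 + w_q[obs]))[[1]]
})
print(all.equal(score, check))
```

For row 3 only columns 2 and 3 are observed, so its score is (3 * 1 + 1 * -1) / (1 + 1) = 1. The recycling of w_q down each column of t(QQ_q) is what requires w_q to be a plain vector.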
It turns out that get_PLSR_NA() returns a 1-column matrix, whereas the element-wise product t(QQ[[q]]) * w[[q]] needs a numeric vector. As a result, the function fails to assign values to Y[,q].
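The failure can be reproduced without plspm. The sketch below uses toy stand-ins (assumptions, not real plspm objects): element-wise * between a 3 x 4 matrix and a 3-element vector recycles the vector down each column, but between a 3 x 4 matrix and a 3 x 1 matrix it requires identical dimensions and raises an error. Matrix multiplication with %*%, as used in the complete-data branch, accepts the 3 x 1 matrix.

```r
# Toy stand-ins (assumption: 4 observations, 3 indicators)
QQ_q = matrix(1:12, nrow = 4, ncol = 3)

w_vec = c(0.2, 0.3, 0.5)         # numeric vector
w_mat = matrix(w_vec, ncol = 1)  # 3 x 1 matrix, like the reported get_PLSR_NA() output

# Vector: recycled down each column of t(QQ_q) (3 x 4), so this works
ok = colSums(t(QQ_q) * w_vec)

# 3 x 1 matrix: element-wise '*' needs matching dimensions, so this errors
msg = tryCatch(colSums(t(QQ_q) * w_mat),
               error = function(e) conditionMessage(e))
print(msg)

# %*% is fine with the 3 x 1 matrix (4 x 3 times 3 x 1)
prod_ok = QQ_q %*% w_mat
```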
Note that the only cases in get_weights_nonmetric() where w[[q]] is not a numeric vector are precisely those where missing_data[q] == TRUE and specs$modes[q] is PLSCORE, PLSCOW or B; in other words, whenever get_PLSR_NA() is called.
I found that converting w[[q]] to the right format (i.e. w[[q]] = t( get_PLSR_NA(Y=...) )[1,]) was sufficient in my context, although I didn't check whether get_PLSR_NA() is used in other contexts where a 1-column matrix might be expected.
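Assuming the workaround is applied to the one-column $B matrix, transposing and taking the first row is one of several equivalent ways to turn a p x 1 matrix into a plain vector; drop() and B[, 1] give the same result and may read more clearly. A small check with a hypothetical coefficient matrix:

```r
# Hypothetical 3 x 1 coefficient matrix standing in for get_PLSR_NA(...)$B
B = matrix(c(0.2, 0.3, 0.5), ncol = 1,
           dimnames = list(c("x1", "x2", "x3"), NULL))

w_t    = t(B)[1, ]  # the workaround from this report
w_drop = drop(B)    # drops the extent-1 dimension
w_col  = B[, 1]     # takes the single column

print(is.matrix(w_t))           # FALSE: a named numeric vector
print(identical(w_t, w_drop))   # TRUE
print(identical(w_t, w_col))    # TRUE
```

All three keep the row names of B as vector names, so downstream code that indexes w[[q]] by name would behave the same.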
However, it might be better to change the output format of get_PLSR_NA() instead.
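Changing the format at the source could amount to normalizing $B once, just before get_PLSR_NA() returns, so its callers get the same numeric-vector type the complete-data branch already works with. The internals of get_PLSR_NA() are not shown in this report, so the sketch uses hypothetical names (as_weight_vector, fake_plsr_na) and assumes $B is a single-column matrix:

```r
# Hypothetical helper (not part of plspm): coerce a one-column coefficient
# matrix to a plain numeric vector; vectors are returned unchanged
as_weight_vector = function(B) {
  if (is.matrix(B)) {
    stopifnot(ncol(B) == 1)  # only the single-column case is handled here
    B = B[, 1]
  }
  B
}

# Hypothetical stand-in for a function returning list(B = <p x 1 matrix>)
fake_plsr_na = function() list(B = matrix(c(0.2, 0.3, 0.5), ncol = 1))

out = fake_plsr_na()
out$B = as_weight_vector(out$B)  # normalized once, at the point of return
str(out$B)
```

Whether any other caller relies on the matrix shape is the open question raised above, so this would need checking across the package.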
Best regards, G
Reply to this email directly or view it on GitHub: https://github.com/gastonstat/plspm/issues/2
Gaston Sanchez, PhD
gastonsanchez.com http://www.gastonsanchez.com