Handling of NAs and get_weights_nonmetric() for non-reflective blocks (gastonstat/plspm #2)

Dear Gaston,

I ran into a bug when using NAs with non-reflective blocks (in commit 30aaec0a44b5c8be7bc24ef48dcf8283ce6b80f2 as well as in the 0.4.1 stable version).

For instance, after introducing NAs into data(russa) as you showed at https://github.com/gastonstat/plspm, plspm() works fine with modes A and NewA but fails with B, PLSCORE, or PLSCOW.

To verify this, I loaded the toy dataset and introduced 3 NAs:

library(plspm)
data(russa)
russNA = russa
russNA[1,1] = NA
russNA[4,4] = NA
russNA[6,6] = NA

rus_path = rbind(c(0, 0, 0), c(0, 0, 0), c(1, 1, 0))
rownames(rus_path) = c("AGRI", "IND", "POLINS")
colnames(rus_path) = c("AGRI", "IND", "POLINS")
rus_blocks = list(1:3, 4:5, 6:9)
rus_scaling = list(c("NUM", "NUM", "NUM"),
                   c("NUM", "NUM"),
                   c("NUM", "NUM", "NUM", "NUM"))
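The three NAs land in columns 1, 4 and 6, i.e. exactly one in each block, so every block takes the missing-data branch of get_weights_nonmetric(). A quick way to check this is sketched below with a hypothetical helper, block_has_na() (not part of plspm), applied to a toy matrix that has NAs at the same positions as russNA.

```r
# Hypothetical helper (not part of plspm): for each block, report whether any
# of its columns contains an NA, mirroring the role of missing_data[q].
block_has_na <- function(data, blocks) {
  vapply(blocks, function(cols) anyNA(data[, cols]), logical(1))
}

# Toy stand-in for russNA: 10 rows x 9 columns, NAs at the same positions
toy <- matrix(seq_len(90), nrow = 10, ncol = 9)
toy[1, 1] <- NA
toy[4, 4] <- NA
toy[6, 6] <- NA

blocks <- list(1:3, 4:5, 6:9)
print(block_has_na(toy, blocks))
```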

Then I ran plspm() with non-reflective modes:

plspm(russNA, rus_path, rus_blocks, scaling = rus_scaling, modes = rep("PLSCOW",3), scheme = "centroid", plscomp = c(1,1,1), tol = 0.0000001)
# OR
plspm(russNA, rus_path, rus_blocks, scaling = rus_scaling, modes = rep("PLSCORE",3), scheme = "centroid", plscomp = c(1,1,1), tol = 0.0000001)
# OR
plspm(russNA, rus_path, rus_blocks, scaling = rus_scaling, modes = rep("B",3), scheme = "centroid", plscomp = c(1,1,1), tol = 0.0000001)

...which in all three cases leads to an error about non-conformable elements in get_weights_nonmetric().

After digging into the issue, I found that the cause lies in the format of the output of get_PLSR_NA().

Specifically, if we look at what happens for non-reflective modes when NAs are detected, we can see that w[[q]] is obtained from get_PLSR_NA():

  if (specs$modes[q] == "PLSCORE") {
    if (missing_data[q]) {
      w[[q]] = get_PLSR_NA(Y = Z[,q], X = QQ[[q]], ncomp = PLScomp[q])$B
      # compute Y[i,q] as the regr. coeff. of QQ[[q]][i,] on w[[q]]
      # considering only the columns where QQ[[q]][i,l] exist
      Y[,q] = colSums(t(QQ[[q]])*w[[q]], na.rm=TRUE)
      Y[,q] = Y[,q]/colSums((t(X_avail[[q]])*w[[q]])^2)
      # normalize Y[,q] to unitary variance
      Y[,q] = scale(Y[,q]) * correction
    }
    else {# complete data in block q
      w[[q]] = get_PLSR(Y = Z[,q], X = QQ[[q]], ncomp = PLScomp[q])$B
      Y[,q] = QQ[[q]] %*% w[[q]]
      Y[,q] = scale(Y[,q]) * correction
    }
  }
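For reference, this is how I read the missing-data branch above, assuming X_avail[[q]] is a 0/1 matrix marking which entries of QQ[[q]] are observed (the notation below is mine, not the package's). With x_il the value of indicator l for observation i and A_i the set of indicators observed for that observation, each score is the through-the-origin regression coefficient of the observed part of row i on w, then standardized:

```latex
\tilde{y}_{i} = \frac{\sum_{l \in A_i} x_{il}\, w_l}{\sum_{l \in A_i} w_l^{2}},
\qquad
A_i = \{\, l : x_{il} \text{ observed} \,\},
\qquad
y_{i} = \text{correction} \cdot \frac{\tilde{y}_{i} - \bar{\tilde{y}}}{s_{\tilde{y}}}
```

Both sums run over l, one term per indicator, which is why w[[q]] has to line up with the rows of t(QQ[[q]]) as a plain vector.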

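It turns out that get_PLSR_NA() returns a one-column matrix, whereas a numeric vector is needed for the product t(QQ[[q]])*w[[q]] to work. Here t(QQ[[q]]) has one row per indicator and one column per observation; R's element-wise product recycles a plain vector of weights down each of those columns, but combining a p x n matrix with a p x 1 matrix raises a "non-conformable arrays" error. As a result, the function fails to assign values to Y[,q].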
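The failure can be reproduced without plspm. The base-R sketch below uses a made-up 4 x 3 block with one NA; QQ, X_avail and the weights are toy stand-ins for the package objects, and X_avail is assumed to be a 0/1 availability indicator. The same two lines from the missing-data branch fail with a one-column matrix of weights and succeed with a vector.

```r
# Toy block (made-up numbers): 4 observations x 3 indicators, one NA
QQ <- matrix(c(1, 2, NA, 4,
               2, 1, 3, 5,
               0, 1, 2, 1), nrow = 4)
# 0/1 indicator of observed entries (assumed meaning of X_avail[[q]])
X_avail <- (!is.na(QQ)) * 1

w_mat <- matrix(c(0.5, 0.3, 0.2), ncol = 1)  # one-column matrix, like get_PLSR_NA()$B
w_vec <- w_mat[, 1]                          # same weights as a plain numeric vector

# Matrix weights: t(QQ) is 3 x 4 but w_mat is 3 x 1, so `*` refuses to combine them
err <- tryCatch(colSums(t(QQ) * w_mat, na.rm = TRUE),
                error = function(e) conditionMessage(e))
print(err)

# Vector weights: w_vec is recycled down each column of t(QQ), i.e. per observation
y_tilde <- colSums(t(QQ) * w_vec, na.rm = TRUE) /
  colSums((t(X_avail) * w_vec)^2)
print(y_tilde)
```

For observation 3, whose first indicator is missing, only indicators 2 and 3 enter both sums: (3 * 0.3 + 2 * 0.2) / (0.3^2 + 0.2^2) = 1.3 / 0.13.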

Note that the only cases in get_weights_nonmetric() where w[[q]] is not a numeric vector are precisely those where missing_data[q] is TRUE and specs$modes[q] is PLSCORE, PLSCOW, or B; in other words, whenever get_PLSR_NA() is called.

I found that converting w[[q]] to the right format (i.e. w[[q]] = t(get_PLSR_NA(Y=...))[1,]) was sufficient in my context, although I did not check whether get_PLSR_NA() is used elsewhere in contexts where a one-column matrix might be expected.
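The transpose-then-take-row-1 conversion is one of several equivalent base-R ways to turn a one-column matrix into a vector. The sketch below compares them on a toy weight matrix with row names (whether get_PLSR_NA()'s $B carries row names is an assumption here). All four give the same values; the only difference is that as.vector() drops the names while the others keep them.

```r
# Toy one-column weight matrix with indicator names as row names
w <- matrix(c(0.5, 0.3, 0.2), ncol = 1,
            dimnames = list(c("x1", "x2", "x3"), NULL))

v_t    <- t(w)[1, ]      # transpose to 1 x 3, then take the single row
v_col  <- w[, 1]         # take the single column directly
v_drop <- drop(w)        # drop the length-1 dimension
v_vec  <- as.vector(w)   # strip all attributes, including names

print(v_t)
print(v_vec)
```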

However, it might be better to change the output format of get_PLSR_NA() instead.
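One way to change the output format at the source would be to coerce the coefficients just before get_PLSR_NA() returns them. I have not looked at how the function builds its result, so the sketch below only shows a hypothetical helper, as_weight_vector() (not part of plspm), that turns a one-column matrix into a plain vector and leaves anything else untouched.

```r
# Hypothetical helper (not part of plspm): coerce a one-column matrix of
# weights to a plain numeric vector; vectors and wider matrices pass through.
as_weight_vector <- function(w) {
  if (is.matrix(w) && ncol(w) == 1L) drop(w) else w
}

print(as_weight_vector(matrix(c(0.5, 0.3, 0.2), ncol = 1)))  # becomes a vector
print(as_weight_vector(c(0.5, 0.3, 0.2)))                     # already a vector
```

Using drop() rather than as.vector() keeps any indicator names on the weights.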

Best regards, G