gdkrmr / dimRed

A Framework for Dimensionality Reduction in R
https://www.guido-kraemer.com/software/dimred/
GNU General Public License v3.0
73 stars 15 forks source link

NNLM got removed from CRAN #23

Closed gdkrmr closed 6 years ago

gdkrmr commented 6 years ago

It seems that NNLM is unmaintained and got removed from CRAN (@topepo ):

https://cran.r-project.org/web/packages/NNLM/index.html

Questions: 1) What to do about the immediate problem, I guess I will get an email from the CRAN maintainers soon. 2) General robustness: Is there a way to make these dependencies optional and still pass CRAN tests?

topepo commented 6 years ago

Archived on 2018-05-03 as check problems were not corrected despite reminders.

That could mean a lot of different things. I'll try to contact the package maintainer and see what's going on.

In the last year, package maintainers have been dealing with spontaneous emails from one CRAN maintainer that threaten to orphan packages for minor issues. caret had to be updated simply because the help file path for the gam package changed. It would have been orphaned if I did not make this single character change to my documentation. An RStudio package was orphaned this year after a terrible series of events over a seemingly minor thing. I just received an email this week about 2 packages related to gcc 8 warnings.

I'd convert over to another package but I couldn't fined another one that could project the loadings on to new data points (but that doesn't man that another doesn't exist if you want to look around).

gdkrmr commented 6 years ago

ok, I haven't looked at the NNLM source code or what the actual problem with that package is.

Interestingly travis only fails for release (https://travis-ci.org/gdkrmr/dimRed/builds/368514661).

gdkrmr commented 6 years ago

https://github.com/linxihui/NNLM/issues/1, there was a recent commit on March 6 that fixed some warning.

topepo commented 6 years ago

It doesn't look like much is happening with NNLM. Do you know of another NNMF package that can do the projection for new samples?

gdkrmr commented 6 years ago

This is disappointing, the NNLM packages seemed to be really well made. I did only very limited research on this but I will look into it.

All depending packages are in SUGGESTS, I guess CRAN tests will not pass if some of them are missing. Is there a way to make packages optional on CRAN?

The current hold up for a new release of dimRed is (apart from this issue) a manuscript that I sent to the R-Journal quite a while ago and now takes its sweet time in review, I wanted to wait for this before adding new features to the CRAN version.

gdkrmr commented 6 years ago

As NNMF is just

X = YH

wouldn't a simple forward projection be:

X_new H' (HH')^-1 = Y_new

I have to check if this is the same thing that NNLM did.

gdkrmr commented 6 years ago

NFM seems like a pretty good choice it is also actively being maintained.

topepo commented 6 years ago

Agreed.

Besides ndim, what other arguments should be exposed at the top level? Of the nmf parameters, I would suggest method, seed, and objective. We could also add a general options list that can be pass by executing via do.call or similar.

Do you want another PR?

gdkrmr commented 6 years ago

If you want this done right now, make a pull request, I could do it myself as soon as I find some time. I usually try to expose all parameters with sensible defaults, in the case of NMF there really are a lot of parameters so I guess there could be some simplification. Exposing all parameters also won't hurt as long as they are optional and there are sensible defaults. It is easy to add more parameters later if we wish to do so, so I would just start with the ones you proposed, they seem to be the important ones for the actual results.

topepo commented 6 years ago

I get a lot done of flights so maybe I'll give this a try on Sunday. It's gating some work on the next book but that's not a huge deal. This chapter might be of interest to you.

topepo commented 6 years ago

I took a first pass at using NMF. The unit tests are not updated yet.

I find that I get better results using this package. It reruns the computations repeatedly and gives a consensus of the loadings so it is more stable. I think that there is an active effort to get NNLM back on CRAN but this might be a better option overall.

I do have one technical issue that might be related to dimRed though. Calling NMF::nmf via namespace is throwing odd errors. I've tried a variety of ways to call it:

Browse[1]> nmf_result <- do.call(NMF::nmf, args)
Error in path.package(pkg) : none of the packages are loaded
Timing stopped at: 0.025 0.001 0.026
Browse[1]> eval(quote(NMF::nmf(x = data, rank = 2, method = "brunet", seed = 2556, nrun = 10)))
Error in path.package(pkg) : none of the packages are loaded
Timing stopped at: 0.031 0.001 0.031
Browse[1]> nmf_result <- do.call("NMF::nmf", args)
Error in `NMF::nmf`(x = data, rank = 2L, method = "brunet", nrun = 3,  : 
  could not find function "NMF::nmf"

The namespace for NMF is attached. I'm not well versed in S4 and wonder if there is some underlying issue there. Suggestions are welcome.

gdkrmr commented 6 years ago

do you have a branch somewhere so that I can look at the code? did you add NMF to suggests?

topepo commented 6 years ago

Yes, here

gdkrmr commented 6 years ago

I get in a fresh R session where I load dimRed with devtools I get:

>  dat <- loadDataSet("Iris")
> 
>  set.seed(4646)
>  factorization <- embed(dat, "NNMF")
Error in path.package(pkg) : none of the packages are loaded
Timing stopped at: 0.004 0 0.002

> library(NMF)
Loading required package: pkgmaker
Loading required package: registry

Attaching package: ‘pkgmaker’

The following object is masked from ‘package:base’:

    isNamespaceLoaded

Loading required package: rngtools
Loading required package: cluster
NMF - BioConductor layer [OK] | Shared memory capabilities [NO: bigmemory] | Cores 3/4
  To enable shared memory capabilities, try: install.extras('
NMF
')
> 
>  
>  factorization <- embed(dat, "NNMF")

after loading NMF it works fine.

That is super strange and the first time I encounter an error like this. I am pretty sure that this has nothing to do with S4 Classes. Maybe it is because it uses pkgmaker, in the case of the DRR package (which is also written by me and depends on CVST) I could not use Suggests because CVST does some ancient namespace stuff which I could not get compatible. In the worst case we will have to put NNMF in Depends instead of Suggests

topepo commented 6 years ago

I had the same effect when it was formally loaded. If you are okay with Depends, that works for me.

gdkrmr commented 6 years ago

why do you quote data? https://github.com/topepo/dimRed/blob/develop/R/nnmf.R#L82

gdkrmr commented 6 years ago

put it into Depends, if we find a solution later, we still can move it to Suggests later.

topepo commented 6 years ago

It's a habit; if you end up using do.call, using the data itself as the argument value can embed the entire set of data values into any match.call object that gets saved. I'm getting in the habit of creating call expressions and then using eval on them to call the function. It's a tidyverse-related illness :-)

gdkrmr commented 6 years ago

From a fresh R session the following works:

> do.call(NMF::nmf, list(x = t(iris[1:4]), rank = 2))
<Object of class: NMFfit>
 # Model:
  <Object of class:NMFstd>
  features: 4 
  basis/rank: 2 
  samples: 150 
 # Details:
  algorithm:  brunet 
  seed:  random 
  RNG: 403L, 1L, ..., -2141195958L [ab081a44b49b315434185d3c43543769]
  distance metric:  'KL' 
  residuals:  3.085502 
  Iterations: 450 
  Timing:
     user  system elapsed 
    0.212   0.004   0.215 
gdkrmr commented 6 years ago

Doing some research, the error happens here: https://github.com/renozao/NMF/blob/e46bec199191766b863e221db79288cf4eb61028/R/parallel.R#L1054 I haven't figured out how to set verbose = TRUE in that function.

topepo commented 6 years ago

So I've finished the changes with the exception of depends/suggests. My tests don't pass locally since, despite heroic efforts, I can't get pcaL1 to install on OS X. The NMF tests pass fine.

Also, I'm starting to doubt that the inverse exists for this method. I had something implemented but removed it since it didn't work.

gdkrmr commented 6 years ago

Great, I will take a look at it as soon as I can!

Maybe there should be a way to pass tests with a Warning or Note if not all packages are installed. since X = YH the inverse should simply be Y_new H

topepo commented 6 years ago

Something like this is what I had but it doesn't convert back to the original values (but values that are fairly correlated with them). That would be approximation error.

> library(dimRed)
> 
> dat <- loadDataSet("Iris")
> 
> # Do a complete factorization
> emb <- embed(dat, "NNMF", seed = 13, ndim = 4)
> 
> # 4 x 4
> coefs <- emb@other.data$proj
> 
> nn_dat <- predict(emb, iris[1:7, 1:4])@data
> # 7 x 4
> 
> nn_dat %*% coefs
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     1516.504    928.5767     642.4568    140.0946
2     1420.053    860.8795     615.2740    136.6124
3     1395.576    853.7324     593.3802    129.9467
4     1379.314    838.2809     607.1650    138.4540
5     1508.000    926.2275     637.4357    139.2300
6     1646.097   1006.4699     720.0885    164.6022
7     1407.007    863.0821     607.5957    136.9446
> iris[1:7, 1:4]
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3          4.7         3.2          1.3         0.2
4          4.6         3.1          1.5         0.2
5          5.0         3.6          1.4         0.2
6          5.4         3.9          1.7         0.4
7          4.6         3.4          1.4         0.3
gdkrmr commented 6 years ago

fyi, I opened an issue in the NMF repository: renozao/NMF/issues/114

gdkrmr commented 6 years ago

The following works for me:

library(NMF)
iris_nmf <- nmf(t(iris[1:4]), rank = 4)
W <- iris_nmf@fit@W # == basis(iris_nmf)
H <- iris_nmf@fit@H # == coef(iris_nmf)
# The backward calculation is WH
mean(t(iris[1:4]) - (W %*% H)) # should be close to 0
# The forward calculation is solving this X = WH for H
mean(H - (solve(t(W) %*% (W), t(as.matrix(iris[1:4]) %*% W)))) # should be close to zero
gdkrmr commented 6 years ago

Looking at the documentation of ?nmf I see one possible issue that nmf is not type stable and that the type depends on nruns and method. Looking through the possible outcome types the number of methods has to be restricted to 1.

topepo commented 6 years ago

I've used it on several data sets and find it to be numerically superior to NNLM and more reproducible (enough to get over the general weirdness of the package api). I did a quick test and found, for my data sets, 10 runs was enough to get reasonable repeatability.

gdkrmr commented 6 years ago

I did a pull request to your develop branch fixing some stuff, topepo/dimRed/pull/1

gdkrmr commented 6 years ago

done

gdkrmr commented 6 years ago

@topepo thanks for initiating this, NNMF is a very useful addition to dimRed!