anujkhare / iregnet

7 stars 12 forks source link

Test iregnet converg #77

Open andruuhurst opened 4 years ago

andruuhurst commented 4 years ago

Adding a test case where cv.iregnet/iregnet function fails to converge

example when cv.iregnet or iregnet called on H3K36me3_TDH_immune.Rdata

fit <- with( H3K36me3_TDH_immune , iregnet::iregnet( inputs , outputs))

returned error:

In iregner(x , y, family = family , unreg_sol = FALSE, ...) :
Failed to converge. Try again after adding more data

same error with added data such as:

fit <- cv.iregnet( X , y , family = "gaussian" , scale_init = NA , estimate_scale= TRUE, debug=1)
tdhock commented 4 years ago

maybe fix?

c:/Rtools/mingw_64/bin/g++  -I"C:/PROGRA~1/R/R-36~1.1/include" -DNDEBUG  -I"C:/Users/th798/R/win-library/3.6/Rcpp/include"        -O2 -Wall  -mtune=generic -c distributions.cpp -o distributions.o
distributions.cpp: In function 'double compute_grad_response(double*, double*, double*, const double*, const double*, const double*, double, const IREG_CENSORING*, long long unsigned int, IREG_DIST, double*, bool)':
distributions.cpp:69:9: warning: enumeration value 'IREG_DIST_LOG_GAUSSIAN' not handled in switch [-Wswitch]
   switch(dist) {
         ^
distributions.cpp:69:9: warning: enumeration value 'IREG_DIST_LOG_LOGISTIC' not handled in switch [-Wswitch]
distributions.cpp:69:9: warning: enumeration value 'IREG_DIST_EXPONENTIAL' not handled in switch [-Wswitch]
distributions.cpp:69:9: warning: enumeration value 'IREG_DIST_WEIBULL' not handled in switch [-Wswitch]
distributions.cpp:69:9: warning: enumeration value 'IREG_DIST_UNKNOWN' not handled in switch [-Wswitch]
distributions.cpp: In function 'Rcpp::List iregnet_compute_gradients(Rcpp::NumericMatrix, Rcpp::NumericMatrix, Rcpp::NumericVector, double, Rcpp::String)':
distributions.cpp:426:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   for (ull j = 0; j < n_vars; ++j) {
                     ^
distributions.cpp:428:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for (ull i = 0; i < n_obs; ++i) {
                       ^
distributions.cpp: In function 'double compute_grad_response(double*, double*, double*, const double*, const double*, const double*, double, const IREG_CENSORING*, long long unsigned int, IREG_DIST, double*, bool)':
distributions.cpp:229:25: warning: 'ddsig' may be used uninitialized in this function [-Wmaybe-uninitialized]
       ddsig_sum += ddsig;
                         ^
distributions.cpp:228:23: warning: 'dsig' may be used uninitialized in this function [-Wmaybe-uninitialized]
       dsig_sum += dsig;
                       ^
distributions.cpp:219:26: warning: 'ddg' may be used uninitialized in this function [-Wmaybe-uninitialized]
     if (dsig == 0 || ddg == 0)
                          ^
distributions.cpp:222:30: warning: 'dg' may be used uninitialized in this function [-Wmaybe-uninitialized]
       response = eta[i] - dg / ddg;
                              ^
c:/Rtools/mingw_64/bin/g++  -I"C:/PROGRA~1/R/R-36~1.1/include" -DNDEBUG  -I"C:/Users/th798/R/win-library/3.6/Rcpp/include"        -O2 -Wall  -mtune=generic -c iregnet_fit.cpp -o iregnet_fit.o
iregnet_fit.cpp: In function 'Rcpp::List fit_cpp(Rcpp::NumericMatrix, Rcpp::NumericMatrix, Rcpp::String, Rcpp::NumericVector, int, Rcpp::IntegerVector, bool, double, double, bool, bool, bool, double, double, int, double)':
iregnet_fit.cpp:150:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for(ull i = 0; i < num_lambda; ++i) {
                      ^
iregnet_fit.cpp:351:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   for (ull m = 0; m <= end_ind; ++m) {
                     ^
iregnet_fit.cpp:225:10: warning: variable 'lambda_max_unscaled' set but not used [-Wunused-but-set-variable]
   double lambda_max_unscaled; // Check use
          ^
iregnet_fit.cpp: In function 'double get_y_means(Rcpp::NumericMatrix&, IREG_CENSORING*, double*)':
iregnet_fit.cpp:394:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   for (ull i = 0; i < y.nrow(); ++i) {
                     ^
iregnet_fit.cpp: In function 'void get_censoring_types(Rcpp::NumericMatrix&, IREG_CENSORING*)':
iregnet_fit.cpp:419:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   for (ull i = 0; i < y.nrow(); ++i) {
                     ^
iregnet_fit.cpp: In function 'void standardize_x(Rcpp::NumericMatrix&, double*, double*, bool)':
iregnet_fit.cpp:479:34: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   for (ull i = int(intercept); i < X.ncol(); ++i) {  // don't standardize intercept col.
                                  ^
iregnet_fit.cpp:481:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for (ull j = 0; j < X.nrow(); ++j) {
                       ^
iregnet_fit.cpp:486:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for (ull j = 0; j < X.nrow(); ++j) {
                       ^
iregnet_fit.cpp:494:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for (ull j = 0; j < X.nrow(); ++j) {
                       ^
iregnet_fit.cpp: In function 'void standardize_y(Rcpp::NumericMatrix&, double*, double&)':
iregnet_fit.cpp:503:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   for (ull i = 0; i < y.nrow() * y.ncol(); ++i) {
                     ^
iregnet_fit.cpp:506:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   for (ull i = 0; i < y.nrow(); ++i)
                     ^
iregnet_fit.cpp: In function 'double get_init_var(double*, IREG_CENSORING*, long long unsigned int, IREG_DIST)':
iregnet_fit.cpp:515:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   for (int i = 0; i < n; ++i) {
                     ^
iregnet_fit.cpp:520:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   for (int i = 0; i < n; ++i) {
                     ^
iregnet_fit.cpp:525:10: warning: enumeration value 'IREG_DIST_GAUSSIAN' not handled in switch [-Wswitch]
   switch (dist) {
          ^
iregnet_fit.cpp:525:10: warning: enumeration value 'IREG_DIST_LOG_GAUSSIAN' not handled in switch [-Wswitch]
iregnet_fit.cpp:525:10: warning: enumeration value 'IREG_DIST_LOG_LOGISTIC' not handled in switch [-Wswitch]
iregnet_fit.cpp:525:10: warning: enumeration value 'IREG_DIST_EXPONENTIAL' not handled in switch [-Wswitch]
iregnet_fit.cpp:525:10: warning: enumeration value 'IREG_DIST_WEIBULL' not handled in switch [-Wswitch]
iregnet_fit.cpp:525:10: warning: enumeration value 'IREG_DIST_UNKNOWN' not handled in switch [-Wswitch]
tdhock commented 4 years ago

test_convergence.R?

tdhock commented 4 years ago

figure out why debug=1 not working? it shoudl print messages to the screen.

tdhock commented 4 years ago

to get the build working please remove SpatialExtremes, ElemStatLearn from DESCRIPTION because they are no longer on CRAN.

andruuhurst commented 4 years ago

@tdhock Fixed build error with cpp files

After fixing those warnings I tried to run on the H3K36me3_TDH_immune data set with this result:

>R --vanilla -e "data(H3K36me3_TDH_immune, package='iregnet');
with(H3K36me3_TDH_immune , iregnet::iregnet(inputs , outputs , debug = 1))"

….

Warning messages:
1: replacing previous import ‘Rcpp::prompt’ by ‘utils::prompt’ when loading ‘iregnet’
2: replacing previous import ‘Rcpp::.DollarNames’ by ‘utils::.DollarNames’ when loading ‘iregnet’
3: In evalq((function (..., call. = TRUE, immediate. = FALSE, noBreaks. = FALSE,  :
  get_init_var - Unsupported distribution provided: Gaussian
4: In iregnet::iregnet(inputs, outputs, debug = 1) :
  Failed to converge. Try again after adding more data.

Warning message I placed in IREG_DIST_GAUSSIAN switch case in get_init_var() function in the iregnet_fit.cpp: get_init_var - Unsupported distribution provided: Gaussian This leads me to believe that Gaussian not being supported contributes to the non-convergence. However, this doesn't explain why it does not converge for logistic distribution.

tdhock commented 4 years ago

I've never seen that warning before (get_init_var - Unsupported distribution provided: Gaussian). The gaussian distribution should be supported, so that is unusual. are you sure it is not lower-case "g" in gaussian?

tdhock commented 4 years ago

please rename data/*.Rdata (lower-case d, un-recognized by data function) to data/*.RData (upper-case D, recognized by data function)

tdhock commented 4 years ago

I see this error which suggests a typo "Guassian" -- please read the code and see what values of distribution are acceptable (case-sensitive)

tdhock@maude-MacBookPro:~/R/iregnet$ R --vanilla -e "data(H3K36me3_TDH_immune, package='iregnet');with(H3K36me3_TDH_immune , iregnet::iregnet(inputs , outputs , debug = 1))"

R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
...
1: replacing previous import ‘Rcpp::prompt’ by ‘utils::prompt’ when loading ‘iregnet’ 
2: replacing previous import ‘Rcpp::.DollarNames’ by ‘utils::.DollarNames’ when loading ‘iregnet’ 
3: In evalq((function (..., call. = TRUE, immediate. = FALSE, noBreaks. = FALSE,  :
  get_init_var - Unsupported distribution provided: Guassian
4: In iregnet::iregnet(inputs, outputs, debug = 1) :
  Failed to converge. Try again after adding more data.
> 
> 
tdhock@maude-MacBookPro:~/R/iregnet$ 
andruuhurst commented 4 years ago

@tdhock

please rename data/*.Rdata (lower-case d, un-recognized by data function) to data/*.RData (upper-case D, recognized by data function)

The format of the 'data/*.RData' is fixed within the PR code committed

andruuhurst commented 4 years ago

I see this error which suggests a typo "Guassian" -- please read the code and see what values of distribution are acceptable (case-sensitive)

tdhock@maude-MacBookPro:~/R/iregnet$ R --vanilla -e "data(H3K36me3_TDH_immune, package='iregnet');with(H3K36me3_TDH_immune , iregnet::iregnet(inputs , outputs , debug = 1))"

R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
...
1: replacing previous import ‘Rcpp::prompt’ by ‘utils::prompt’ when loading ‘iregnet’ 
2: replacing previous import ‘Rcpp::.DollarNames’ by ‘utils::.DollarNames’ when loading ‘iregnet’ 
3: In evalq((function (..., call. = TRUE, immediate. = FALSE, noBreaks. = FALSE,  :
  get_init_var - Unsupported distribution provided: Guassian
4: In iregnet::iregnet(inputs, outputs, debug = 1) :
  Failed to converge. Try again after adding more data.
> 
> 
tdhock@maude-MacBookPro:~/R/iregnet$ 

I fixed the unsupported switch statement build errors by filling the switch statements that weren't supported with the custom warnings. i.e. what you saw from your console. I misspelled gaussian, however, this is where the warning came from:

Within the iregnet_fit.cpp:

static double
get_init_var (double *ym, IREG_CENSORING *status, ull n, IREG_DIST dist)
{
.
.
.
  switch (dist) {
    case IREG_DIST_EXTREME_VALUE :
      mean = mean + 0.572;
      var = var / 1.64;
    break;

    case IREG_DIST_LOGISTIC:
      var = var / 3.2;
    break;

    // New code : fixes unsupported switch case errors
    case IREG_DIST_GAUSSIAN:{
      Rcpp::Function warning("warning");
      warning("get_init_var - Unsupported distribution provided: Gaussian");
      return var;
    }
.
.
.
tdhock commented 4 years ago

what is this code doing? when is it called? are you sure gaussian wasn't supported already? (it should be, it is the most natural distribution to use usually)

andruuhurst commented 4 years ago

@tdhock

what is this code doing? when is it called? are you sure gaussian wasn't supported already? (it should be, it is the most natural distribution to use usually)

I am working on figuring out what it is doing and when/why it is called.

I am sure it wasn't supported, when referring to the iregnet_fit.cpp file you can see this in the original source code:

Screen Shot 2020-04-09 at 10 14 22 PM Here you can see extreme and logistic only exists as cases within the switch statement

andruuhurst commented 4 years ago

From what I can tell the get_init_var() function is used to calculate the initial scale value within the fit_cpp() function when the scale_init parameter is passed as NULL

 /* SCALE RULES:
   * Whether or not it is estimated depends on estimate_scale. For exponential, this is forced to False and scale fixed to 1. // TODO
   *
   * If you provide no scale, a starting value will be calculated
   * If you provide a scale, it will be used as the initial value
   */
  if (scale_init == Rcpp::NA) {
    scale_init = get_init_var(ym, status, n_obs, transformed_dist);
    scale_init = 2 * sqrt(scale_init);    // use 2 * sqrt(var) as in survival
    // TODO: find corner cases where you need to take 2 * sqrt(var) as in survival
  }
  scale = scale_init; log_scale = log(scale);

what determines this value is the parameters of the get_init_var() function, the y mean values, censoring type, number of observations, and the distribution.

However, one thing I am confused about is why the censoring type is passed as an argument because it is never used within the function.

tdhock commented 4 years ago

Thanks for the update @andruuhurst. it looks to me like the normal/gaussian is the default case, and the switch statement is to handle to other two cases. so it seems like that should work.

If the censoring type is not used then please delete that argument to avoid confusion.

Even if that initial scale estimation does not work, can you try running the algorithm with a fixed scale parameter=1 to see if it works in that case?

tdhock commented 4 years ago

hi @andruuhurst what happened? does the fixed scale parameter always work? don't you want to add test cases for non-fixed scale parameters in this PR?

andruuhurst commented 4 years ago

@tdhock Sorry, I did not mean to close this PR but I was having trouble with a commit and I mistakenly created a detached head on this feature branch and wasn't able to resolve it. I ended up having to create a new feature branch and put my progress into that one. I believe I will have to make a new PR in that case to add any further progress

tdhock commented 4 years ago

it would be better to just keep everything in this PR/branch, so can you please pull changes from your other branch into this one? (git merge)

andruuhurst commented 4 years ago

@tdhock the problem is everytime I try to access the feature branch related to this PR remotely it gives me a detached head and doesn't allow me to make any commits.

tdhock commented 4 years ago

you can do it! read the git manual about git remote (list/add remote repos), git branch (list/create branches), git checkout (switch branch), and git merge (take all commits froms one branch to another).