SystemsGenetics / KINC

Knowledge Independent Network Construction
MIT License

Fixes for too many missing values in a cluster #153

Closed. spficklin closed this 4 years ago.

spficklin commented 4 years ago

This is a fix for issue #141. If a cluster (edge) has so many missing values that too few samples remain in it, both the regression test and the hypergeometric test throw a GSL error. This PR fixes both.
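The general idea of such a guard can be sketched in Python: drop samples with missing values from the cluster, then skip the test when too few samples remain instead of handing a degenerate input to the numeric routine. This is not KINC's implementation. It assumes missing values are stored as NaN, and `MIN_SAMPLES`, `filter_cluster` and `guarded_test` are hypothetical names, with the threshold chosen arbitrarily.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical minimum; the PR does not state the threshold KINC uses.
MIN_SAMPLES = 3


def filter_cluster(x: Sequence[float], y: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Keep only the samples where both values are present (not NaN)."""
    pairs = [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]
    return [a for a, _ in pairs], [b for _, b in pairs]


def guarded_test(
    x: Sequence[float],
    y: Sequence[float],
    test: Callable[[List[float], List[float]], float],
) -> Optional[float]:
    """Run `test` on the filtered cluster, or return None if too few samples remain."""
    xs, ys = filter_cluster(x, y)
    if len(xs) < MIN_SAMPLES:
        # Skip the test instead of passing an empty or degenerate input to the solver.
        return None
    return test(xs, ys)
```

Returning `None` lets a caller record the edge as untested rather than aborting the whole run; how KINC itself reports such edges is not stated here.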

However, there may still be another problem. The error @JohnHadish posted was this:

gsl: init_source.c:29: ERROR: matrix dimension n1 must be positive integer

That error points to a problem with the n1 variable, which is part of the hypergeometric test, not regression. But my understanding was that the column is quantitative, not categorical, so regression should have been used.

@JohnHadish I think this PR should fix the issue for both tests, but can you test it and make sure the output for quantitative columns comes from the regression test and not the categorical (hypergeometric) test?

edit: This PR also fixes issue #140. It uses the same --nan argument for consistency. The --missing argument is still supported so as not to break existing scripts, but it is deprecated.
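Keeping a deprecated flag as a fallback alias is a common pattern. A minimal Python sketch with `argparse` follows; it is not KINC's actual argument handling, and the default token `NA`, the name `resolve_nan_token`, and the rule that `--nan` wins when both flags are given are assumptions for illustration.

```python
import argparse
import warnings
from typing import List, Optional

# Hypothetical default; the PR does not state what KINC uses.
DEFAULT_NAN = "NA"


def resolve_nan_token(argv: Optional[List[str]] = None) -> str:
    """Parse --nan, accepting the deprecated --missing as a fallback alias."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("--nan", default=None, help="token that marks a missing value")
    # Kept only so old scripts keep working; hidden from --help.
    parser.add_argument("--missing", default=None, help=argparse.SUPPRESS)
    args = parser.parse_args(argv)

    if args.missing is not None:
        warnings.warn("--missing is deprecated; use --nan instead", DeprecationWarning, stacklevel=2)
    if args.nan is not None:
        return args.nan  # --nan takes precedence if both are given
    if args.missing is not None:
        return args.missing
    return DEFAULT_NAN
```

Old invocations such as `--missing ?` still resolve to `?` but emit a deprecation warning, while new scripts use `--nan`.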

spficklin commented 4 years ago

@4ctrl-alt-del can you provide a code review?

spficklin commented 4 years ago

Whoops, this PR is merging into master rather than the develop branch. @JohnHadish and @4ctrl-alt-del, once you test and approve these changes, I will close this PR and redo it so that it merges into the develop branch.