FertigLab / CoGAPS

Bayesian MCMC matrix factorization algorithm
https://www.bioconductor.org/packages/release/bioc/html/CoGAPS.html
BSD 3-Clause "New" or "Revised" License
66 stars 17 forks source link

123 test behavior with a matrix fixed #124

Closed dimalvovs closed 3 days ago

atuldeshpande commented 6 days ago

Few comments:

  1. The result of the fixed mode itself is great and matches quite well with the original CoGAPS result! That's awesome!
  2. Is reporting all zeros in the fixed pattern instead of the original matrix a UI choice? One positive of that user readily recognizes that this is a fixed mode output.
  3. The reported chi-sq and the final meanchisq in the cogaps results are meaningless. Moreover, the running chisq is different when A is fixed vs P is fixed in Gist example. When A is fixed, running chi sq changes every 1000 iterations and is approx 3400. When P is fixed, running chi sq is constant and =895072. The final mean chi sq in both modes is 895072. Could we either report an accurate chisq value or add that as an issue to tackle in the future? (Note: both the featureLoadinsg=fixedMatrix and meanchiSq calculation to be saved in the output could be done in R at the end if needed instead of in C++.)
dimalvovs commented 6 days ago

Few comments:

  1. The result of the fixed mode itself is great and matches quite well with the original CoGAPS result! That's awesome!

great!

  1. Is reporting all zeros in the fixed pattern instead of the original matrix a UI choice? One positive of that user readily recognizes that this is a fixed mode output.

indeed, this is a UI choice to recognize that this has been a fixed run. the user could always use their original fixed matrix to do calculations they require.

  1. The reported chi-sq and the final meanchisq in the cogaps results are meaningless. Moreover, the running chisq is different when A is fixed vs P is fixed in Gist example. When A is fixed, running chi sq changes every 1000 iterations and is approx 3400. When P is fixed, running chi sq is constant and =895072. The final mean chi sq in both modes is 895072. Could we either report an accurate chisq value or add that as an issue to tackle in the future? (Note: both the featureLoadinsg=fixedMatrix and meanchiSq calculation to be saved in the output could be done in R at the end if needed instead of in C++.)

indeed, this is unfortunate. added a test that confirms it in c4efe05 and we should fix that

favorov commented 6 days ago

Answer to: confirms it in c4efe05 and we should fix

Try to transpose the data matrix, so (A,P) to (mMatrix, *motherMatrix) swaps. Will it swap the chisquare behaviours?

dimalvovs commented 5 days ago

Answer to: confirms it in c4efe05 and we should fix

Try to transpose the data matrix, so (A,P) to (mMatrix, *motherMatrix) swaps. Will it swap the chisquare behaviours?

it does not

dimalvovs commented 5 days ago

The problem with reporting chisq during the run was that the chisq report was done from one of the two samplers. Not a problem in case two matrices are sampled in turn as samplers seem to get synced, but when sampling only one matrix, we need to to choose which sampler to report from, fixed in 23044a1

MeanChiSq issue is a bit more complex as in the current implementation it requires both Amean and Pmean to be present, which is not the case when one of them is fixed. Introducing another (5th already - there a 2 for Dense and 2 for sparse) chisq calculation in R seems wrong at this point.

dimalvovs commented 3 days ago

bypassing as an approving review is in the comments