biocore / DEICODE

Robust Aitchison PCA from sparse count data
Other
33 stars 17 forks source link

bug fixes in ordering and fraction total variance explained #54

Closed cameronmartino closed 4 years ago

cameronmartino commented 4 years ago

There are two main bug fixes in this PR which address the issue(s) #53 and #52. For issue #52 the bug was in the sorting of the singular value in the OptSpace function itself. This was sometimes corrected by the SVD in the recentering and some times was not, thus the periodic axis switch raised in #52. Examples of this fix on simulated:

image

and real data:

image

For issue #53, which is a common complaint, the solution involved comparing the variance across the projection (i.e. U * s) and the original data. To calculate the total variance for the original data I first took the rclr and then the sum of the variance w.r.t. only the observed values. In practice, this seems to estimate the fraction of variance explained much better in partial-rank compared to a full-rank decomposition. Especially, when contrasted to the previous method of using the partial-rank singular value sum of squared. This can be seen on both simulated:

image

and real data:

image

All the analysis/figures for the simulated and real-data can be found here and here.

I also added a feature raised in #45 for feature presence frequency filtering. In practice, this seems better than sum filtering so downstream log-ratios w/o pseudo counts don't have large sample dropouts.

The changelog has been updated and the version has been bumped.

cameronmartino commented 4 years ago

@mortonjt I agree. I think that the rank estimation is a better idea (i.e. option one). I added it as an option by setting --n-components to optspace and reverted the fraction variance explained. I included the option so if more methods for rank estimation are implemented later they can be chosen. I also did not change the default --n-components for this version from 3 to optspace to avoid user confusion. However, I did add a FutureWanring for hard set rank's saying that the next version of DEICODE will have optspace based rank estimation as the default.

mortonjt commented 4 years ago

Another option would be to have a completely new command (i.e. auto-rpca) that includes the rpca procedure but automatically estimates the rank. That way this will be backwards compatible.

mortonjt commented 4 years ago

Ahh ok. That makes sense. Ok, this is good to merge then

On Fri, Jan 10, 2020, 3:58 PM Cameron Martino notifications@github.com wrote:

@cameronmartino commented on this pull request.

In deicode/_rpca_defaults.py https://github.com/biocore/DEICODE/pull/54#discussion_r365427658:

DEFAULT_ITERATIONS = 5

-DESC_RANK = ("The underlying low-rank structure (suggested: 1 < rank < 10)"

  • " [minimum 2]") -DESC_MSC = "Minimum sum cutoff of sample across all features" -DESC_MFC = "Minimum sum cutoff of features across all samples" +DESC_RANK = ("The underlying low-rank structure."
  • " The input can be an integer "
  • "(suggested: 1 < rank < 10) [minimum 2]."
  • " Note: as the rank increases the runtime"
  • " will increase dramatically.") +DESC_MSC = ("Minimum sum cutoff of sample across all features"

Done, thanks!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/biocore/DEICODE/pull/54?email_source=notifications&email_token=AA75VXM4VJHUQOUGKXOCMMLQ5DOQDA5CNFSM4KEAW6LKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCRM2D7A#discussion_r365427658, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA75VXOA6S2YTI2SCNCSSJTQ5DOQDANCNFSM4KEAW6LA .

cameronmartino commented 4 years ago

Thanks, @mortonjt! I addressed those typos and updated the README/tutorial(s) to be consistent with the new command. I also added a python API tutorial and standalone tutorial that follows the QIIME2 tutorial. This should address issue #40

cameronmartino commented 4 years ago

@mortonjt - good to merge! thanks.