Closed cameronmartino closed 4 years ago
@mortonjt I agree. I think that the rank estimation is a better idea (i.e. option one). I added it as an option by setting --n-components
to optspace
and reverted the fraction variance explained. I included the option so if more methods for rank estimation are implemented later they can be chosen. I also did not change the default --n-components
for this version from 3
to optspace
to avoid user confusion. However, I did add a FutureWanring
for hard set rank's saying that the next version of DEICODE will have optspace
based rank estimation as the default.
Another option would be to have a completely new command (i.e. auto-rpca
) that includes the rpca procedure but automatically estimates the rank. That way this will be backwards compatible.
Ahh ok. That makes sense. Ok, this is good to merge then
On Fri, Jan 10, 2020, 3:58 PM Cameron Martino notifications@github.com wrote:
@cameronmartino commented on this pull request.
In deicode/_rpca_defaults.py https://github.com/biocore/DEICODE/pull/54#discussion_r365427658:
DEFAULT_ITERATIONS = 5
-DESC_RANK = ("The underlying low-rank structure (suggested: 1 < rank < 10)"
- " [minimum 2]") -DESC_MSC = "Minimum sum cutoff of sample across all features" -DESC_MFC = "Minimum sum cutoff of features across all samples" +DESC_RANK = ("The underlying low-rank structure."
- " The input can be an integer "
- "(suggested: 1 < rank < 10) [minimum 2]."
- " Note: as the rank increases the runtime"
- " will increase dramatically.") +DESC_MSC = ("Minimum sum cutoff of sample across all features"
Done, thanks!
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/biocore/DEICODE/pull/54?email_source=notifications&email_token=AA75VXM4VJHUQOUGKXOCMMLQ5DOQDA5CNFSM4KEAW6LKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCRM2D7A#discussion_r365427658, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA75VXOA6S2YTI2SCNCSSJTQ5DOQDANCNFSM4KEAW6LA .
Thanks, @mortonjt! I addressed those typos and updated the README/tutorial(s) to be consistent with the new command. I also added a python API tutorial and standalone tutorial that follows the QIIME2 tutorial. This should address issue #40
@mortonjt - good to merge! thanks.
There are two main bug fixes in this PR which address the issue(s) #53 and #52. For issue #52 the bug was in the sorting of the singular value in the OptSpace function itself. This was sometimes corrected by the SVD in the recentering and some times was not, thus the periodic axis switch raised in #52. Examples of this fix on simulated:
and real data:
For issue #53, which is a common complaint, the solution involved comparing the variance across the projection (i.e. U * s) and the original data. To calculate the total variance for the original data I first took the rclr and then the sum of the variance w.r.t. only the observed values. In practice, this seems to estimate the fraction of variance explained much better in partial-rank compared to a full-rank decomposition. Especially, when contrasted to the previous method of using the partial-rank singular value sum of squared. This can be seen on both simulated:
and real data:
All the analysis/figures for the simulated and real-data can be found here and here.
I also added a feature raised in #45 for feature presence frequency filtering. In practice, this seems better than sum filtering so downstream log-ratios w/o pseudo counts don't have large sample dropouts.
The changelog has been updated and the version has been bumped.