Closed arcolombo closed 8 years ago
this shows the functoin calcTPM versus the generic accessor tpm(), so I'm not sure how @ttriche hit an error. collapseTpm() uses tpm accessor and this plot shows its identical. I'm suspecting it was a version clash.? needs some data to reproduce @triche's issue....
calcTpm is used to feed the matrix that tpm() pulls from, so those will always be identical. At some point there was a regression propagated from HEAD (RamsinghLab/arkas) into my fork, which had not previously been there, and had thrown off the calculations. It turns out that leaving the TPM calculation to the last minute (i.e., within the KallistoExperiment() constructor) is faster and less error-prone, as is carving out the calculation into its own little function that handles matrices and vectors (row/column matrices) directly. Hence calcTpm.
I spot-checked this on TARGET AML samples locally and I get no differences greater than 0.0001 TPM, which is well within the margin of error due to rounding/truncation. calcTpm also has the benefit of corresponding exactly (both variable names and logic) to that which Kallisto performs internally.
--t
On Mon, Jul 11, 2016 at 2:22 PM, Anthony R. Colombo < notifications@github.com> wrote:
Tpm-Err.pdf https://github.com/RamsinghLab/arkas/files/358108/Tpm-Err.pdf
this shows the functoin calcTPM versus the generic accessor tpm(), so I'm not sure how @ttriche https://github.com/ttriche hit an error. collapseTpm() uses tpm accessor and this plot shows its identical. I'm suspecting it was a version clash.? needs some data to reproduce @triche https://github.com/triche's issue....
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/RamsinghLab/arkas/issues/17#issuecomment-231869281, or mute the thread https://github.com/notifications/unsubscribe/AAARIspIK7auJHXAT9wr847oaCQSIhw4ks5qUrQNgaJpZM4JJt7g .
so is this issue resolved? on my end I just used collapseTpm (yes it is a bit slow)...
uh... collapseTpm should be using rowsum() to collapse... if it isn't, then it will be slow. I worry that a lot of changes from my branch seem to have disappeared.
collapseTpm() is for gene-level or bundle-level (e.g. shared-ish TSS) analyses; it has little or nothing to do with the problem at hand. The issue was that when I ran a few hundred samples into arkas from h5 files, the version in HEAD was giving me nonsensical values that did not match up to the values from the .TSV file for the same transcripts in the same sample.
Here is the test I added to the reload script:
load(file="~/TARGET/kallisto/FDFT1_txs.rda") # txs for testing PASJEJ <- targetRnaSeq[FDFT1_txs, "PASJEJ"] PASJEJ_FDFT1 <- data.frame(eff_length=eff_length(PASJEJ), est_counts=counts(PASJEJ), tpm=tpm(PASJEJ)) load(file="~/TARGET/kallisto/FDFT1_PASJEJ.rda") # tpms for testing tpm_diffs <- (PASJEJ_FDFT1 - FDFT1_PASJEJ)[,3] if(any(tpm_diffs > 0.1)) stop("TPM calculations are wrong. Aborting.") saveRDS(targetRnaSeq, file="~/TARGET/kallisto/targetRnaSeq.rds")
With the fixes I added to my pull request, it passes (for any subject).
--t
On Mon, Jul 11, 2016 at 3:14 PM, Anthony R. Colombo < notifications@github.com> wrote:
so is this issue resolved? on my end I just used collapseTpm (yes it is a bit slow)...
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/RamsinghLab/arkas/issues/17#issuecomment-231882006, or mute the thread https://github.com/notifications/unsubscribe/AAARIup03jSAJAiQtPah0oelSZ-GG1aGks5qUsA6gaJpZM4JJt7g .
okay, your recent PR has been merged.
On Tue, Jul 12, 2016 at 12:01 PM, Tim Triche, Jr. notifications@github.com wrote:
uh... collapseTpm should be using rowsum() to collapse... if it isn't, then it will be slow. I worry that a lot of changes from my branch seem to have disappeared.
collapseTpm() is for gene-level or bundle-level (e.g. shared-ish TSS) analyses; it has little or nothing to do with the problem at hand. The issue was that when I ran a few hundred samples into arkas from h5 files, the version in HEAD was giving me nonsensical values that did not match up to the values from the .TSV file for the same transcripts in the same sample.
Here is the test I added to the reload script:
double-check that the TPM fix did not somehow get mangled (again)
load(file="~/TARGET/kallisto/FDFT1_txs.rda") # txs for testing PASJEJ <- targetRnaSeq[FDFT1_txs, "PASJEJ"] PASJEJ_FDFT1 <- data.frame(eff_length=eff_length(PASJEJ), est_counts=counts(PASJEJ), tpm=tpm(PASJEJ)) load(file="~/TARGET/kallisto/FDFT1_PASJEJ.rda") # tpms for testing tpm_diffs <- (PASJEJ_FDFT1 - FDFT1_PASJEJ)[,3] if(any(tpm_diffs > 0.1)) stop("TPM calculations are wrong. Aborting.") saveRDS(targetRnaSeq, file="~/TARGET/kallisto/targetRnaSeq.rds")
With the fixes I added to my pull request, it passes (for any subject).
--t
On Mon, Jul 11, 2016 at 3:14 PM, Anthony R. Colombo < notifications@github.com> wrote:
so is this issue resolved? on my end I just used collapseTpm (yes it is a bit slow)...
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/RamsinghLab/arkas/issues/17#issuecomment-231882006, or mute the thread < https://github.com/notifications/unsubscribe/AAARIup03jSAJAiQtPah0oelSZ-GG1aGks5qUsA6gaJpZM4JJt7g
.
— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/RamsinghLab/arkas/issues/17#issuecomment-232146101, or mute the thread https://github.com/notifications/unsubscribe/AFODS7B6nlUjhqAoLgCSnp84Qg6H86pWks5qU-SXgaJpZM4JJt7g .
the collapseTPM uses an lapply and a colSums to group the non-empty tpm entries that have a sufficient feature data for the bundleID. then it extracts via tpm() which is what Arkas data sets have using.
the issue was _only_ with the results from calcTPM? if so, then we can close this without revisiting previous data.... etc
AC
this issue can be closed, performance of collapseTPM is a separate issue
--t
On Tue, Jul 12, 2016 at 4:35 PM, Anthony R. Colombo < notifications@github.com> wrote:
the collapseTPM uses an lapply and a colSums to group the non-empty tpm entries that have a sufficient feature data for the bundleID. then it extracts via tpm() which is what Arkas data sets have using.
the issue was only with the results from calcTPM? if so, then we can close this without revisiting previous data.... etc
AC
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/RamsinghLab/arkas/issues/17#issuecomment-232213332, or mute the thread https://github.com/notifications/unsubscribe/AAARIhowoMcmTbqMRtPYaC1kjH4gdE0Tks5qVCTPgaJpZM4JJt7g .
ok tyvm.
On Tue, Jul 12, 2016 at 4:50 PM, Tim Triche, Jr. notifications@github.com wrote:
this issue can be closed, performance of collapseTPM is a separate issue
--t
On Tue, Jul 12, 2016 at 4:35 PM, Anthony R. Colombo < notifications@github.com> wrote:
the collapseTPM uses an lapply and a colSums to group the non-empty tpm entries that have a sufficient feature data for the bundleID. then it extracts via tpm() which is what Arkas data sets have using.
the issue was only with the results from calcTPM? if so, then we can close this without revisiting previous data.... etc
AC
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/RamsinghLab/arkas/issues/17#issuecomment-232213332, or mute the thread < https://github.com/notifications/unsubscribe/AAARIhowoMcmTbqMRtPYaC1kjH4gdE0Tks5qVCTPgaJpZM4JJt7g
.
— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/RamsinghLab/arkas/issues/17#issuecomment-232215673, or mute the thread https://github.com/notifications/unsubscribe/AFODS6nEirjJZ8EATmEGDQlJQ6LYkxmpks5qVChEgaJpZM4JJt7g .
need to rwite a test for calcTPM to ensure that the tpm calculation is correct with respect to kallisto versions.