This sounds exciting! As I think about potential caveats, does it matter if the RNA-seq samples are poly-A selected or rRNA depleted?
That is a great point. We currently use data from the regular poly-A enrichment protocol, mostly because our primary input data is TCGA. This does affect the method, because a key gene in our signature, TERC, is a non-coding RNA that is not properly captured by poly-A methods. PCR shows this gene is abundantly expressed across tissues; however, RNA-seq data from TCGA and GTEx show only very low expression of it. EXTEND demonstrates reasonable performance with TCGA, CCLE, and GTEx, but we have not tested data from total RNA-seq or rRNA depletion. Great point.
@syzheng : Ok! This dataset contains both poly-A and rRNA depleted samples. #120 and https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/selection-strategy-comparison took a dive into the implications for gene expression analyses based on some earlier work by @cbethell.
I'm a bit confused by "a key gene in our signature, TERC, is a non-coding RNA that is not properly captured by polyA methods" and also "we currently use data from regular polyA enriched protocol, mostly because our primary input data is TCGA". Did you mean that you are better off with poly-A? There are many fewer poly-A samples here than rRNA-depleted.
As something that may be helpful in extending an analysis across both sets: @jharenza is looking to determine whether or not we can generate some that are matched (sequenced with both protocols).
EXTEND was developed using poly-A data. We essentially do not know whether it works for rRNA-depleted data, because we did not have this type of data when we benchmarked the method. The key is TERC, a non-coding RNA that is part of both our gene signature and the telomerase complex. It would be great to have a few cases sequenced by both methods. Otherwise, we can examine the distribution of TERC expression in this dataset to see whether it behaves similarly to poly-A datasets.
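Examining the TERC distribution could start with something like the minimal Python sketch below. It assumes TERC values have already been extracted from the poly-A and rRNA-depleted expression matrices into plain lists (file loading is not shown). The log2(TPM + 1) transform, the two-sample Kolmogorov-Smirnov statistic, and all numbers are illustrative choices, not part of EXTEND.

```python
import math
from bisect import bisect_right

def log2_expr(values, pseudocount=1.0):
    """log2(x + pseudocount), so zero-expression samples stay finite."""
    return [math.log2(v + pseudocount) for v in values]

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: the largest gap between the two
    empirical CDFs, checked at every observed value."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def median(values):
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

# Hypothetical TERC TPM values, for illustration only (not PBTA data).
terc_polya = [0.0, 0.5, 1.2, 0.8, 2.0]
terc_stranded = [15.0, 22.0, 9.5, 30.0, 18.0]

lp, ls = log2_expr(terc_polya), log2_expr(terc_stranded)
print(f"median log2(TPM+1): poly-A {median(lp):.2f}, stranded {median(ls):.2f}")
print(f"KS D = {ks_statistic(lp, ls):.2f}")
```

A D near 1 means the two protocols barely overlap for TERC; a D near 0 suggests similar distributions. A library routine (for example SciPy's two-sample KS test) would also give a p-value.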
Gotcha! You'll find both sets of files in the data download as processed in a few different ways:
pbta-gene-expression-kallisto.polya.rds
pbta-gene-expression-kallisto.stranded.rds
pbta-gene-expression-rsem-fpkm.polya.rds
pbta-gene-expression-rsem-fpkm.stranded.rds
pbta-gene-counts-rsem-expected_count.polya.rds
pbta-gene-counts-rsem-expected_count.stranded.rds
For now, it will be interesting to see whether the distribution differs and/or whether TERC matches the estimates from the method in the stranded files. Hopefully we'll have the set sequenced with both protocols in the not-too-distant future, but we shouldn't wait for it to get started. Thanks for taking this on!
Hi @syzheng ! Checking in on this analysis - do you have an idea of when you or your team would file a PR for this? Thanks!
Yes, we have finished the score calculation. We will update on GitHub once we have more on integration. Siyuan
(Emailed reply to Jo Lynne's message of Mon, Oct 28, 2019, 7:41 AM.)
Hi @syzheng! Just an update: with V12 (#326) of the data release, we will provide stranded RNA-seq for 45 samples for which we also have poly-A RNA-seq, so it would be interesting to determine whether telomerase predictions differ between these two sets of data. Stay tuned for the end of this week/early next week. Also looking forward to your PR!
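With samples sequenced under both protocols, one way to look for prediction differences is to pair each sample's two scores and check both rank agreement and systematic shift. The Python sketch below assumes the telomerase scores are already computed and keyed by a shared sample ID; the function names, IDs, and values are hypothetical, and Spearman correlation plus mean paired difference is one reasonable choice among several (a Bland-Altman plot is another).

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    """Pearson correlation (undefined, raises ZeroDivisionError, if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def compare_matched(scores_polya, scores_stranded):
    """Pair scores by sample ID present in both dicts.
    Returns (n_pairs, Spearman rho, mean of stranded minus poly-A)."""
    shared = sorted(set(scores_polya) & set(scores_stranded))
    x = [scores_polya[s] for s in shared]
    y = [scores_stranded[s] for s in shared]
    mean_diff = sum(b - a for a, b in zip(x, y)) / len(shared)
    return len(shared), spearman(x, y), mean_diff

# Hypothetical scores; S5 has no poly-A counterpart and is dropped.
polya = {"S1": 0.2, "S2": 0.9, "S3": 0.5, "S4": 1.4}
stranded = {"S1": 0.6, "S2": 1.5, "S3": 0.7, "S4": 2.1, "S5": 0.3}
n, rho, diff = compare_matched(polya, stranded)
print(n, round(rho, 3), round(diff, 3))
```

A high rho with a nonzero mean difference would suggest the protocols rank samples consistently but on shifted scales, which is the pattern one might expect if TERC is under-captured by poly-A selection.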
Thanks for the heads-up! I will update the group by next week. Best,
(Emailed reply to Jo Lynne's message of Wed, Dec 11, 2019, 8:57 PM.)
Hi @syzheng ! Happy New Year! Do you think your team will be able to submit a PR on this analysis sometime soon? We are starting to wrap up/finalize analyses and determine manuscript figures. Thanks!
Sorry. Yes, I will make sure to complete it in the next few days.
(Emailed reply to Jo Lynne's message of Fri, Jan 3, 2020, 6:36 PM.)
No worries, glad to hear!
Addressed through #494, #506, #511, and #516
Scientific goals
The goal is to quantify telomerase activity and correlate it with telomere length and molecular alterations (TERTp mutation, ATRX mutation, etc.).
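For a binary alteration such as TERTp mutation, a rank-based two-group comparison avoids assuming normally distributed scores. The Python sketch below reports group medians and the Mann-Whitney U statistic scaled to a probability (how often a mutant sample outscores a wild-type one). The dictionaries, sample IDs, and values are hypothetical, and this is not the EXTEND scoring step itself. For a continuous measure such as telomere length, a rank correlation would be the analogous choice.

```python
from statistics import median

def prob_superiority(group_a, group_b):
    """Mann-Whitney U for group_a divided by n_a * n_b: the probability that a
    randomly chosen score from group_a exceeds one from group_b (ties count 1/2)."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u / (len(group_a) * len(group_b))

def compare_by_status(scores, altered_flag):
    """Split per-sample scores by a boolean alteration flag (e.g. TERTp mutated).
    Samples without a flag are skipped.
    Returns (median altered, median wild type, P(altered > wild type))."""
    altered = [scores[s] for s in scores if altered_flag.get(s) is True]
    wild_type = [scores[s] for s in scores if altered_flag.get(s) is False]
    return median(altered), median(wild_type), prob_superiority(altered, wild_type)

# Hypothetical telomerase scores and TERTp calls; sample F has no call.
scores = {"A": 1.8, "B": 2.2, "C": 0.4, "D": 0.9, "E": 0.3, "F": 1.1}
tertp_mutated = {"A": True, "B": True, "C": False, "D": False, "E": False}
med_mut, med_wt, p_sup = compare_by_status(scores, tertp_mutated)
print(f"median mutant {med_mut:.2f}, median WT {med_wt:.2f}, P(mutant > WT) = {p_sup:.2f}")
```

A value near 0.5 means no separation between groups; values near 1 mean mutant samples almost always score higher.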
Proposed methods
We will use our newly developed method EXTEND (EXpression based Telomerase ENzymatic activity Detection)
Required input data
Gene expression from RNA-seq (TPM, RPKM, or counts)
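If inputs arrive in mixed units and a single scale is wanted, TPM can be derived per sample from RPKM/FPKM by rescaling to a total of one million, or from counts by first dividing by gene length. This Python sketch assumes one sample stored as a gene-to-value dict; `lengths_bp` (effective gene lengths) is an assumed extra input, the gene values are hypothetical, and nothing here implies EXTEND requires TPM specifically.

```python
def fpkm_to_tpm(fpkm):
    """Rescale one sample's FPKM/RPKM values so they sum to 1e6 (i.e. TPM)."""
    total = sum(fpkm.values())
    if total <= 0:
        raise ValueError("sample has no positive expression values")
    return {gene: value / total * 1e6 for gene, value in fpkm.items()}

def counts_to_tpm(counts, lengths_bp):
    """TPM from raw counts: divide each count by its gene length (a per-base
    rate), then rescale the rates so they sum to 1e6."""
    rates = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    total = sum(rates.values())
    if total <= 0:
        raise ValueError("sample has no positive expression values")
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}

# Hypothetical single-sample values, for illustration only.
fpkm = {"TERT": 2.0, "TERC": 6.0, "GAPDH": 992.0}
tpm = fpkm_to_tpm(fpkm)
print({gene: round(v, 1) for gene, v in tpm.items()})
```

Both conversions rescale over whichever genes are present, so they should be applied to the full gene set of a sample rather than to a filtered subset.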
Proposed timeline
One to two weeks.
Relevant literature
Barthel et al. Nat Genet, 2017; Zheng et al. Cancer Cell, 2016; Ackermann et al. Science, 2016