AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

Proposed Analysis: quantify telomerase activity across pediatric brain tumors #148

Closed: syzheng closed this issue 4 years ago

syzheng commented 4 years ago

Scientific goals

The goal is to quantify telomerase activity and correlate it with telomere length and molecular alterations (TERTp mutations, ATRX mutations, etc.).
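
As a rough sketch of the correlation step for one molecular alteration, assuming telomerase activity scores have already been computed per sample and each sample is labeled as altered or wild type (e.g., TERTp mutant or not), a rank-based comparison such as the Mann-Whitney U statistic could be used. The function names below are hypothetical illustrations and are not part of EXTEND.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last index of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(altered: Sequence[float],
                   wild_type: Sequence[float]) -> tuple[float, float]:
    """Return (U, AUC) comparing scores of altered vs. wild-type samples.

    AUC = U / (n1 * n2): the chance that a random altered sample outscores
    a random wild-type sample, counting ties as one half.
    """
    n1, n2 = len(altered), len(wild_type)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one sample")
    ranks = average_ranks(list(altered) + list(wild_type))
    rank_sum_altered = sum(ranks[:n1])
    u = rank_sum_altered - n1 * (n1 + 1) / 2
    return u, u / (n1 * n2)


# Made-up scores, for illustration only.
u, auc = mann_whitney_u([0.8, 0.9, 1.1], [0.2, 0.4])
print(u, auc)  # 6.0 1.0
```

A rank-based statistic avoids assuming the scores are normally distributed; if SciPy is available, `scipy.stats.mannwhitneyu` provides the same U along with a p-value.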

Proposed methods

We will use our newly developed method, EXTEND (EXpression based Telomerase ENzymatic activity Detection).

Required input data

Gene expression from RNA-seq (TPM, RPKM, or counts)

Proposed timeline

One to two weeks.

Relevant literature

Barthel et al. Nat Genet, 2017; Zheng et al. Cancer Cell, 2016; Ackermann et al. Science, 2016

cgreene commented 4 years ago

This sounds exciting! As I think about potential caveats, does it matter if the RNA-seq samples are poly-A selected or rRNA depleted?

syzheng commented 4 years ago

That is a great point. We currently use data from the regular poly-A enrichment protocol, mostly because our primary input data is TCGA. This does impact the method: a key gene in our signature, TERC, is a non-coding RNA that is not properly captured by poly-A methods. PCR shows that this gene is abundantly expressed across tissues; however, RNA-seq data from TCGA and GTEx show only very low expression of it. EXTEND performs reasonably on TCGA, CCLE, and GTEx data, but we have not tested it on total RNA-seq or rRNA-depleted data.

cgreene commented 4 years ago

@syzheng : Ok! This dataset contains both poly-A and rRNA depleted samples. #120 and https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/selection-strategy-comparison took a dive into the implications for gene expression analyses based on some earlier work by @cbethell.

I'm a bit confused by "a key gene in our signature, TERC, is a non-coding RNA that is not properly captured by polyA methods" and also "we currently use data from regular polyA enriched protocol, mostly because our primary input data is TCGA". Did you mean that you are better off with poly-A? There are many fewer poly-A samples here than rRNA-depleted.

As something that may be helpful in extending an analysis across both sets: @jharenza is looking into whether we can generate some matched samples (sequenced with both protocols).

syzheng commented 4 years ago

EXTEND was developed using poly-A data. We essentially do not know whether it works for rRNA-depleted data, because we did not have this type of data when we benchmarked the method. The key is TERC, a non-coding RNA that is part of both our gene signature and the telomerase complex. It would be great to have a few cases sequenced with both methods. Otherwise, we can examine the distribution of TERC expression in this dataset to see whether it behaves similarly to poly-A datasets.
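
One way to check whether TERC behaves similarly across library preparations is to compare its expression distributions directly. The sketch below is hypothetical: it assumes TERC values (e.g., TPM) have already been extracted for the poly-A and rRNA-depleted samples, and it uses the two-sample Kolmogorov-Smirnov statistic D plus a simple summary with an arbitrary, assumed low-expression cutoff.

```python
from bisect import bisect_right
from statistics import median
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D: the largest vertical gap between
    the empirical CDFs of a and b."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def summarize(values: Sequence[float], low_cutoff: float = 1.0) -> dict[str, float]:
    """Sample size, median, and fraction of values below an assumed cutoff."""
    return {
        "n": len(values),
        "median": median(values),
        "frac_low": sum(v < low_cutoff for v in values) / len(values),
    }
```

A D near 0 suggests the two protocols give similar TERC distributions, while a D near 1 means they barely overlap. If SciPy is available, `scipy.stats.ks_2samp` returns the same statistic together with a p-value.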

cgreene commented 4 years ago

Gotcha! You'll find both sets of files in the data download as processed in a few different ways:

pbta-gene-expression-kallisto.polya.rds
pbta-gene-expression-kallisto.stranded.rds
pbta-gene-expression-rsem-fpkm.polya.rds
pbta-gene-expression-rsem-fpkm.stranded.rds
pbta-gene-counts-rsem-expected_count.polya.rds
pbta-gene-counts-rsem-expected_count.stranded.rds

For now, it will be interesting to see whether the distribution is different and/or whether TERC expression tracks the method's estimates in the stranded samples. Hopefully we'll have the set with both in the not terribly distant future, but we shouldn't wait for them to get started. Thanks for taking this on!
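
To check whether TERC expression tracks the method's estimates in the stranded samples, a Spearman rank correlation over the shared sample IDs is one option. This is a minimal sketch assuming both inputs are dicts keyed by sample ID; loading the `.rds` files listed above is outside its scope, and the helper names are hypothetical.

```python
from math import sqrt
from typing import Mapping, Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def spearman_by_sample(terc: Mapping[str, float],
                       scores: Mapping[str, float]) -> float:
    """Spearman correlation between TERC expression and scores,
    restricted to sample IDs present in both mappings."""
    shared = sorted(terc.keys() & scores.keys())
    if len(shared) < 3:
        raise ValueError("need at least three shared samples")
    x = [terc[s] for s in shared]
    y = [scores[s] for s in shared]
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(average_ranks(x), average_ranks(y))
```

Ranks are used so that a few extreme TERC values do not dominate the comparison.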

jharenza commented 4 years ago

Hi @syzheng ! Checking in on this analysis - do you have an idea of when you or your team would file a PR for this? Thanks!

syzheng commented 4 years ago

Yes, we have finished the score calculation and will update on GitHub once we have more on integration. Siyuan

(Replied by email to Jo Lynne's comment of Mon, Oct 28, 2019, 7:41 AM.)

jharenza commented 4 years ago

Hi @syzheng! Wanted to let you know that with V12 (#326) of the data release, we will provide stranded RNA-seq for 45 samples for which we also have poly-A RNA-seq, so it would be interesting to determine whether telomerase predictions differ between these two sets of data. Stay tuned end of this week/early next week. Also looking forward to your PR!
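
For the samples sequenced with both protocols, a first-pass comparison could pair the telomerase scores by sample ID and summarize the per-sample differences. This sketch assumes scores from each protocol are held in dicts keyed by a shared sample identifier; `compare_matched` is a hypothetical helper, not part of the project's code.

```python
from statistics import mean, median
from typing import Mapping


def compare_matched(polya: Mapping[str, float],
                    stranded: Mapping[str, float]) -> dict[str, float]:
    """Summarize per-sample differences (stranded minus poly-A)
    for samples present in both sets."""
    shared = sorted(polya.keys() & stranded.keys())
    if not shared:
        raise ValueError("no samples sequenced with both protocols")
    diffs = [stranded[s] - polya[s] for s in shared]
    return {
        "n_matched": len(shared),
        # A median far from 0 hints at a systematic shift between protocols.
        "median_diff": median(diffs),
        # Typical size of the disagreement, regardless of direction.
        "mean_abs_diff": mean(abs(d) for d in diffs),
    }
```

Samples present in only one set are ignored, so `n_matched` should equal the number of matched samples if the IDs line up.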

syzheng commented 4 years ago

Thanks for the heads up! I will update the group by next week. Best,

(Replied by email to Jo Lynne's comment of Wed, Dec 11, 2019, 8:57 PM.)

jharenza commented 4 years ago

Hi @syzheng ! Happy New Year! Do you think your team will be able to submit a PR on this analysis sometime soon? We are starting to wrap up/finalize analyses and determine manuscript figures. Thanks!

syzheng commented 4 years ago

Sorry! Yes, I will make sure to complete it in the next few days.

(Replied by email to Jo Lynne's comment of Fri, Jan 3, 2020, 6:36 PM.)

jharenza commented 4 years ago

No worries, glad to hear!

jaclyn-taroni commented 4 years ago

Addressed through #494, #506, #511, and #516