Closed: jharenza closed this issue 1 year ago
I tried to use the GDSC query:
and I couldn't find those 3 samples. I'm not sure whether they were deleted.
@ewafula can you remove these 3 from the TCGA histologies file and open a PR to the dbt repo?
Damn, wrong ticket, I meant this for GTEx. I'll delete that comment to reduce confusion.
I'm currently working on a script that will lift over the gene symbols and then collapse the matrix on gene symbols. After that, someone else will have to sort out or remove any entries that don't appear in our PBTA/KF sets.
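For readers following along, the lift-over-then-collapse step described above could look roughly like the sketch below. This is not the actual script from this thread: the function names, the `symbol_map` lookup, and the placeholder gene names are all hypothetical, and the collapse rule (keep the row with the highest mean expression per symbol) is an assumption, since the thread does not say how duplicates are resolved. It uses only the Python standard library.

```python
def liftover_symbols(rows, symbol_map):
    """Rename gene symbols using an old -> new lookup (hypothetical mapping).

    rows: list of (symbol, values) tuples, one per matrix row.
    Symbols missing from symbol_map are kept unchanged.
    """
    return [(symbol_map.get(symbol, symbol), values) for symbol, values in rows]


def collapse_by_symbol(rows):
    """Collapse duplicate symbols to a single row each.

    Assumption: when several rows share a symbol, keep the row with the
    highest mean expression. The first row wins on ties.
    """
    best = {}
    for symbol, values in rows:
        mean = sum(values) / len(values) if values else 0.0
        if symbol not in best or mean > best[symbol][0]:
            best[symbol] = (mean, values)
    return {symbol: values for symbol, (_, values) in best.items()}


# Placeholder data: "OLD1" is renamed to "NEW1", which already exists,
# so the two rows must be collapsed into one.
example_rows = [
    ("OLD1", [1.0, 3.0]),
    ("NEW1", [5.0, 5.0]),
    ("GENE2", [2.0, 2.0]),
]
example_map = {"OLD1": "NEW1"}

lifted = liftover_symbols(example_rows, example_map)
collapsed = collapse_by_symbol(lifted)
```

After this step, `collapsed` holds one row per lifted symbol; filtering out symbols that are absent from the PBTA/KF matrices would be a separate pass.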
perfect @migbro
@ewafula @chinwallaa, OK, I have written a tool that does what I described above. It's a Python script that uses only base packages. Where should I put it? Should I wrap it in a CWL tool?
@migbro I wonder if we should hold off on that until further QC. We are releasing all methylation, but some samples suggest mis-ID / sample swaps (while some may be real biology suggesting a different diagnosis). Or perhaps take only those whose classification does indeed match the diagnosis? That would need semi-manual curation though... maybe discuss with Adam?
@migbro, I think @jharenza cross posted here. The above message is meant for bixu ticket #1752.
As for your question above, @zhangb1, who generates the OPC gene expression matrices, can provide more details. I am assuming the tool goes to Cavatica for @zhangb1's team to use when generating the v12 TCGA/GTEx expression matrices.
@zhangb1?
@ewafula @chinwallaa was the liftover successful? If so, I'd like to close this and the GTEx tickets.
@migbro, yes. Already using results with downstream modules. Thank you 🙏
Success! Closing
What data file(s) does this issue pertain to?
tcga-gene-expression-rsem-tpm-collapsed.rds
currently located here
What release are you using?
pre-v12 file
Put your question or report your issue here.
TCGA expression is currently on GENCODE v36, yet 23,242 gene symbols in the TCGA matrix are not in the v39 expression matrix, and 20,979 gene symbols in the v39 expression matrix are not in the TCGA matrix. We need to remap these ENSG symbols from v36 to v39, which can be done one of two ways:
In addition, I am seeing 712 samples in the TCGA merged matrix that are not in the histologies file (this is OK - maybe @ewafula could not find clinical info), but there are 3 samples in the histologies file that are not in the merged matrix:
@zhangb1 if we don't have data for these, can you let me know? We can remove them from histologies - @ewafula can prepare a new TCGA file without them.
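Both comparisons above (gene symbols between the TCGA and v39 matrices, and sample IDs between the merged matrix and the histologies file) are two-way set differences. A minimal sketch of that check, assuming the identifiers have already been read into Python lists; the function name and the placeholder IDs are hypothetical and are not taken from the actual files:

```python
def compare_ids(left, right):
    """Report identifiers unique to each side and those shared by both.

    left, right: iterables of identifiers (gene symbols or sample IDs).
    Duplicates within one side are ignored.
    """
    left_set, right_set = set(left), set(right)
    return {
        "only_left": sorted(left_set - right_set),
        "only_right": sorted(right_set - left_set),
        "shared": sorted(left_set & right_set),
    }


# Placeholder sample IDs, for illustration only.
matrix_samples = ["S1", "S2", "S3", "S4"]
histology_samples = ["S2", "S3", "S5"]

report = compare_ids(matrix_samples, histology_samples)
# report["only_left"]  -> samples in the matrix but not in histologies
# report["only_right"] -> samples in histologies but not in the matrix
```

The same function can be applied to the row names of the two expression matrices to reproduce the gene-symbol counts before and after the liftover.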
Who will complete this task?
@migbro @zhangb1 can you take point on this to determine the best path forward?
cc @chinwallaa @taylordm