d3b-center / ticket-tracker-OPC

A repo to generate and track tickets for ped OT

Add GTEX, TCGA, TARGET TPM files to S3 bucket #11

Closed: jharenza closed this issue 3 years ago

jharenza commented 3 years ago

What data file(s) does this issue pertain to?

We need to add RNA-Seq data for GTEX, TCGA, TARGET

What release are you using?

v1

Put your question or report your issue here.

Add RDS TPM files for GTEX, TCGA, and TARGET as well as manifests (sample ID, histology) to: s3://kf-openaccess-us-east-1-prd-pbta/open-targets/v2/
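
The issue only says each manifest should carry a sample ID and histology. As a minimal sketch, assuming a tab-separated manifest with one row per sample and the hypothetical column names `sample_id` and `histology`, it could be written with Python's standard library:

```python
import csv
import io

# Hypothetical column names; the issue only specifies "sample ID, histology".
MANIFEST_COLUMNS = ["sample_id", "histology"]


def write_manifest(rows, handle):
    """Write (sample_id, histology) pairs as a tab-separated manifest with a header row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(MANIFEST_COLUMNS)
    for sample_id, histology in rows:
        writer.writerow([sample_id, histology])


# Usage with placeholder values (not real sample IDs or histologies):
buf = io.StringIO()
write_manifest([("SAMPLE-A", "placeholder histology")], buf)
print(buf.getvalue())
```

The same function works with a real file handle opened with `newline=""`, which is what the `csv` module expects when writing files.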

komalsrathi commented 3 years ago

@jharenza do we want collapsed matrices for these? I can get those easily (I already have some), and they might also make batch correction faster. If you want the uncollapsed versions, it will take somewhat longer because we will have to merge them from the raw files. Either way, let me know the requirements so I can start assembling now.
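
"Collapsed" here presumably means one row per gene symbol instead of one row per gene ID. The thread does not say which rule the existing collapsed matrices use; a minimal sketch, assuming the common rule of keeping, for each symbol, the row with the highest mean TPM across samples:

```python
from statistics import mean


def collapse_by_symbol(rows):
    """Collapse a gene-by-sample TPM matrix to one row per gene symbol.

    `rows` is a list of (gene_id, gene_symbol, [tpm values...]) tuples.
    For each symbol, keep the row whose mean TPM across samples is highest
    (assumed collapsing rule; on ties the first row seen is kept).
    Returns {gene_symbol: [tpm values...]}.
    """
    best = {}  # symbol -> (mean_tpm, values)
    for _gene_id, symbol, values in rows:
        avg = mean(values)
        if symbol not in best or avg > best[symbol][0]:
            best[symbol] = (avg, list(values))
    return {symbol: values for symbol, (_, values) in best.items()}
```

Summing duplicate rows is an alternative rule; which one applies should be checked against the collapsed files themselves before batch correction.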

jharenza commented 3 years ago

Oh, yes, all collapsed.

komalsrathi commented 3 years ago

@jharenza One more question, so that these issues don't crop up later in the downstream analyses: do we want all of the sample types below, or just the Primary types?

# TCGA
> plyr::count(tcga_meta$definition)
                                                x freq
1                        Additional - New Primary   11
2                           Additional Metastatic    1
3                                      Metastatic  392
4 Primary Blood Derived Cancer - Peripheral Blood  172
5                             Primary Solid Tumor 9307
6                           Recurrent Solid Tumor   50
7                             Solid Tissue Normal  728

# TARGET
> plyr::count(target_meta$definition)
                                                         x freq
1       Blood Derived Cancer - Bone Marrow, Post-treatment   12
2  Blood Derived Cancer - Peripheral Blood, Post-treatment    1
3                                               Metastatic    1
4               Primary Blood Derived Cancer - Bone Marrow  672
5          Primary Blood Derived Cancer - Peripheral Blood  129
6                                      Primary Solid Tumor  449
7             Recurrent Blood Derived Cancer - Bone Marrow  108
8        Recurrent Blood Derived Cancer - Peripheral Blood    3
9                                    Recurrent Solid Tumor   13
10                                     Solid Tissue Normal   11
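
The counts above are a tally of each cohort's `definition` column. A minimal Python sketch of the same tally, plus the "Primary only" alternative raised in the question, assuming (hypothetically) that a sample counts as Primary when its definition string starts with "Primary". Note that this rule would drop "Additional - New Primary":

```python
from collections import Counter


def count_definitions(definitions):
    """Tally sample-type definitions, like plyr::count(meta$definition) in R."""
    return Counter(definitions)


def primary_only(samples):
    """Keep samples whose definition starts with 'Primary' (hypothetical filter).

    `samples` is a list of (sample_id, definition) tuples.
    """
    return [(sid, d) for sid, d in samples if d.startswith("Primary")]
```

Per the reply that follows, all sample types were kept, so the filter is only shown for comparison.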

jharenza commented 3 years ago

All, thanks!

komalsrathi commented 3 years ago

It looks like I may not have write access:

aws s3 --profile saml cp gtex-gene-expression-rsem-tpm-collapsed.polya.rds s3://kf-openaccess-us-east-1-prd-pbta/open-targets/v2/

upload failed: ./gtex-gene-expression-rsem-tpm-collapsed.polya.rds to s3://kf-openaccess-us-east-1-prd-pbta/open-targets/v2/gtex-gene-expression-rsem-tpm-collapsed.polya.rds An error occurred (AccessDenied) when calling the CreateMultipartUpload operation: Access Denied

cc: @yuankunzhu
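
The error names `CreateMultipartUpload` rather than `PutObject` because `aws s3 cp` switches to a multipart upload once a file reaches its `multipart_threshold` (8 MB by default, uploaded in 8 MB parts by default). S3 authorizes multipart uploads with the `s3:PutObject` permission, so AccessDenied here points to missing write permission on the prefix, not to a problem with the file. A small sketch of the CLI's default decision, assuming the `saml` profile has not overridden these settings:

```python
import math

MiB = 1024 * 1024
# AWS CLI defaults for `aws s3` transfers (assumed unchanged in the profile config).
DEFAULT_MULTIPART_THRESHOLD = 8 * MiB
DEFAULT_MULTIPART_CHUNKSIZE = 8 * MiB


def uses_multipart(size_bytes, threshold=DEFAULT_MULTIPART_THRESHOLD):
    """True if `aws s3 cp` would start with CreateMultipartUpload for this file size."""
    return size_bytes >= threshold


def part_count(size_bytes, chunksize=DEFAULT_MULTIPART_CHUNKSIZE):
    """Number of parts in a multipart upload.

    Simplified: ignores the CLI's chunk-size adjustment for very large files
    that would otherwise exceed S3's 10,000-part limit.
    """
    return max(1, math.ceil(size_bytes / chunksize))
```

For example, `uses_multipart(os.path.getsize("gtex-gene-expression-rsem-tpm-collapsed.polya.rds"))` would report whether that upload takes the multipart path.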

jharenza commented 3 years ago

I created this ticket for your access.

jharenza commented 3 years ago

Adding this link for TARGET library types: https://ocg.cancer.gov/programs/target/target-methods. Summary: ALL, AML, NBL, RT, and WT are polyA (some post-2014 samples may also be stranded, but I'm not sure how to tell which ones), and ALAL and CCSK are stranded.

Note: I do not see any ALAL (Acute Leukemia of Ambiguous Lineage) samples in our dataset.
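
The summary above amounts to a lookup from TARGET disease code to RNA-Seq library type. A minimal sketch, assuming samples are keyed by these disease abbreviations (`TARGET_LIBRARY_TYPE` and `library_type` are hypothetical names). Because some post-2014 samples in the polyA projects may actually be stranded, polyA assignments are flagged for review rather than trusted outright:

```python
# Library types per the TARGET methods summary above.
TARGET_LIBRARY_TYPE = {
    "ALL": "polyA",
    "AML": "polyA",
    "NBL": "polyA",
    "RT": "polyA",
    "WT": "polyA",
    "ALAL": "stranded",
    "CCSK": "stranded",
}


def library_type(disease_code):
    """Return (library_type, needs_review) for a TARGET disease code.

    needs_review is True for polyA codes, since some post-2014 samples in
    those projects may be stranded. Raises KeyError for unknown codes.
    """
    lib = TARGET_LIBRARY_TYPE[disease_code]
    return lib, lib == "polyA"
```

Raising on unknown codes is deliberate: a silent default would hide disease codes missing from the methods summary.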

jharenza commented 3 years ago

I uploaded these to the bucket.