NCI-CGR / plco-analysis

Primary workflow for the PLCO "Atlas" project

chrX support #11

Open lightning-auriga opened 3 years ago

lightning-auriga commented 3 years ago

Atlas investigators have requested chrX support in the pipeline. This is not too difficult, but it requires pulling in imputed files generated by someone else. Each downstream association tool handles chrX differently, so support needs to be built into each individual association pipeline.

lightning-auriga commented 3 years ago

For whoever inherits this project: here are the locations of the PLCO chrX imputations, as I was informed by email:

/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/Oncoarray/IMPUTATION_1000G
/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/Oncoarray/IMPUTATION_TOPMED

/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/OmniX/IMPUTATION_TOPMED

/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/Omni25M/IMPUTATION_1000G
/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/Omni25M/IMPUTATION_TOPMED

/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/Omni5/IMPUTATION_1000G
/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/Omni5/IMPUTATION_TOPMED

/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/GSA/IMPUTATION_1000G/batch1 (likewise batch2, batch3, batch4, batch5)
/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/GSA/IMPUTATION_TOPMED/batch1 (likewise batch2, batch3, batch4, batch5)
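For anyone scripting against these locations, the listing can be encoded as a small lookup table. This is only a sketch: the names `PANELS`, `GSA_BATCHES`, and `chrx_imputation_dirs` are hypothetical, and the code just builds path objects from the listing above without checking that the directories exist or what they contain.

```python
from pathlib import PurePosixPath

# Base directory, as given in the listing above.
BASE = PurePosixPath(
    "/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation"
)

# Imputation panels available per genotyping array (hypothetical table mirroring the listing).
PANELS = {
    "Oncoarray": ["IMPUTATION_1000G", "IMPUTATION_TOPMED"],
    "OmniX": ["IMPUTATION_TOPMED"],  # no 1000G run listed for OmniX
    "Omni25M": ["IMPUTATION_1000G", "IMPUTATION_TOPMED"],
    "Omni5": ["IMPUTATION_1000G", "IMPUTATION_TOPMED"],
    "GSA": ["IMPUTATION_1000G", "IMPUTATION_TOPMED"],
}

# Only GSA is split into batches: batch1 through batch5.
GSA_BATCHES = [f"batch{i}" for i in range(1, 6)]


def chrx_imputation_dirs() -> list[PurePosixPath]:
    """Return one chrX imputation directory per array/panel (and per batch for GSA)."""
    dirs = []
    for array, panels in PANELS.items():
        for panel in panels:
            if array == "GSA":
                dirs.extend(BASE / array / panel / batch for batch in GSA_BATCHES)
            else:
                dirs.append(BASE / array / panel)
    return dirs
```

This yields 17 directories in total: two each for Oncoarray, Omni25M, and Omni5, one for OmniX, and ten for GSA (two panels times five batches).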

lightning-auriga commented 3 years ago

assorted comments:

Eventually, the above all just need to be synchronized by reimputing everything against the public server's TOPMed panel with the improved input prep. For the moment, though, I don't think the batch count discrepancy is much of an issue. There may be some step that assumes all chromosomes are present, but in general the pipeline just processes whatever is present, so it shouldn't be too hard to force it to use these files as-is.
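One way to avoid a step that silently assumes all chromosomes exist is to derive the chromosome list from the files actually present. A minimal sketch, assuming one file per chromosome named like `chr1.dose.vcf.gz` / `chrX.dose.vcf.gz` (a common imputation-server output convention, not confirmed for these directories); the pattern and the function name `chromosomes_present` are hypothetical.

```python
import re
from collections.abc import Iterable

# Hypothetical naming convention: one imputed file per chromosome,
# e.g. chr1.dose.vcf.gz ... chr22.dose.vcf.gz, chrX.dose.vcf.gz.
CHROM_FILE = re.compile(r"^chr(\d{1,2}|X)\.dose\.vcf\.gz$")


def chromosomes_present(filenames: Iterable[str]) -> list[str]:
    """Return chromosomes that have an imputed file, autosomes in numeric order then X.

    Files that do not match the naming convention (e.g. .info files) are ignored,
    so nothing downstream has to assume that 1-22 or X are all there.
    """
    found = set()
    for name in filenames:
        match = CHROM_FILE.match(name)
        if match:
            found.add(match.group(1))
    autosomes = sorted((c for c in found if c != "X"), key=int)
    return autosomes + (["X"] if "X" in found else [])
```

In practice this would be called on a directory listing, e.g. `chromosomes_present(os.listdir(some_dir))`; keeping the function pure on filenames makes it easy to test without touching the filesystem.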

shukwong commented 3 years ago

We need to have a .sample file linked with each chromosome, in case the samples differ slightly between chrX and the autosomes (which is the case in PLCO).
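A sketch of what linking a .sample file per chromosome and checking the differences could look like, assuming the standard Oxford `.sample` layout (a header line of column names starting `ID_1 ID_2`, a second line of column types, then one line per sample). The helper names `sample_file_for`, `read_sample_ids`, and `compare_samples` are hypothetical; the functions take lines of text rather than paths so they can be exercised without real files.

```python
def sample_file_for(chrom: str, per_chrom: dict[str, str], default: str) -> str:
    """Return the .sample file linked to `chrom`, falling back to the shared autosome file."""
    return per_chrom.get(chrom, default)


def read_sample_ids(lines: list[str]) -> list[tuple[str, str]]:
    """Parse (ID_1, ID_2) pairs from Oxford-format .sample lines, skipping the two header lines."""
    rows = [line.split() for line in lines if line.strip()]
    return [(fields[0], fields[1]) for fields in rows[2:]]


def compare_samples(autosome_lines: list[str], chrx_lines: list[str]) -> dict:
    """Report samples unique to each file and whether the shared ordering is identical."""
    auto = read_sample_ids(autosome_lines)
    chrx = read_sample_ids(chrx_lines)
    auto_set, chrx_set = set(auto), set(chrx)
    return {
        "only_autosomes": sorted(auto_set - chrx_set),
        "only_chrX": sorted(chrx_set - auto_set),
        "same_order": auto == chrx,
    }
```

Checking ordering as well as membership matters because genotype rows in an imputed file are matched to phenotypes by position in the .sample file, so even identical sample sets in a different order would need their own file.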