broadinstitute / depmap_omics

What you need to process the Quarterly DepMap-Omics releases from Terra
https://depmap.org/portal/

Reference files for PureCN pipeline #164

Closed miachom closed 1 year ago

miachom commented 1 year ago

Hi,

I would like to obtain the reference and bait files used for exome data in the PureCN pipeline, as referenced here: https://github.com/broadinstitute/depmap_omics/blob/8e1a8b553b65b2f40ed3a8396f1a4c4275932e07/WGS_pipeline/PureCN_pipeline/README.md?plain=1#L11

How can I get access to this data, or is it publicly available somewhere? Thank you.

5im1z commented 1 year ago

Hi,

Interval files are public and can be found at gs://ccleparams/references/PureCN_intervals. The reference genome we use is hg38.

Best, Simone

miachom commented 1 year ago

@5im1z Hi Simone, sorry, but I still cannot find this link. Could you please post a functional link for these reference files?

Best, Mingkee

5im1z commented 1 year ago

Hi Mingkee,

If you are trying to open it in your browser, this is the link: https://console.cloud.google.com/storage/browser/ccleparams/references/PureCN_intervals;tab=objects?prefix=&forceOnObjectsSortingFiltering=false. You might have to log in with your Google credentials. Let me know if it doesn't work!

Simone
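
For readers following along: the browser link above appears to be the `gs://` path placed under the Cloud Console storage-browser prefix. The sketch below illustrates that mapping. The helper name `gs_to_console_url` is hypothetical, and it assumes the console accepts the bare folder path without the `;tab=objects?...` suffix shown in the link above.

```python
from urllib.parse import quote

# Prefix taken from the browser link posted in this thread.
CONSOLE_BROWSER = "https://console.cloud.google.com/storage/browser"


def gs_to_console_url(gs_uri: str) -> str:
    """Map a gs:// folder path to a Cloud Console browser URL (hypothetical helper)."""
    if not gs_uri.startswith("gs://"):
        raise ValueError(f"not a gs:// URI: {gs_uri!r}")
    # Drop the scheme and any leading or trailing slashes.
    path = gs_uri[len("gs://"):].strip("/")
    if not path:
        raise ValueError("URI has no bucket name")
    # quote() leaves '/' unescaped by default, so the folder structure is kept.
    return f"{CONSOLE_BROWSER}/{quote(path)}"


if __name__ == "__main__":
    print(gs_to_console_url("gs://ccleparams/references/PureCN_intervals"))
```

As noted in the comment above, the console may still ask for a Google login even when the bucket is public.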

miachom commented 1 year ago

Hi Simone,

It works; thanks for the link!

Best, Mingkee

miachom commented 1 year ago

Hi Simone,

Is the reference at /Data/VCFs/Liftover/hg38.fa also publicly available? I would like to get this .fa file as well, but I can't find it in the Google buckets. I tried using our own hg38 reference together with the ccleparams files, such as agilent_hg38_lifted_chrXY.no_header.bed and agilent_hg38_intervals.txt, but it fails with parsing errors.

Thanks. Mingkee

5im1z commented 1 year ago

Hi Mingkee,

We use gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta as our hg38 reference FASTA. If you need to pull it from the browser, you can access it through this folder, which is hosted by GATK.

Thanks, Simone
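
For readers without `gsutil`: publicly readable Cloud Storage objects can generally also be fetched over HTTPS at `https://storage.googleapis.com/<bucket>/<object>`. The sketch below converts the `gs://` path from the comment above into that form. The helper name `gs_to_https_url` is hypothetical, and the resulting URL only works if the object really is publicly readable, which this sketch does not check.

```python
from urllib.parse import quote


def gs_to_https_url(gs_uri: str) -> str:
    """Convert gs://bucket/object into a storage.googleapis.com URL (hypothetical helper)."""
    if not gs_uri.startswith("gs://"):
        raise ValueError(f"not a gs:// URI: {gs_uri!r}")
    # Split "bucket/path/to/object" into the bucket and the object name.
    bucket, _, obj = gs_uri[len("gs://"):].partition("/")
    if not bucket or not obj:
        raise ValueError("URI must contain both a bucket and an object path")
    # Percent-encode the object name while keeping '/' separators intact.
    return f"https://storage.googleapis.com/{bucket}/{quote(obj)}"


if __name__ == "__main__":
    fasta = "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta"
    print(gs_to_https_url(fasta))
```

The printed URL can be passed to any HTTP download tool. Note that the FASTA is several gigabytes in size.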

miachom commented 1 year ago

Hi Simone,

I read in one of the announcements about the mutation pipeline updates (https://forum.depmap.org/t/announcing-the-22q4-release/2125) that bait sets are no longer needed to run Mutect2. In that case, is it all right to run Mutect2 on exome data from CCLE cell lines without a bait set? If not, where can I find the files agilent_hg38_lifted_chrXY.no_header.bed and agilent_hg38_lifted_chrXX.no_header.bed? At the moment, I can only see files for agilent hg19 and ice hg19.

Thanks for all of your responses!

Best, Mingkee

5im1z commented 1 year ago

Hi Mingkee,

It is correct that we are no longer using interval sets for exome mutation calls. If you need the interval files, they are stored in our public bucket gs://ccleparams, where we share all of our reference files. Browser-friendly link here: https://console.cloud.google.com/storage/browser/ccleparams/references/intervals. Also, in case you are not aware, the Mutect2 calls for CCLE lines (among other things) can be found in our public workspace.

Simone