BoevaLab / FREEC

Control-FREEC: Copy number and genotype annotation in whole genome and whole exome sequencing data

hg38 Annotation files #57

Closed: maryawood closed this issue 5 years ago

maryawood commented 5 years ago

Hello,

I'm hoping to run FREEC for some WES samples with the hg38 genome build, but I'm a bit confused about which annotation files to use. Where can I find the appropriate chromosome files (chrFiles in the config file), chromosome length file (chrLenFile), SNP file (SNPfile), and capture file (captureRegions) for hg38? I see the link to the mappability file for this build on the website, but I only see SNP files for hg18/hg19, and I don't see any information about where to find the chromosome-related files for any build.

Thanks,

Mary

valeu commented 5 years ago

Dear Mary, what are you looking for specifically? chrFiles should be a folder containing chr1.fa, chr2.fa, etc. Many bioinformaticians already have such a folder, so I did not provide a link. If you need to download the FASTA files, you can get them, for example, from http://hgdownload.cse.ucsc.edu/goldenPath/hg38/chromosomes/
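A minimal sketch of building such a chrFiles folder from the UCSC directory above. It assumes that the directory serves one gzipped file per chromosome named `<chrom>.fa.gz` (e.g. `chr1.fa.gz`), and that only the primary chromosomes chr1-chr22, chrX and chrY are wanted. The helper names (`chromosome_url`, `gunzip_to`, `build_chr_files`) are hypothetical and not part of FREEC.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

# Base URL from the comment above; per-chromosome naming is an assumption.
UCSC_HG38_CHROMOSOMES = "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/chromosomes/"

# Assumed set of chromosomes to keep: chr1..chr22, then chrX and chrY.
PRIMARY = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def chromosome_url(chrom):
    """Return the assumed download URL for one chromosome's gzipped FASTA."""
    return f"{UCSC_HG38_CHROMOSOMES}{chrom}.fa.gz"


def gunzip_to(src_gz, dst_fa):
    """Decompress a .fa.gz file into a plain .fa file, streaming in chunks."""
    with gzip.open(src_gz, "rb") as src, open(dst_fa, "wb") as dst:
        shutil.copyfileobj(src, dst)


def build_chr_files(out_dir, chroms=PRIMARY):
    """Download each chromosome and leave <out_dir>/<chrom>.fa on disk."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for chrom in chroms:
        gz_path = out / f"{chrom}.fa.gz"
        urllib.request.urlretrieve(chromosome_url(chrom), gz_path)
        gunzip_to(gz_path, out / f"{chrom}.fa")
        gz_path.unlink()  # keep only the uncompressed chr*.fa files
```

Calling `build_chr_files("hg38_chrs")` would then give a folder whose path could be used as chrFiles.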

For the chromosome length file, you can use, for example, the first 24 lines of http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes. I would only suggest reordering them so that chr1-chr22 come first, followed by chrX and chrY.
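The extraction and reordering described above can be sketched as follows. Rather than relying on the first 24 lines, this version selects chr1-chr22, chrX and chrY by name, which is an assumption that still works if the file order differs. It writes two tab-separated columns (name, length), the same layout as hg38.chrom.sizes; the function names are hypothetical.

```python
# Target order: chr1..chr22, then chrX, chrY.
PRIMARY_ORDER = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def make_chr_len(chrom_sizes_lines):
    """Return 'name<TAB>length' lines for chr1-22, chrX, chrY in that order.

    Lines for other contigs (chrM, *_random, *_alt, chrUn_*) are ignored.
    """
    lengths = {}
    for line in chrom_sizes_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] in PRIMARY_ORDER:
            lengths[fields[0]] = int(fields[1])
    missing = [c for c in PRIMARY_ORDER if c not in lengths]
    if missing:
        raise ValueError(f"missing chromosomes: {missing}")
    return [f"{c}\t{lengths[c]}" for c in PRIMARY_ORDER]


def write_chr_len(in_path, out_path):
    """Read a UCSC chrom.sizes file and write the reordered chrLenFile."""
    with open(in_path) as src:
        out_lines = make_chr_len(src)
    with open(out_path, "w") as dst:
        dst.write("\n".join(out_lines) + "\n")
```

For example, `write_chr_len("hg38.chrom.sizes", "hg38.len")` would produce a file that could be given as chrLenFile.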

"captureRegions" should match the capture kit used in your WES protocol. The corresponding .bed file can usually be found on the Illumina website (if you used an Illumina capture method).

Regarding the dbSNP file, you can download the .vcf.gz file from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/. I suggest using only common SNPs: ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/00-common_all.vcf.gz
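One thing worth checking before using this VCF: its contig names should match those in the chromosome files. The sketch below assumes the dbSNP VCF names chromosomes without a prefix ("1", "X") while the UCSC FASTA files use "chr1", "chrX". Verify this on your own copy first, for example by looking at the first non-header line. It prefixes "chr", keeps only chromosomes 1-22, X and Y, and leaves header lines (including any `##contig` lines) unchanged. The function names are hypothetical.

```python
import gzip

# Assumed bare contig names used in the dbSNP VCF.
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def add_chr_prefix(vcf_lines):
    """Yield VCF lines with 'chr' prefixed to bare contig names (1 -> chr1).

    Header lines pass through untouched; records on other contigs
    (e.g. MT) are dropped.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        if not line.strip():
            continue  # skip blank lines defensively
        chrom, rest = line.split("\t", 1)
        if chrom in PRIMARY:
            yield f"chr{chrom}\t{rest}"


def convert(in_vcf_gz, out_vcf_gz):
    """Stream a gzipped VCF through add_chr_prefix into a new gzipped VCF."""
    with gzip.open(in_vcf_gz, "rt") as src, gzip.open(out_vcf_gz, "wt") as dst:
        dst.writelines(add_chr_prefix(src))
```

For example, `convert("00-common_all.vcf.gz", "00-common_all.chr.vcf.gz")` would write a renamed copy that could be given as SNPfile.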

maryawood commented 5 years ago

Thank you for clarifying, this is very helpful!