etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

Update included references for hg38 #822

Open marchoeppner opened 1 year ago

marchoeppner commented 1 year ago

Hi,

This is probably a minor thing, but the documentation and the included reference files (accessibility regions, etc.) are very much centered on hg19. At this point we should probably consider hg19 defunct (or at least "bad practice"), given that hg38 came out about 10 years ago. It would therefore be nice if CNVkit, including its documentation, could be updated throughout to use and refer to hg38/GRCh38.

/M

serge2016 commented 1 year ago
# Reference genome: GRCh38.d1.vd1 (GDC)
# https://gdc.cancer.gov/about-data/data-harmonization-and-generation/gdc-reference-files
# REFDIR, GENOMEBUILD and CNVKIT are expected to be set before this point.
refGenomeURL="https://api.gdc.cancer.gov/data/254f697d-310d-4d7d-a27b-27fbf767a834" # GRCh38.d1.vd1.fa.tar.gz
refGenomeFN0="GRCh38.d1.vd1.fa.tar.gz"
refGenomeFN="${refGenomeFN0%.tar.gz}"
refGenomeV="${refGenomeFN%.*}"

refGenomeFile="$REFDIR/$refGenomeV/$refGenomeFN"
refGenomeDir="$(dirname "$refGenomeFile")"
refGenomeFaiFile="${refGenomeFile}.fai"
refGenomeDictFile="${refGenomeFile%.*}.dict"

runDir="$(pwd)"
cnvkitRefDir="$runDir/$refGenomeV"
accessFile="$cnvkitRefDir/$GENOMEBUILD.bed"
mkdir -p "$runDir" "$cnvkitRefDir"

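# Build the accessible-regions BED once; skip it if the file already exists and is non-empty.
if [[ ! -s "$accessFile" ]]; then
    # $CNVKIT is left unquoted in case it holds a multi-word command.
    $CNVKIT access "$refGenomeFile" -o "$accessFile"
fi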
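
The snippet above defines the download URL and the derived file names but never fetches the archive, so here is a rough sketch of that missing step. The function names (`ref_fasta_name`, `ref_version`, `fetch_reference`) are made up for illustration. It assumes `curl` and `tar` are on the PATH and that the GDC tarball unpacks to a FASTA named after the archive minus `.tar.gz`, which I have not verified here.

```bash
#!/usr/bin/env bash

# GRCh38.d1.vd1.fa.tar.gz -> GRCh38.d1.vd1.fa (same expansion as refGenomeFN).
ref_fasta_name() {
    local archive="$1"
    printf '%s\n' "${archive%.tar.gz}"
}

# GRCh38.d1.vd1.fa -> GRCh38.d1.vd1 (same expansion as refGenomeV;
# "%.*" strips only the shortest trailing ".something").
ref_version() {
    local fasta="$1"
    printf '%s\n' "${fasta%.*}"
}

# Download and unpack the archive unless a non-empty FASTA is already there.
# Prints the expected FASTA path (assumption: the tarball extracts to that name).
fetch_reference() {
    local url="$1" archive="$2" dest_dir="$3"
    local fasta
    fasta="$dest_dir/$(ref_fasta_name "$archive")"
    if [[ ! -s "$fasta" ]]; then
        mkdir -p "$dest_dir"
        curl -fL -o "$dest_dir/$archive" "$url"
        tar -xzf "$dest_dir/$archive" -C "$dest_dir"
    fi
    printf '%s\n' "$fasta"
}
```

With the variables from the snippet, something like `fasta="$(fetch_reference "$refGenomeURL" "$refGenomeFN0" "$refGenomeDir")"` would then hand the FASTA path to the `access` call. Skipping the download when the file exists mirrors the `-s` check used for the BED file, so reruns stay cheap.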