bsmn / bsmn-pipeline

BSMN common data processing pipeline

hg38 versions of gnomad and PON #31

Open gevro opened 11 months ago

gevro commented 11 months ago

Hi, your download tool only specifies one version each of the gnomAD and PON reference files. The gnomAD file you supply must be hg19, because it is labeled as release r2.1.1, which is on hg19. The PON file isn't labeled either way.
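One way to check which build a VCF-based reference file targets, regardless of its label, is to read the length declared for chromosome 1 in its `##contig` header lines: 249,250,621 bp in GRCh37/hg19 versus 248,956,422 bp in GRCh38/hg38. This is a sketch under assumptions: `guess_build` is a hypothetical helper, not part of bsmn-pipeline, and it only works if the file is a plain or bgzipped VCF whose `##contig` lines include a `length`. A PON stored in another format would need a different check.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bsmn-pipeline): guess a VCF's genome build
# from the length declared for chromosome 1 in its ##contig header lines.
guess_build() {
    local len
    # gzip -cdf decompresses gzip/bgzip input and passes plain text through unchanged.
    len=$(gzip -cdf "$1" 2>/dev/null | awk -F'[=,>]' '
        /^##contig=<ID=(chr)?1,/ {
            # Fields look like: ##contig  <ID  1  length  249250621 ...
            for (i = 1; i < NF; i++) if ($i == "length") { print $(i + 1); exit }
        }
        /^#CHROM/ { exit }   # stop at the end of the header
    ')
    case "$len" in
        249250621) echo "GRCh37/hg19" ;;  # chr1 length in GRCh37
        248956422) echo "GRCh38/hg38" ;;  # chr1 length in GRCh38
        *)         echo "unknown" ;;
    esac
}
```

For example, `guess_build some_reference.vcf.gz` (a placeholder filename) prints `GRCh37/hg19`, `GRCh38/hg38`, or `unknown`.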

Do you have hg38 versions of the gnomAD and PON files for your pipeline?

Thank you

gevro commented 11 months ago

Note: I also don't understand this part of the code in F.PON_mask.sh:

$LIFTOVER <(awk '{if(!($1~/^chr/)) $1="chr"$1; print $1"\t"$2-1"\t"$2"\t"$3"\t"$4}' <(sort -k1,1V -k2,2g $IN)) \
    $HG19_TO_HG38 $HG38_BED $UNMAPPED
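The quoted command chains three stages: sort the variant list by chromosome and position, turn each line into a one-base BED interval with awk, and hand the result to liftOver with the hg19-to-hg38 chain. The sketch below reproduces only the first two stages so they can be run without liftOver. `to_bed` is a hypothetical name, and the input columns (chromosome, 1-based position, ref, alt) are an assumption inferred from how the awk script uses `$1` to `$4`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bsmn-pipeline): runs the sort + awk stages of
# the quoted F.PON_mask.sh line, stopping before liftOver.
# Assumed input: whitespace-delimited columns chrom, 1-based pos, ref, alt.
to_bed() {
    # Version sort on chromosome (2 before 10), then general numeric sort on position.
    sort -k1,1V -k2,2g "$1" |
        awk '{
            # Prefix b37-style names ("1", "X") with "chr"; leave "chr1" as is.
            if (!($1 ~ /^chr/)) $1 = "chr" $1
            # BED is 0-based and half-open, so 1-based position p becomes [p-1, p).
            print $1 "\t" ($2 - 1) "\t" $2 "\t" $3 "\t" $4
        }'
}
```

The `chr` prefix is presumably there so that b37-style contig names match the UCSC-style names in the chain file, assuming `$HG19_TO_HG38` points at a UCSC hg19-to-hg38 chain.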

It seems to lift the filtered variants over from hg19 to hg38. If the pipeline is run with hg38 as the reference genome, this liftover should not be performed, but I don't see any condition restricting it to hg19 analyses.
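One way to express the missing condition is to branch on the reference build before calling liftOver. This is a hedged sketch, not the pipeline's code: `prepare_pon_bed` and its build argument are hypothetical, `LIFTOVER` and `HG19_TO_HG38` mirror the variables in the quoted line, and the input is assumed to be the BED produced by the sort/awk stage. hg19/b37 intervals are lifted to hg38 (the PON's build); hg38 intervals are copied through unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not part of bsmn-pipeline): only lift intervals over
# when the calls are on hg19/b37. LIFTOVER and HG19_TO_HG38 mirror the
# variables in F.PON_mask.sh; the build argument is an assumption.
# Usage: prepare_pon_bed BUILD IN_BED OUT_BED UNMAPPED
prepare_pon_bed() {
    local build="$1" in_bed="$2" out_bed="$3" unmapped="$4"
    case "$build" in
        hg19|b37|GRCh37)
            # Calls are on hg19/b37 but the PON is on hg38: lift them over.
            # $LIFTOVER is left unquoted, as in the original line.
            $LIFTOVER "$in_bed" "$HG19_TO_HG38" "$out_bed" "$unmapped"
            ;;
        hg38|GRCh38)
            # Already on the PON's build: copy intervals through unchanged
            # and leave an empty unmapped file.
            cp -- "$in_bed" "$out_bed" && : > "$unmapped"
            ;;
        *)
            echo "prepare_pon_bed: unsupported build '$build'" >&2
            return 1
            ;;
    esac
}
```

Failing on an unrecognized build, rather than silently skipping the liftover, keeps a typo in the build name from producing a mask on the wrong coordinates.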

Also, is the PON file in the repository based on hg38 or hg19?

Thanks.

bintriz commented 11 months ago

The PON file was generated from the 1000 Genomes high-coverage data set, which is on hg38. The official genome version of the BSMN consortium was hg19, so the pipeline itself supports only hg19/b37. That's why we implemented the PON filter using liftover. If you need the hg38 version of the PON file or filter, you can comment out the liftover part of the job script.