Genomic Region of Interest (GRI) filter: Version 2 of Exon-Only
Note: The gene-only filter was previously added in #55. This update adds the exon-only filter.
What is being added:
Version 2 of the exon-only script, which generates GRI filters for the hg19 and hg38 genomes. The exon-only BED file covers about 5.75% of the genome (the gene-only filter covers ~40%). In addition, positions of potential splicing variants from the SpliceAI data with a delta score > 0.8 are used. See the SpliceAI documentation, which describes 0.8 as the high-precision cutoff, for the rationale behind this choice: https://github.com/Illumina/SpliceAI
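The SpliceAI selection step can be sketched roughly as follows. This is a minimal sketch, not the script in this PR: it assumes the VCF INFO annotation format described in the SpliceAI README (`SpliceAI=ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL`, one comma-separated entry per allele/gene), takes the delta score as the maximum of the four `DS_*` values, and uses hypothetical function names (`max_delta_score`, `filter_vcf_lines`) with toy records.

```python
"""Sketch: keep SpliceAI-annotated VCF records whose delta score exceeds 0.8.

Assumes the INFO format from the SpliceAI README:
SpliceAI=ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL
Function names are hypothetical, not taken from the PR's script.
"""

CUTOFF = 0.8  # applied as a strict ">" as described in this PR


def max_delta_score(info_field: str) -> float:
    """Return the largest DS_* value across all SpliceAI entries in an INFO column."""
    best = 0.0
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        if key != "SpliceAI":
            continue
        for entry in value.split(","):
            parts = entry.split("|")
            # parts[2:6] hold DS_AG, DS_AL, DS_DG, DS_DL; skip missing values
            scores = [float(s) for s in parts[2:6] if s not in ("", ".")]
            if scores:
                best = max(best, max(scores))
    return best


def filter_vcf_lines(lines):
    """Yield (chrom, pos) for VCF data lines whose delta score is above CUTOFF.

    pos is the 1-based VCF position and is NOT converted to BED coordinates here.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, info = fields[0], int(fields[1]), fields[7]
        if max_delta_score(info) > CUTOFF:
            yield chrom, pos


# Toy records (hypothetical gene and positions)
records = [
    "1\t100\t.\tA\tG\t.\t.\tSpliceAI=G|GENE1|0.01|0.00|0.91|0.00|-1|8|-40|33",
    "1\t200\t.\tT\tC\t.\t.\tSpliceAI=C|GENE1|0.20|0.00|0.05|0.00|2|8|-40|33",
    "1\t300\t.\tG\tA\t.\t.\tSpliceAI=A|GENE1|0.80|0.00|0.00|0.00|2|8|-40|33",
]
print(list(filter_vcf_lines(records)))  # [('1', 100)]
```

Because the comparison is strict, a record whose maximum delta score is exactly 0.80 is dropped, matching the "> 0.8" wording above.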
A minor fix that converts the final output positions from 1-based coordinates to the 0-based coordinates used by BED files, since these genomic regions were obtained from VCF and GTF files, which are 1-based. Specifically, 1 is subtracted from each start position; end positions are unchanged because BED intervals are half-open. See line 236 in commit 718f03d for the fix.
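The coordinate conversion can be illustrated with a small sketch. It assumes the input intervals are 1-based and fully closed (as in GTF; a single VCF position covers one base) and that the output is standard 0-based, half-open BED. The helper names (`closed_1based_to_bed`, `vcf_pos_to_bed`, `format_bed_line`) are hypothetical and do not reproduce the code at line 236 of commit 718f03d.

```python
"""Sketch: convert 1-based, fully closed coordinates (GTF/VCF) to BED.

BED intervals are 0-based and half-open, so only the start changes:
a 1-based closed interval [start, end] becomes BED (start - 1, end).
Helper names are hypothetical, not taken from the PR's script.
"""

from typing import Tuple

BedInterval = Tuple[str, int, int]


def closed_1based_to_bed(chrom: str, start: int, end: int) -> BedInterval:
    """Convert a 1-based closed interval (e.g. a GTF exon) to a BED interval."""
    if start < 1 or end < start:
        raise ValueError(f"invalid 1-based interval: {start}-{end}")
    return chrom, start - 1, end


def vcf_pos_to_bed(chrom: str, pos: int) -> BedInterval:
    """A single 1-based VCF position covers one base: BED (pos - 1, pos)."""
    return closed_1based_to_bed(chrom, pos, pos)


def format_bed_line(interval: BedInterval) -> str:
    """Render an interval as a tab-separated BED line (first three columns)."""
    chrom, start, end = interval
    return f"{chrom}\t{start}\t{end}"


if __name__ == "__main__":
    # Toy exon spanning bases 1001..2000 (1-based, inclusive): 1000 bases
    print(format_bed_line(closed_1based_to_bed("chr1", 1001, 2000)))
    # prints: chr1<TAB>1000<TAB>2000 (still 1000 bases: 2000 - 1000)
    print(format_bed_line(vcf_pos_to_bed("chr1", 100)))
    # prints: chr1<TAB>99<TAB>100
```

Leaving the end unchanged preserves interval length: a closed interval [s, e] has e - s + 1 bases, and the half-open BED interval [s - 1, e) has e - (s - 1) bases, which is the same number.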
The new BED files can be uploaded to the AWS S3 bucket after this review.
Additional information:
Data files used for BED generation:
GENCODE v46 gene annotation
HGMD 2022 database
ClinVar database (8/6/2024 release)
SpliceAI v1.3 prediction data (located on ravenclaw: /mnt/ravenclaw_local/zhijiany/workdir/exome_filter)