LiuzLab / AI_MARRVEL

AI-MARRVEL (AIM) is an AI system for rare genetic disorder diagnosis
GNU General Public License v3.0

Add exon-only gri filter script #81

Open SpicyChicken6 opened 2 months ago

SpicyChicken6 commented 2 months ago

Genomic Region of Interest (GRI) filter: Version 2 of Exon-Only

Note: The gene-only filter was previously added in #55. This update adds the exon-only filter.


What is being added:

  1. Version 2 of the exon-only script for generating GRI filters for the hg19 and hg38 genomes. The exon-only bed file covers about 5.75% of the genome (the gene-only filter covers ~40%). In addition, positions of potential splicing variants from the spliceAI data with a delta score > 0.8 are included. See the spliceAI documentation for why 0.8 is used as the cutoff: https://github.com/Illumina/SpliceAI
  2. A minor fix that converts the final output genome positions from 1-based format to the 0-based format of bed files, since those genomic regions were obtained from 1-based vcf and gtf files. Concretely, 1 is subtracted from the start position. See commit 718f03d at line 236 for the fix.
  3. New bed files can be uploaded to the AWS S3 bucket after this review.
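
For readers unfamiliar with the coordinate fix in item 2, here is a minimal sketch of the conversion. It is not the code from commit 718f03d. The function names and the example interval are hypothetical, and it assumes the usual conventions: gtf/vcf coordinates are 1-based and inclusive, bed coordinates are 0-based and half-open.

```python
def to_bed_interval(chrom, start_1based, end_1based):
    """Convert a 1-based, inclusive interval (gtf/vcf style) to a
    0-based, half-open bed interval. Only the start shifts by one;
    the end stays the same because bed ends are exclusive."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    return chrom, start_1based - 1, end_1based


def format_bed_line(chrom, start_1based, end_1based):
    """Render the converted interval as a tab-separated bed line."""
    chrom, start, end = to_bed_interval(chrom, start_1based, end_1based)
    return f"{chrom}\t{start}\t{end}"


# Illustrative interval: 359 bases in 1-based, inclusive coordinates.
print(format_bed_line("chr1", 11869, 12227))
```

The interval length is preserved: `end - start` in the bed output equals `end - start + 1` in the 1-based input, and a single vcf position `p` becomes the bed interval `(p - 1, p)`.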

Additional information:

Data files for bed generation:

  1. Gencode v46 gene annotation
  2. HGMD 2022 database
  3. Clinvar 8/6/2024 database
  4. spliceAI v1.3 prediction data (located on ravenclaw: /mnt/ravenclaw_local/zhijiany/workdir/exome_filter)
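
For reference, the delta-score cutoff mentioned in item 1 could be applied to spliceAI annotations roughly as follows. This is a sketch, not the script in this PR. It assumes the INFO format described in the SpliceAI README (`ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL`) and takes a variant's delta score to be the maximum of the four `DS_*` values. The helper names are hypothetical.

```python
CUTOFF = 0.8  # delta-score cutoff named in this PR


def max_delta_score(annotation):
    """Return the largest of DS_AG, DS_AL, DS_DG, DS_DL for one
    pipe-separated SpliceAI annotation (missing values are skipped)."""
    fields = annotation.split("|")
    # fields: ALLELE, SYMBOL, DS_AG, DS_AL, DS_DG, DS_DL, DP_AG, ...
    scores = [float(x) for x in fields[2:6] if x not in ("", ".")]
    return max(scores) if scores else 0.0


def passes_cutoff(info, cutoff=CUTOFF):
    """True if any SpliceAI annotation in a vcf INFO string has a
    maximum delta score strictly above the cutoff."""
    for entry in info.split(";"):
        if entry.startswith("SpliceAI="):
            annotations = entry[len("SpliceAI="):].split(",")
            return any(max_delta_score(a) > cutoff for a in annotations)
    return False  # no SpliceAI annotation on this record


info = "SpliceAI=T|RYR1|0.00|0.00|0.91|0.08|-28|-46|-2|-31"
print(passes_cutoff(info))
```

The comparison is strict (`>`), matching the "delta score > 0.8" wording above, so a variant scoring exactly 0.80 would not be kept under this assumption.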

jylee-bcm commented 5 days ago

Sorry for the late review. What files does this code generate, and which files among the AIM data dependencies in the S3 bucket will be updated?