LiuzLab / AI_MARRVEL

AI-MARRVEL (AIM) is an AI system for rare genetic disorder diagnosis
GNU General Public License v3.0

Create exome filter BED files and generation script #52

Closed: hyunhwan-bcm closed this issue 1 month ago

hyunhwan-bcm commented 1 month ago

Description

We need to create two versions of exome filter BED files (exon-only and gene-only), add them to our data dependencies, and include a script to generate these files in the util directory. This will provide more flexibility in filtering options for our pipeline users.

Proposed Changes

  1. Create two BED files:
    • Exon-only filter
    • Gene-only filter (including both introns and exons)
  2. Add these files to our data dependencies
  3. Create a script to generate these files and add it to the util directory
  4. Update documentation to reflect these new filtering options

Implementation Details

1. Create BED files

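We need to create two BED files for each reference genome (hg19 and hg38): an exon-only filter (exon_only.bed) and a gene-only filter (gene_only.bed) that covers both exons and introns.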

These files should follow the standard BED format:

chromosome  start  end  [name]  [score]  [strand]
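As a quick illustration of the column layout above, here is a minimal Python sketch that parses one BED line and checks its interval. The names `BedRecord` and `parse_bed_line` are hypothetical helpers for this example, not part of AI-MARRVEL. It relies on the standard BED convention that `start` is 0-based and `end` is exclusive.

```python
from typing import NamedTuple, Optional


class BedRecord(NamedTuple):
    """One BED interval; the last three columns are optional."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: Optional[str] = None
    score: Optional[str] = None
    strand: Optional[str] = None


def parse_bed_line(line: str) -> BedRecord:
    """Parse a whitespace-separated BED line with 3 to 6 columns."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"BED line needs at least 3 columns: {line!r}")
    start, end = int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError(f"invalid interval {start}-{end} in line: {line!r}")
    # Columns 4 to 6 (name, score, strand) are kept as strings when present.
    return BedRecord(fields[0], start, end, *fields[3:6])


record = parse_bed_line("chr1\t11868\t12227\tDDX11L1\t0\t+")
print(record.chrom, record.start, record.end, record.strand)  # chr1 11868 12227 +
```

A three-column line such as `chr1<TAB>100<TAB>200` parses the same way, with `name`, `score`, and `strand` left as `None`.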

2. Add files to data dependencies

Add the following files to the data dependencies:

data_dependencies/
├── ref_exonic_filter_bed/ # the directory might not be the right one, please check
│   ├── hg19/
│   │   ├── exon_only.bed
│   │   └── gene_only.bed
│   └── hg38/
│       ├── exon_only.bed
│       └── gene_only.bed

Update the AWS S3 bucket with these new files:

# Run from the local ref_exonic_filter_bed/ directory so each build's files go to the matching prefix.
aws s3 cp hg19/exon_only.bed s3://aim-data-dependencies-public/ref_exonic_filter_bed/hg19/
aws s3 cp hg19/gene_only.bed s3://aim-data-dependencies-public/ref_exonic_filter_bed/hg19/
aws s3 cp hg38/exon_only.bed s3://aim-data-dependencies-public/ref_exonic_filter_bed/hg38/
aws s3 cp hg38/gene_only.bed s3://aim-data-dependencies-public/ref_exonic_filter_bed/hg38/
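Because the same two file names exist for both genome builds, it is easy to upload the wrong build to a prefix. The sketch below only builds and prints the four `aws s3 cp` commands so they can be reviewed before running. It assumes the local files sit under a `ref_exonic_filter_bed/<build>/` directory mirroring the tree above; `build_upload_commands` and the `local_root` parameter are hypothetical names for this example.

```python
import shlex
from typing import List

# Bucket prefix taken from the commands in this issue.
BUCKET_PREFIX = "s3://aim-data-dependencies-public/ref_exonic_filter_bed"
GENOMES = ("hg19", "hg38")
BED_FILES = ("exon_only.bed", "gene_only.bed")


def build_upload_commands(local_root: str = "ref_exonic_filter_bed") -> List[List[str]]:
    """Return one `aws s3 cp` argument list per (genome build, BED file) pair."""
    commands = []
    for genome in GENOMES:
        for bed in BED_FILES:
            source = f"{local_root}/{genome}/{bed}"   # assumed local layout
            destination = f"{BUCKET_PREFIX}/{genome}/"
            commands.append(["aws", "s3", "cp", source, destination])
    return commands


# Dry run: print the commands instead of executing them.
for command in build_upload_commands():
    print(shlex.join(command))
```

Each argument list could be passed to `subprocess.run(command, check=True)` once the printed commands look right.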

3. Create generation script

Create a script named generate_exome_filters.py (or generate_exome_filters.R) in the util directory that generates both BED files for each reference genome.
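The issue does not say which annotation source the script should read, so the following is only one possible shape for generate_exome_filters.py, not the implementation merged for this issue. It assumes a GTF annotation file that contains both `gene` and `exon` feature rows (some GTF sources omit `gene` rows). The function names and the `generate` entry point are hypothetical. Overlapping intervals are merged so each output file holds sorted, non-overlapping regions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]


def collect_intervals(gtf_lines: Iterable[str], feature: str) -> Dict[str, List[Interval]]:
    """Collect 0-based, half-open intervals for one GTF feature type, keyed by chromosome."""
    by_chrom: Dict[str, List[Interval]] = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5 or cols[2] != feature:
            continue
        # GTF coordinates are 1-based inclusive; BED is 0-based half-open.
        by_chrom[cols[0]].append((int(cols[3]) - 1, int(cols[4])))
    return by_chrom


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def to_bed_lines(by_chrom: Dict[str, List[Interval]]) -> List[str]:
    """Format merged intervals as three-column BED lines, sorted by chromosome name."""
    return [
        f"{chrom}\t{start}\t{end}"
        for chrom in sorted(by_chrom)
        for start, end in merge_intervals(by_chrom[chrom])
    ]


def generate(gtf_path: str, exon_out: str, gene_out: str) -> None:
    """Write the exon-only and gene-only BED files for one reference genome."""
    with open(gtf_path) as handle:
        lines = handle.readlines()
    for feature, out_path in (("exon", exon_out), ("gene", gene_out)):
        with open(out_path, "w") as out:
            out.writelines(bed + "\n" for bed in to_bed_lines(collect_intervals(lines, feature)))
```

Under these assumptions, `generate("hg38.gtf", "hg38/exon_only.bed", "hg38/gene_only.bed")` would be called once per genome build, where `hg38.gtf` is a placeholder file name. The gene-only file covers introns as well because a `gene` row spans the whole locus.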

4. Update documentation

Update the README and relevant documentation to include information about the new filtering options and how to use them in the pipeline.

jylee-bcm commented 1 month ago

Resolved by PR #55.