bioinfo-chru-strasbourg / howard

Highly Open Workflow for Annotation & Ranking toward genomic variant Discovery
GNU Affero General Public License v3.0

Add prioritization in SQL format #265

Closed antonylebechec closed 2 months ago

antonylebechec commented 2 months ago

To improve prioritization, profiles could accept an SQL string instead of a column filter definition. This would allow scores, flags, etc. to be calculated from multiple annotations and conditions. A PZClass annotation would also be added, to classify variants according to the prioritization criteria.
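As a rough illustration of the idea, here is a minimal sketch of how an SQL condition spanning several annotations could drive a score, a flag and a class. It is not HOWARD's implementation: the criteria keys, the `prioritize` helper and the aggregation rules (scores are summed, any FILTERED match wins, classes are merged) are assumptions made for the example, and the standard-library `sqlite3` module with a registered `regexp_matches` function stands in for whatever SQL engine actually evaluates the profile.

```python
import re
import sqlite3


def regexp_matches(value, pattern):
    """Stand-in for a SQL regexp_matches(string, pattern) function."""
    return int(value is not None and re.search(pattern, value) is not None)


# Hypothetical criteria: each pairs an SQL condition with the outcome to apply.
CRITERIA = [
    {"sql": "DP >= 100 OR regexp_matches(CLNSIG, 'Pathogenic')",
     "score": 100, "flag": "PASS", "class": ["PM1", "PM2"]},
    {"sql": "DP < 50",
     "score": 0, "flag": "FILTERED", "class": []},
]


def prioritize(variant, criteria=CRITERIA):
    """Evaluate each SQL condition against one variant and aggregate the outcome.

    Assumed aggregation: scores are summed, a single FILTERED match filters
    the variant, and classes are merged without duplicates.
    """
    conn = sqlite3.connect(":memory:")
    conn.create_function("regexp_matches", 2, regexp_matches)
    conn.execute("CREATE TABLE variants (DP INTEGER, CLNSIG TEXT)")
    conn.execute("INSERT INTO variants VALUES (?, ?)",
                 (variant["DP"], variant["CLNSIG"]))

    score, flag, classes = 0, "PASS", []
    for criterion in criteria:
        query = f"SELECT count(*) FROM variants WHERE {criterion['sql']}"
        if conn.execute(query).fetchone()[0]:
            score += criterion["score"]
            if criterion["flag"] == "FILTERED":
                flag = "FILTERED"
            classes.extend(c for c in criterion["class"] if c not in classes)
    conn.close()
    return {"score": score, "flag": flag, "class": classes}


if __name__ == "__main__":
    print(prioritize({"DP": 120, "CLNSIG": "Benign"}))
    print(prioritize({"DP": 30, "CLNSIG": "Pathogenic"}))
```

A plain column filter could not express the first condition, since it combines a depth threshold with a pattern match on a different annotation.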

antonylebechec commented 2 months ago

New prioritization profile format (backward compatible with the old format), adding SQL syntax, INFO/PZClass and extra sections (e.g. "_description"):


{
    "default": {
        "_description": "Default prioritization profile",
        "_version": "1.0.0",
        "DP": [
            {
                "type": "gte",
                "value": "50",
                "fields": ["DP"],
                "score": 5,
                "flag": "PASS",
                "comment": [
                    "DP greater than or equal to 50"
                ]
            },
            {
                "type": "lt",
                "value": "50",
                "fields": ["DP"],
                "score": 0,
                "flag": "FILTERED",
                "comment": [
                    "DP lower than 50"
                ]
            }
        ],
        "CLNSIG": [
            {
                "type": "equals",
                "value": "pathogenic",
                "fields": ["CLNSIG"],
                "score": 15,
                "flag": "PASS",
                "comment": [
                    "Described on CLINVAR database as pathogenic"
                ]
            },
            {
                "type": "equals",
                "value": "non-pathogenic",
                "fields": ["CLNSIG"],
                "score": -100,
                "flag": "FILTERED",
                "comment": [
                    "Described on CLINVAR database as non-pathogenic"
                ]
            }
        ],
        "Class": [
            {
                "sql": " DP >= 100 OR regexp_matches(CLNSIG, 'Pathogenic') ",
                "fields": ["DP", "CLNSIG"],
                "score": 100,
                "flag": "PASS",
                "class": "PM1,PM2",
                "comment": [
                    "Described on CLINVAR database as pathogenic, classified as PM1 and PM2"
                ]
            },
            {
                "sql": ["DP >= 200", "OR regexp_matches(CLNSIG, 'Pathogenic')"],
                "fields": ["DP", "CLNSIG"],
                "score": 200,
                "flag": "PASS",
                "class": ["PM1", "PM2"],
                "comment": [
                    "Described on CLINVAR database as pathogenic, classified as PM1 and PM2"
                ]
            }
        ]
    }
}
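Since the profile above mixes old-style criteria ("type"/"value"/"fields") with new-style "sql" entries given either as a string or as a list, one way to treat them uniformly is to reduce every criterion to a single SQL condition first. The sketch below shows that idea only; the helper names are hypothetical, only the three operators that appear in the example are mapped, and joining several fields with OR is an assumption, not something the thread specifies.

```python
# Only the comparison types that appear in the example profile are mapped here.
OPERATORS = {"gte": ">=", "lt": "<", "equals": "="}


def _literal(value):
    """Render a value as a SQL literal: numbers bare, anything else quoted."""
    try:
        float(value)
        return str(value)
    except (TypeError, ValueError):
        return "'" + str(value).replace("'", "''") + "'"


def criterion_to_sql(criterion):
    """Return one SQL condition for an old-style or new-style criterion."""
    if "sql" in criterion:
        sql = criterion["sql"]
        if isinstance(sql, list):
            # Assumption: list items are fragments of a single condition.
            sql = " ".join(sql)
        return sql.strip()
    operator = OPERATORS[criterion["type"]]
    value = _literal(criterion["value"])
    # Assumption: with several fields, a match on any of them is enough.
    return " OR ".join(f"{field} {operator} {value}"
                       for field in criterion["fields"])


def criterion_classes(criterion):
    """Return the "class" entry as a list, whether given as "A,B" or ["A", "B"]."""
    classes = criterion.get("class", [])
    if isinstance(classes, str):
        classes = classes.split(",")
    return [c.strip() for c in classes if c.strip()]


if __name__ == "__main__":
    old_style = {"type": "gte", "value": "50", "fields": ["DP"]}
    new_style = {"sql": ["DP >= 200", "OR regexp_matches(CLNSIG, 'Pathogenic')"],
                 "class": "PM1,PM2"}
    print(criterion_to_sql(old_style))
    print(criterion_to_sql(new_style), criterion_classes(new_style))
```

Reducing both formats to one condition string would keep the old profiles usable without a separate evaluation path.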

antonylebechec commented 2 months ago

done.