blachlylab / mucor3

Parses VCF data into tabular spreadsheets and aggregates data by sample
MIT License

Need different vcf atomizer output modes #7

Closed: charlesgregory closed this 2 years ago

charlesgregory commented 2 years ago

Need different modes for outputting data in different orientations from the vcf atomizer:

Variant centric:

One row per unique variant. Multi-sample VCFs would have one object per sample under the FORMAT object.

{ 
    "CHROM" : "chr1", 
    "POS" : "1", 
    "REF" : "G", 
    "ALT" : "A", 
    "INFO" : { 
        "ANN" : [ 
            {
                "effect": "one"
            }, 
            {
                "effect": "two"
            }
        ]
    }, 
    "FORMAT" : { 
        "SAM1": { 
            "AF": 0.2
        },
        "SAM2": { 
            "AF": 0.1
        } 
    }
}

Sample variant centric (current):

One row per variant per sample. We expand the FORMAT object into multiple rows, one per sample, while duplicating all other information.

{ 
    # all other values the same
    "FORMAT" : { 
        "AF": 0.2 # AF for SAM1
    },
    "sample":"SAM1" 
}
{ 
    # all other values the same
    "FORMAT" : { 
        "AF": 0.1 # AF for SAM2
    },
    "sample":"SAM2"
}
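
The FORMAT expansion described above can be sketched roughly as follows. This is only an illustrative Python sketch, not the atomizer's actual code; the function name `expand_samples` and the in-memory dict layout are assumptions made for the example.

```python
import copy


def expand_samples(record):
    """Yield one row per sample from a variant-centric record.

    Hypothetical helper for illustration; not part of mucor3.
    """
    # Everything except FORMAT is shared by all rows of this variant.
    shared = {key: value for key, value in record.items() if key != "FORMAT"}
    for sample, fields in record.get("FORMAT", {}).items():
        row = copy.deepcopy(shared)   # duplicate all other information
        row["FORMAT"] = dict(fields)  # only this sample's FORMAT values
        row["sample"] = sample        # remember which sample the row is for
        yield row


# Variant-centric record taken from the example in this issue.
variant = {
    "CHROM": "chr1",
    "POS": "1",
    "REF": "G",
    "ALT": "A",
    "INFO": {"ANN": [{"effect": "one"}, {"effect": "two"}]},
    "FORMAT": {"SAM1": {"AF": 0.2}, "SAM2": {"AF": 0.1}},
}

sample_rows = list(expand_samples(variant))
for row in sample_rows:
    print(row["sample"], row["FORMAT"])
```

Each yielded row carries a deep copy of the shared fields, so later changes to one row do not leak into another.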

Annotation centric (previously):

One row per annotation per variant per sample. We further expand the ANN object into multiple rows, one per annotation, while duplicating all other information.

{ 
    # all other values the same
    "INFO" : { 
        "ANN" : {
            "effect": "one" # first ANN annotation for SAM1
        } 
    }, 
    "FORMAT" : { 
        "AF": 0.2 
    },
    "sample":"SAM1"
}
{ 
    # all other values the same
    "INFO" : { 
        "ANN" : {
            "effect": "two" # second ANN annotation for SAM1
        }
    }, 
    "FORMAT" : { 
        "AF": 0.2
    },
    "sample":"SAM1"
}
{ 
    # all other values the same
    "INFO" : { 
        "ANN" : {
            "effect": "one" # first ANN annotation for SAM2
        } 
    }, 
    "FORMAT" : { 
        "AF": 0.1
    },
    "sample":"SAM2"
}
{ 
    # all other values the same
    "INFO" : { 
        "ANN" : {
            "effect": "two" # second ANN annotation for SAM2
        } 
    }, 
    "FORMAT" : { 
        "AF": 0.1
    },
    "sample":"SAM2"
}
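
The annotation expansion above could be sketched along these lines. Again this is a hedged Python illustration rather than the atomizer's implementation; `expand_annotations` is a hypothetical name, and keeping rows that have no ANN entries unchanged is an assumption of the sketch.

```python
import copy


def expand_annotations(row):
    """Yield one row per ANN entry from a sample-variant row.

    Hypothetical helper for illustration; not part of mucor3.
    """
    annotations = row.get("INFO", {}).get("ANN", [])
    if not annotations:
        # Assumption: a row without annotations is passed through as-is.
        yield copy.deepcopy(row)
        return
    for annotation in annotations:
        out = copy.deepcopy(row)             # duplicate all other information
        out["INFO"]["ANN"] = dict(annotation)  # a single annotation object
        yield out


# Sample-variant rows matching the example in this issue.
sample_rows = [
    {
        "INFO": {"ANN": [{"effect": "one"}, {"effect": "two"}]},
        "FORMAT": {"AF": 0.2},
        "sample": "SAM1",
    },
    {
        "INFO": {"ANN": [{"effect": "one"}, {"effect": "two"}]},
        "FORMAT": {"AF": 0.1},
        "sample": "SAM2",
    },
]

annotation_rows = [
    out for row in sample_rows for out in expand_annotations(row)
]
for out in annotation_rows:
    print(out["sample"], out["INFO"]["ANN"]["effect"], out["FORMAT"]["AF"])
```

With two samples and two annotations this yields four rows, in the same order as the example rows listed above.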
charlesgregory commented 2 years ago

Closed as of the atomizer_rework branch merge.