Open antonylebechec opened 1 month ago
To create a transcript view, some parameters are needed.
As an example, this param identifies a table to generate (transcripts) and a structure describing the columns dedicated to transcripts, such as from_column_format (like snpEff annotation) and from_columns_map (like dbNSFP annotation):
{
"transcripts": {
"table": "transcripts",
"struct": {
"from_column_format": [
{
"transcripts_column": "ANN",
"transcripts_infos_column": "Feature_ID"
}
],
"from_columns_map": [
{
"transcripts_column": "Ensembl_transcriptid",
"transcripts_infos_columns": [
"genename",
"Ensembl_geneid",
"LIST_S2_score",
"LIST_S2_pred"
]
},
{
"transcripts_column": "Ensembl_transcriptid",
"transcripts_infos_columns": [
"genename",
"VARITY_R_score",
"Aloft_pred"
]
}
]
}
}
}
This param is used with the function Variants.create_transcript_view() to generate a transcripts table:
#CHROM POS REF ALT transcript transcript_1 AAposAAlength Distance Allele Aloft_pred HGVSc ... cDNAposcDNAlength genename FeatureID LIST_S2_pred ERRORSWARNINGSINFO VARITY_R_score GeneID Annotation GeneName_1 HGVSp AnnotationImpact
0 chr1 28736 A C NR_024540.1 NR_024540.1 None None C None n.50+585T>G ... None WASH7P NR_024540.1 None None None WASH7P intron_variant WASH7P None MODIFIER
1 chr1 28736 A C NR_036051.1 NR_036051.1 None 1630.0 C None n.-1630A>C ... None MIR1302-2 NR_036051.1 None None None MIR1302-2 upstream_gene_variant MIR1302-2 None MODIFIER
2 chr1 28736 A C NR_036266.1 NR_036266.1 None 1630.0 C None n.-1630A>C ... None MIR1302-9 NR_036266.1 None None None MIR1302-9 upstream_gene_variant MIR1302-9 None MODIFIER
3 chr1 28736 A C NR_036267.1 NR_036267.1 None 1630.0 C None n.-1630A>C ... None MIR1302-10 NR_036267.1 None None None MIR1302-10 upstream_gene_variant MIR1302-10 None MODIFIER
4 chr1 28736 A C NR_036268.1 NR_036268.1 None 1630.0 C None n.-1630A>C ... None MIR1302-11 NR_036268.1 None None None MIR1302-11 upstream_gene_variant MIR1302-11 None MODIFIER
5 chr1 35144 A C NR_026818.1 NR_026818.1 None None C None n.597T>G ... None FAM138A NR_026818.1 None None None FAM138A non_coding_transcript_exon_variant FAM138A None MODIFIER
6 chr1 35144 A C NR_026820.1 NR_026820.1 None None C None n.597T>G ... None FAM138F NR_026820.1 None None None FAM138F non_coding_transcript_exon_variant FAM138F None MODIFIER
7 chr1 35144 A C NR_026822.1 NR_026822.1 None None C None n.597T>G ... None FAM138C NR_026822.1 None None None FAM138C non_coding_transcript_exon_variant FAM138C None MODIFIER
8 chr1 35144 A C NR_036051.1 NR_036051.1 None 4641.0 C None n.*4641A>C ... None MIR1302-2 NR_036051.1 None None None MIR1302-2 downstream_gene_variant MIR1302-2 None MODIFIER
9 chr1 35144 A C NR_036266.1 NR_036266.1 None 4641.0 C None n.*4641A>C ... None MIR1302-9 NR_036266.1 None None None MIR1302-9 downstream_gene_variant MIR1302-9 None MODIFIER
10 chr1 35144 A C NR_036267.1 NR_036267.1 None 4641.0 C None n.*4641A>C ... None MIR1302-10 NR_036267.1 None None None MIR1302-10 downstream_gene_variant MIR1302-10 None MODIFIER
11 chr1 35144 A C NR_036268.1 NR_036268.1 None 4641.0 C None n.*4641A>C ... None MIR1302-11 NR_036268.1 None None None MIR1302-11 downstream_gene_variant MIR1302-11 None MODIFIER
12 chr1 69101 A G ENST00000335137 ENST00000335137 None None None . None ... None OR4F5 None T None 0.27627227 None None OR4F5 None None
13 chr1 69101 A G ENST00000641515 ENST00000641515 None None None . None ... None OR4F5 None T None . None None OR4F5 None None
14 chr1 69101 A G NM_001005484.1 NM_001005484.1 4/305 None G None c.11A>G ... 11/918 OR4F5 NM_001005484.1 None None None OR4F5 missense_variant OR4F5 p.Glu4Gly MODERATE
15 chr1 768251 A G NR_047519.1 NR_047519.1 None None G None n.287+3767A>G ... None LINC01128 NR_047519.1 None None None LINC01128 intron_variant LINC01128 None MODIFIER
16 chr1 768251 A G NR_047521.1 NR_047521.1 None None G None n.287+3767A>G ... None LINC01128 NR_047521.1 None None None LINC01128 intron_variant LINC01128 None MODIFIER
17 chr1 768251 A G NR_047523.1 NR_047523.1 None None G None n.287+3767A>G ... None LINC01128 NR_047523.1 None None None LINC01128 intron_variant LINC01128 None MODIFIER
18 chr1 768251 A G NR_047524.1 NR_047524.1 None None G None n.287+3767A>G ... None LINC01128 NR_047524.1 None None None LINC01128 intron_variant LINC01128 None MODIFIER
19 chr1 768251 A G NR_047525.1 NR_047525.1 None None G None n.154+3767A>G ... None LINC01128 NR_047525.1 None None None LINC01128 intron_variant LINC01128 None MODIFIER
20 chr1 768251 A G NR_047526.1 NR_047526.1 None None G None n.287+3767A>G ... None LINC01128 NR_047526.1 None None None LINC01128 intron_variant LINC01128 None MODIFIER
21 chr1 768252 A G NR_047519.1 NR_047519.1 None None G None n.287+3768A>G ... None LINC01128 NR_047519.1 None None None LINC01128 intron_variant LINC01128 None MODIFIER
22 chr1 768252 A G NR_047521.1 NR_047521.1 None None G None n.287+3768A>G ... None LINC01128 NR_047521.1 None None None LINC01128 intron_variant LINC01128 None MODIFIER
23 chr1 768252 A G NR_047523.1 NR_047523.1 None None G None n.287+3768A>G ... None LINC01128 NR_047523.1 None None None LINC01128 intron_variant LINC01128 None MODIFIER
24 chr1 768252 A G NR_047524.1 NR_047524.1 None None G None n.287+3768A>G ... None LINC01128 NR_047524.1 None None None LINC01128 intron_variant LINC01128 None MODIFIER
25 chr1 768252 A G NR_047525.1 NR_047525.1 None None G None n.154+3768A>G ... None LINC01128 NR_047525.1 None None None LINC01128 intron_variant LINC01128 None MODIFIER
26 chr1 768252 A G NR_047526.1 NR_047526.1 None None G None n.287+3768A>G ... None LINC01128 NR_047526.1 None None None LINC01128 intron_variant LINC01128 None MODIFIER
27 chr1 768253 A G NR_047519.1 NR_047519.1 None None G None n.287+3769A>G ... None LINC01128 NR_047519.1 None None None LINC01128 intron_variant LINC01128 None MODIFIER
28 chr1 768253 A G NR_047521.1 NR_047521.1 None None G None n.287+3769A>G ... None LINC01128 NR_047521.1 None None None LINC01128 intron_variant LINC01128 None MODIFIER
29 chr1 768253 A G NR_047523.1 NR_047523.1 None None G None n.287+3769A>G ... None LINC01128 NR_047523.1 None None None LINC01128 intron_variant LINC01128 None MODIFIER
30 chr1 768253 A G NR_047524.1 NR_047524.1 None None G None n.287+3769A>G ... None LINC01128 NR_047524.1 None None None LINC01128 intron_variant LINC01128 None MODIFIER
31 chr1 768253 A G NR_047525.1 NR_047525.1 None None G None n.154+3769A>G ... None LINC01128 NR_047525.1 None None None LINC01128 intron_variant LINC01128 None MODIFIER
32 chr1 768253 A G NR_047526.1 NR_047526.1 None None G None n.287+3769A>G ... None LINC01128 NR_047526.1 None None None LINC01128 intron_variant LINC01128 None MODIFIER
33 chr7 55249063 G A NM_001346897.2 NM_001346897.2 742/1091 None A None c.2226G>A ... 2487/3848 EGFR NM_001346897.2 None None None EGFR synonymous_variant EGFR p.Gln742Gln LOW
34 chr7 55249063 G A NM_001346898.2 NM_001346898.2 787/1136 None A None c.2361G>A ... 2622/3983 EGFR NM_001346898.2 None None None EGFR synonymous_variant EGFR p.Gln787Gln LOW
35 chr7 55249063 G A NM_001346899.1 NM_001346899.1 742/1165 None A None c.2226G>A ... 2483/6218 EGFR NM_001346899.1 None None None EGFR synonymous_variant EGFR p.Gln742Gln LOW
36 chr7 55249063 G A NM_001346900.2 NM_001346900.2 734/1157 None A None c.2202G>A ... 2393/9676 EGFR NM_001346900.2 None None None EGFR synonymous_variant EGFR p.Gln734Gln LOW
37 chr7 55249063 G A NM_001346941.2 NM_001346941.2 520/943 None A None c.1560G>A ... 1821/9104 EGFR NM_001346941.2 None None None EGFR synonymous_variant EGFR p.Gln520Gln LOW
38 chr7 55249063 G A NM_005228.5 NM_005228.5 787/1210 None A None c.2361G>A ... 2622/9905 EGFR NM_005228.5 None None None EGFR synonymous_variant EGFR p.Gln787Gln LOW
39 chr7 55249063 G A NR_047551.1 NR_047551.1 None None A None n.1201C>T ... None EGFR-AS1 NR_047551.1 None None None EGFR-AS1 non_coding_transcript_exon_variant EGFR-AS1 None MODIFIER
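As a rough illustration of how a from_columns_map entry could turn one variant into several transcript rows, here is a minimal Python sketch. It is not the code of Variants.create_transcript_view(). The helper explode_columns_map is hypothetical, and it assumes that dbNSFP-style INFO values are comma-separated and positionally aligned with the transcript IDs, with a single value shared by all transcripts.

```python
def explode_columns_map(info, transcripts_column, infos_columns):
    """Return one dict per transcript from comma-separated INFO values.

    Hypothetical helper: assumes values are aligned by position with the
    transcript IDs, and that a single value applies to every transcript.
    """
    transcript_ids = info[transcripts_column].split(",")
    rows = []
    for index, transcript_id in enumerate(transcript_ids):
        row = {"transcript": transcript_id}
        for column in infos_columns:
            raw = info.get(column)
            values = raw.split(",") if raw is not None else []
            if len(values) == len(transcript_ids):
                row[column] = values[index]  # one value per transcript
            elif len(values) == 1:
                row[column] = values[0]  # single value shared by all transcripts
            else:
                row[column] = None  # column missing or not aligned
        rows.append(row)
    return rows


# INFO values taken from the chr1:69101 example variant
info = {
    "genename": "OR4F5",
    "Ensembl_transcriptid": "ENST00000641515,ENST00000335137",
    "LIST_S2_score": "0.79822,0.716128",
}
rows = explode_columns_map(
    info,
    transcripts_column="Ensembl_transcriptid",
    infos_columns=["genename", "LIST_S2_score", "LIST_S2_pred"],
)
for row in rows:
    print(row)
```

Each returned dict corresponds to one line of the transcripts table, which is why a variant with several transcripts appears several times.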
Calculation to add transcript annotations as an INFO field in JSON format. Example (create config/param.transcripts.json with the param from help):
howard calculation --input="tests/data/example.ann.transcripts.vcf.gz" --output="/tmp/output.transcript.vcf" --calculations="TRANSCRIPTS_JSON" --param="config/param.transcripts.json"
Prioritization of transcripts in 'HOWARD' mode with 'transcripts' profiles available in a configuration JSON file, with 'PZT' as prefix:
"transcripts": {
...
"prioritization": {
"profiles": ["transcripts"],
"prioritization_config": "config/prioritization_transcripts_profiles.json",
"pzprefix": "PZT",
"prioritization_score_mode": "HOWARD"
}
}
With prioritization parameters based on 'LIST_S2_score' (file 'config/prioritization_transcripts_profiles.json'):
{
"transcripts": {
"LIST_S2_score": [
{
"type": "gt",
"value": "0.75",
"score": 10,
"flag": "PASS",
"comment": ["Very Good LIST Score"]
},
{
"type": "gt",
"value": "0.50",
"score": 10,
"flag": "PASS",
"comment": ["Good LIST Score"]
}
]
}
}
Command:
howard calculation --input='tests/data/example.dbnsfp.transcripts.vcf.gz' --output='/tmp/example.calculation.transcripts.vcf' --param='config/param.transcripts.json' --calculations='TRANSCRIPTS_PRIORITIZATION'
Output VCF with PZTTranscript, PZTScore and PZTFlag (partial output):
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 28736 . A C 100 PASS CLNSIG=pathogenic
chr1 35144 . A C 100 PASS CLNSIG=non-pathogenic
chr1 69101 . A G 100 PASS genename=OR4F5;Ensembl_transcriptid=ENST00000641515,ENST00000335137;LIST_S2_score=0.79822,0.716128;PZTTranscript=ENST00000641515;PZTScore=20;PZTFlag=PASS
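To make the PZTScore=20 above easier to follow, here is a small Python sketch of one possible reading of the profile: every satisfied criterion adds its score, and the transcript with the highest total is reported. This is an assumption that matches the partial output shown here, not a description of the actual 'HOWARD' score mode. The function score_transcript is hypothetical and the flag handling is simplified.

```python
def score_transcript(annotations, profile):
    """Sum the scores of every 'gt' criterion the transcript satisfies."""
    score, flag = 0, None
    for field, criteria in profile.items():
        try:
            value = float(annotations[field])
        except (KeyError, TypeError, ValueError):
            continue  # annotation missing or not numeric (e.g. ".")
        for criterion in criteria:
            if criterion["type"] == "gt" and value > float(criterion["value"]):
                score += criterion["score"]
                flag = criterion["flag"]  # simplified: last matching flag wins
    return score, flag


# Same criteria as config/prioritization_transcripts_profiles.json above
profile = {
    "LIST_S2_score": [
        {"type": "gt", "value": "0.75", "score": 10, "flag": "PASS"},
        {"type": "gt", "value": "0.50", "score": 10, "flag": "PASS"},
    ]
}

# Transcripts of the chr1:69101 variant
transcripts = {
    "ENST00000641515": {"LIST_S2_score": "0.79822"},
    "ENST00000335137": {"LIST_S2_score": "0.716128"},
}

scored = {tid: score_transcript(ann, profile) for tid, ann in transcripts.items()}
best = max(scored, key=lambda tid: scored[tid][0])
print(best, scored[best])
```

With this reading, ENST00000641515 passes both thresholds (10 + 10) while ENST00000335137 passes only the second one, which is consistent with PZTTranscript=ENST00000641515 in the output.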
Include transcript annotations, either in JSON format or in a structured format (like 'snpEff'), with the calculation tool.
Parameters in JSON file (e.g. 'config/param.transcripts.json'):
{
"transcripts": {
"transcripts_info_field_json": "transcripts_json",
"transcripts_info_field_format": "transcripts_ann",
"table": "transcripts",
"struct": {...}
...
}
}
Command:
howard calculation --input='tests/data/example.ann.transcripts.vcf.gz' --output='/tmp/example.calculation.transcripts.vcf' --param='config/param.transcripts.json' --calculations='TRANSCRIPTS_ANNOTATIONS'
Output VCF with 'transcripts_json' and 'transcripts_ann' INFO fields (partial output):
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO'">
##INFO=<ID=transcripts_json,Number=.,Type=String,Description="Transcripts in JSON format">
##INFO=<ID=transcripts_ann,Number=.,Type=String,Description="Transcripts annotations: 'transcript | VARITY_R_score | transcript_1 | Annotation | FeatureID | Allele | HGVSc | Aloft_pred | HGVSp | TranscriptBioType | Distance | genename | LIST_S2_score | AAposAAlength | GeneID | Ensembl_geneid | Rank | GeneName_1 | ERRORSWARNINGSINFO | FeatureType | LIST_S2_pred | CDSposCDSlength | cDNAposcDNAlength | AnnotationImpact'">
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 69101 . A G 100 PASS ANN=G|missense_variant|...;genename=OR4F5;Ensembl_transcriptid=ENST00000641515,ENST00000335137;LIST_S2_score=0.79822,0.716128;transcripts_json={"ENST00000335137":{"VARITY_R_score":"0.27627227","transcript_1":"ENST00000335137","Annotation":null,"FeatureID":null,"Allele":null,"HGVSc":null,"Aloft_pred":".","HGVSp":null,"TranscriptBioType":null,"Distance":null,"genename":"OR4F5","LIST_S2_score":"0.716128","AAposAAlength":null,"GeneID":null,"Ensembl_geneid":"ENSG00000186092","Rank":null,"GeneName_1":"OR4F5","ERRORSWARNINGSINFO":null,"FeatureType":null,"LIST_S2_pred":"T","CDSposCDSlength":null,"cDNAposcDNAlength":null,"AnnotationImpact":null},"ENST00000641515":{"VARITY_R_score":".","transcript_1":"ENST00000641515","Annotation":null,"FeatureID":null,"Allele":null,"HGVSc":null,"Aloft_pred":".","HGVSp":null,"TranscriptBioType":null,"Distance":null,"genename":"OR4F5","LIST_S2_score":"0.79822","AAposAAlength":null,"GeneID":null,"Ensembl_geneid":"ENSG00000186092","Rank":null,"GeneName_1":"OR4F5","ERRORSWARNINGSINFO":null,"FeatureType":null,"LIST_S2_pred":"T","CDSposCDSlength":null,"cDNAposcDNAlength":null,"AnnotationImpact":null},"NM_001005484.1":{"VARITY_R_score":null,"transcript_1":"NM_001005484.1","Annotation":"missense_variant","FeatureID":"NM_001005484.1","Allele":"G","HGVSc":"c.11A>G","Aloft_pred":null,"HGVSp":"p.Glu4Gly","TranscriptBioType":"protein_coding","Distance":null,"genename":"OR4F5","LIST_S2_score":null,"AAposAAlength":"4/305","GeneID":"OR4F5","Ensembl_geneid":null,"Rank":"1/1","GeneName_1":"OR4F5","ERRORSWARNINGSINFO":null,"FeatureType":"transcript","LIST_S2_pred":null,"CDSposCDSlength":"11/918","cDNAposcDNAlength":"11/918","AnnotationImpact":"MODERATE"}};transcripts_ann=ENST00000335137|0.27627227|ENST00000335137|||||.||||OR4F5|0.716128|||ENSG00000186092||OR4F5|||T|||,ENST00000641515|.|ENST00000641515|||||.||||OR4F5|0.79822|||ENSG00000186092||OR4F5|||T|||,NM_001005484.1||NM_001005484.1|missense_variant|NM_001005484.1|G|c.11A>G||p.Glu4Gly|protein_coding||OR4F5||4/305|OR4F5||1/1|OR4F5||transcript||11/918|11/918|MODERATE
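For readers who want to consume the structured 'transcripts_ann' field downstream, the following Python sketch shows one way to decode it, using the field order given in the header Description. The function parse_transcripts_ann is hypothetical (it is not part of HOWARD), and it assumes that annotation values contain neither ',' nor '|'.

```python
import re


def parse_transcripts_ann(description, value):
    """Decode a snpEff-like 'transcripts_ann' value into one dict per transcript.

    Field names come from the quoted list in the header Description.
    Empty values are returned as None.
    """
    header = re.search(r"'(.*)'", description).group(1)
    fields = [name.strip() for name in header.split("|")]
    records = []
    for entry in value.split(","):  # one entry per transcript
        parts = entry.split("|")
        records.append({name: (part or None) for name, part in zip(fields, parts)})
    return records


# Description of the transcripts_ann INFO field, as shown in the header above
description = (
    "Transcripts annotations: 'transcript | VARITY_R_score | transcript_1 | "
    "Annotation | FeatureID | Allele | HGVSc | Aloft_pred | HGVSp | "
    "TranscriptBioType | Distance | genename | LIST_S2_score | AAposAAlength | "
    "GeneID | Ensembl_geneid | Rank | GeneName_1 | ERRORSWARNINGSINFO | "
    "FeatureType | LIST_S2_pred | CDSposCDSlength | cDNAposcDNAlength | "
    "AnnotationImpact'"
)

# First two transcripts of the chr1:69101 variant, copied from the output above
value = (
    "ENST00000335137|0.27627227|ENST00000335137|||||.||||OR4F5|0.716128|||"
    "ENSG00000186092||OR4F5|||T|||,"
    "ENST00000641515|.|ENST00000641515|||||.||||OR4F5|0.79822|||"
    "ENSG00000186092||OR4F5|||T|||"
)

records = parse_transcripts_ann(description, value)
for record in records:
    print(record["transcript"], record["LIST_S2_score"], record["LIST_S2_pred"])
```

The 'transcripts_json' field carries the same information keyed by transcript ID, so a JSON parser can be used on it directly instead of this positional decoding.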
To also take variant annotations into account when prioritizing transcripts, the INFO column of the VCF is included in the transcripts view. Prioritization profiles for transcripts can therefore be parameterized with annotations from variants.
Here is an example of a profile using a transcript annotation, 'LIST_S2_score', and a variant annotation, 'CLNSIG':
{
"transcripts": {
"LIST_S2_score": [
{
"type": "gt",
"value": "0.75",
"score": 10,
"flag": "PASS",
"comment": ["Very Good LIST Score"]
},
{
"type": "gt",
"value": "0.50",
"score": 10,
"flag": "PASS",
"comment": ["Good LIST Score"]
}
],
"CLNSIG": [
{
"type": "eq",
"value": "pathogenic",
"score": 100,
"flag": "PASS",
"comment": ["Pathogenic"]
}
]
}
}
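The effect of mixing both kinds of annotations can be sketched in Python as follows. This is only an illustration under assumptions: the variant INFO is copied into each transcript row (transcript values win on a name clash), scores of satisfied criteria are summed, and only the 'gt' and 'eq' types from the example are handled. The input is hypothetical: it reuses the transcripts of chr1:69101 but adds a CLNSIG value that this variant does not carry in the example file.

```python
def matches(criterion, raw):
    """Check one criterion ('gt' or 'eq') against a raw annotation value."""
    if raw is None:
        return False
    if criterion["type"] == "eq":
        return raw == criterion["value"]
    if criterion["type"] == "gt":
        try:
            return float(raw) > float(criterion["value"])
        except ValueError:
            return False  # non-numeric value such as "."
    return False  # other criterion types are not covered by this sketch


def score(record, profile):
    """Sum the scores of all criteria satisfied by the record."""
    return sum(
        criterion["score"]
        for field, criteria in profile.items()
        for criterion in criteria
        if matches(criterion, record.get(field))
    )


profile = {
    "LIST_S2_score": [
        {"type": "gt", "value": "0.75", "score": 10},
        {"type": "gt", "value": "0.50", "score": 10},
    ],
    "CLNSIG": [{"type": "eq", "value": "pathogenic", "score": 100}],
}

variant_info = {"CLNSIG": "pathogenic"}  # hypothetical variant-level annotation
transcript_rows = [
    {"transcript": "ENST00000641515", "LIST_S2_score": "0.79822"},
    {"transcript": "ENST00000335137", "LIST_S2_score": "0.716128"},
]

# Variant annotations are visible from every transcript row of that variant
totals = {
    row["transcript"]: score({**variant_info, **row}, profile)
    for row in transcript_rows
}
print(totals)
```

Because the variant annotation is shared by all transcripts of the variant, it shifts every transcript score by the same amount. The ranking between transcripts still depends on the transcript-level annotations.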
TODO:
To explore transcript information related to each variant, especially to calculate scores, a "transcript view" needs to be created. It can be another table or a view (e.g. "transcripts") in which each line corresponds to a transcript (i.e. multiple lines per variant). A transcript ID column is needed as a unique key.