broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Potential Mismatch between the MAF output mode and VCF output mode on Gencode V43 #9013

Open jamesemery opened 1 month ago

jamesemery commented 1 month ago

While working on #9012, I tried to update the Gencode v28 datasource snippets in the Funcotator integration tests to v43. Doing so broke the MAF vs. VCF output render tests with errors like the following:

java.lang.AssertionError: Failed Matching VCF and MAF fields:
    VCF (Gencode_43_variantClassification):     RNA[0]  RNA[1]  RNA[2]  RNA[3]  RNA[4]  RNA[5]  RNA[6]  RNA[7]  RNA[8]  RNA[9]  RNA[10]
    MAF (Variant_Classification):               LINCRNA[0]  LINCRNA[1]  LINCRNA[2]  LINCRNA[3]  LINCRNA[4]  LINCRNA[5]  LINCRNA[6]  LINCRNA[7]  LINCRNA[8]  LINCRNA[9]  LINCRNA[10]
----
    VCF (Gencode_43_otherTranscripts):      [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]    PIK3CA_ENST00000643187.1_FIVE_PRIME_FLANK/PIK3CA-DT_ENST00000435560.1_RNA[11]   PIK3CA_ENST00000643187.1_FIVE_PRIME_FLANK/PIK3CA-DT_ENST00000435560.1_RNA[12]   PIK3CA_ENST00000643187.1_FIVE_PRIME_FLANK/PIK3CA-DT_ENST00000435560.1_RNA[13]   PIK3CA_ENST00000643187.1_INTRON/PIK3CA-DT_ENST00000435560.1_FIVE_PRIME_FLANK[14]    [48]    [49]    [50]    [51]    [52]    [53]    [54]    [55]    [56]    [57]    [58]    [59]    [60]    [61]    [62]    [63]    [64]    [65]    [66]    [67]    [68]    [69]    [70]    [71]    [72]    [73]    [74]    [75]    [76]    [77]    [78]    [79]    [80]    [81]    [82]    [83]    [84]    [85]    [86]    [87]    [88]    [89]    [90]    [91]    [92]    [93]    [94]    [95]    [96]    [97]    [98]    [99]    [100]   [101]   [102]   [103]
    MAF (Other_Transcripts):                [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]    PIK3CA_ENST00000643187.1_FIVE_PRIME_FLANK|PIK3CA-DT_ENST00000435560.1_LINCRNA[11]   PIK3CA_ENST00000643187.1_FIVE_PRIME_FLANK|PIK3CA-DT_ENST00000435560.1_LINCRNA[12]   PIK3CA_ENST00000643187.1_FIVE_PRIME_FLANK|PIK3CA-DT_ENST00000435560.1_LINCRNA[13]   PIK3CA_ENST00000643187.1_INTRON|PIK3CA-DT_ENST00000435560.1_FIVE_PRIME_FLANK[14]    [48]    [49]    [50]    [51]    [52]    [53]    [54]    [55]    [56]    [57]    [58]    [59]    [60]    [61]    [62]    [63]    [64]    [65]    [66]    [67]    [68]    [69]    [70]    [71]    [72]    [73]    [74]    [75]    [76]    [77]    [78]    [79]    [80]    [81]    [82]    [83]    [84]    [85]    [86]    [87]    [88]    [89]    [90]    [91]    [92]    [93]    [94]    [95]    [96]    [97]    [98]    [99]    [100]   [101]   [102]   [103]
----

It's unclear which rendering is more correct for this specific transcript, LINCRNA or RNA. It's worth investigating and adding more robust Gencode v43 tests to Funcotator in case this is a real issue and not just a mismatch in the testing framework.
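To help separate real label differences from formatting differences when investigating, here is a rough standalone sketch of a per-index comparison. It is not the Funcotator test code: the class and method names (`FieldMismatchSketch`, `normalizeDelimiter`, `mismatchedIndices`) are hypothetical, and it assumes the `/` vs. `|` separator in the other-transcripts field is an expected format difference, which the log above does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of GATK: compares per-variant field values
// rendered in VCF mode against the same values rendered in MAF mode.
final class FieldMismatchSketch {

    // Assumption: '/' (VCF) and '|' (MAF) are interchangeable separators,
    // so they are unified before comparing.
    static String normalizeDelimiter(String value) {
        return value.replace('/', '|');
    }

    // Returns the indices whose values still differ after delimiter normalization.
    static List<Integer> mismatchedIndices(List<String> vcfValues, List<String> mafValues) {
        if (vcfValues.size() != mafValues.size()) {
            throw new IllegalArgumentException("VCF and MAF value lists differ in length");
        }
        List<Integer> mismatches = new ArrayList<>();
        for (int i = 0; i < vcfValues.size(); i++) {
            String vcf = normalizeDelimiter(vcfValues.get(i));
            String maf = normalizeDelimiter(mafValues.get(i));
            if (!vcf.equals(maf)) {
                mismatches.add(i);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Values copied from the failing assertion above.
        List<String> vcf = List.of(
                "RNA",
                "PIK3CA_ENST00000643187.1_INTRON/PIK3CA-DT_ENST00000435560.1_FIVE_PRIME_FLANK");
        List<String> maf = List.of(
                "LINCRNA",
                "PIK3CA_ENST00000643187.1_INTRON|PIK3CA-DT_ENST00000435560.1_FIVE_PRIME_FLANK");
        // Only the RNA vs. LINCRNA entry should be reported.
        System.out.println(mismatchedIndices(vcf, maf));
    }
}
```

Under that assumption, only the entries where the classification label itself differs (RNA vs. LINCRNA) get flagged, which narrows the question to which label is right for `ENST00000435560.1`.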