diskin-lab-chop / AutoGVP

Bug: ClinVar 1NR not getting most recent #102

Closed jungkim2 closed 1 year ago

jungkim2 commented 1 year ago

Provide the command used or report the bug here

I still cannot figure out whether there is a rule for which variants are handled incorrectly, but in some 1NR cases the most recent date is still not being used as the final_call. Examples:

18-62368965-A-G 12-109596525-A-G

rjcorb commented 1 year ago

@jungkim2 I can't find these variants in the test VCFs we've been using. However, we've just updated the code to select a single ClinVar call when a variant has multiple submissions. I checked these variants, and it looks like we are now selecting the most recent date for both (submission_merged_df contains ALL submissions, and submission_final_df retains one submission per variant):

> submission_merged_df[submission_merged_df$vcf_id == "18-62368965-A-G",]
# A tibble: 2 × 48
  VariationID ClinicalSignificance.x LastEvaluated.x Description        SubmittedPhenotypeInfo ReportedPhenotypeInfo ReviewStatus.x CollectionMethod OriginCounts
        <dbl> <chr>                  <chr>           <chr>              <chr>                  <chr>                 <chr>          <chr>            <chr>       
1     1361382 Likely benign          Jun 07, 2023    "This alteration … MeSH:D030342           C0950123:Inborn gene… criteria prov… clinical testing germline:na 
2     1361382 Uncertain significance Jul 12, 2022    "This sequence ch… MedGen:CN517202        CN517202:not provided criteria prov… clinical testing germline:na 
# ℹ 39 more variables: Submitter <chr>, SCV <chr>, SubmittedGeneSymbol <chr>, ExplanationOfInterpretation <chr>, `#AlleleID` <dbl>, Type <chr>, Name <chr>,
#   GeneID <dbl>, GeneSymbol <chr>, HGNC_ID <chr>, ClinicalSignificance.y <chr>, ClinSigSimple <dbl>, LastEvaluated.y <chr>, `RS# (dbSNP)` <dbl>,
#   `nsv/esv (dbVar)` <chr>, RCVaccession <chr>, PhenotypeIDS <chr>, PhenotypeList <chr>, Origin <chr>, OriginSimple <chr>, Assembly <chr>,
#   ChromosomeAccession <chr>, Chromosome <chr>, Start <dbl>, Stop <dbl>, ReferenceAllele <chr>, AlternateAllele <chr>, Cytogenetic <chr>, ReviewStatus.y <chr>,
#   NumberSubmitters <dbl>, Guidelines <chr>, TestedInGTR <chr>, OtherIDs <chr>, SubmitterCategories <dbl>, PositionVCF <int>, ReferenceAlleleVCF <chr>,
#   AlternateAlleleVCF <chr>, vcf_id <chr>, LastEvaluated <chr>
> submission_final_df[submission_final_df$vcf_id == "18-62368965-A-G",]
# A tibble: 1 × 10
  VariationID ClinicalSignificance LastEvaluated Description      SubmittedPhenotypeInfo ReportedPhenotypeInfo ReviewStatus SubmittedGeneSymbol GeneSymbol vcf_id
        <dbl> <chr>                <chr>         <chr>            <chr>                  <chr>                 <chr>        <chr>               <chr>      <chr> 
1     1361382 Likely benign        Jun 07, 2023  This alteration… MeSH:D030342           C0950123:Inborn gene… criteria pr… TNFRSF11A           TNFRSF11A  18-62…
> submission_merged_df[submission_merged_df$vcf_id == "12-109596525-A-G",]
# A tibble: 4 × 48
  VariationID ClinicalSignificance.x LastEvaluated.x Description        SubmittedPhenotypeInfo ReportedPhenotypeInfo ReviewStatus.x CollectionMethod OriginCounts
        <dbl> <chr>                  <chr>           <chr>              <chr>                  <chr>                 <chr>          <chr>            <chr>       
1       97569 Uncertain significance Apr 08, 2018    The MVK c.1139A>G… Not Provided           CN517202:not provided criteria prov… clinical testing germline:na 
2       97569 not provided           NA              -                  Hyperimmunoglobulin D… C0398691:Hyperimmuno… no assertion … literature only  not provide…
3       97569 Pathogenic             Aug 30, 2022    This sequence cha… MedGen:C0398691;MedGe… C0398691:Hyperimmuno… criteria prov… clinical testing germline:na 
4       97569 Likely pathogenic      Aug 05, 2021    -                  MedGen:C0398691        C0398691:Hyperimmuno… no assertion … clinical testing germline:na 
# ℹ 39 more variables: Submitter <chr>, SCV <chr>, SubmittedGeneSymbol <chr>, ExplanationOfInterpretation <chr>, `#AlleleID` <dbl>, Type <chr>, Name <chr>,
#   GeneID <dbl>, GeneSymbol <chr>, HGNC_ID <chr>, ClinicalSignificance.y <chr>, ClinSigSimple <dbl>, LastEvaluated.y <chr>, `RS# (dbSNP)` <dbl>,
#   `nsv/esv (dbVar)` <chr>, RCVaccession <chr>, PhenotypeIDS <chr>, PhenotypeList <chr>, Origin <chr>, OriginSimple <chr>, Assembly <chr>,
#   ChromosomeAccession <chr>, Chromosome <chr>, Start <dbl>, Stop <dbl>, ReferenceAllele <chr>, AlternateAllele <chr>, Cytogenetic <chr>, ReviewStatus.y <chr>,
#   NumberSubmitters <dbl>, Guidelines <chr>, TestedInGTR <chr>, OtherIDs <chr>, SubmitterCategories <dbl>, PositionVCF <int>, ReferenceAlleleVCF <chr>,
#   AlternateAlleleVCF <chr>, vcf_id <chr>, LastEvaluated <chr>
> submission_final_df[submission_final_df$vcf_id == "12-109596525-A-G",]
# A tibble: 1 × 10
  VariationID ClinicalSignificance LastEvaluated Description      SubmittedPhenotypeInfo ReportedPhenotypeInfo ReviewStatus SubmittedGeneSymbol GeneSymbol vcf_id
        <dbl> <chr>                <chr>         <chr>            <chr>                  <chr>                 <chr>        <chr>               <chr>      <chr> 
1       97569 Pathogenic           Aug 30, 2022  This sequence c… MedGen:C0398691;MedGe… C0398691:Hyperimmuno… criteria pr… MVK                 MVK        12-10…
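
For readers following along, here is a minimal sketch of the "keep the most recently evaluated submission per variant" idea that the output above illustrates. It is not AutoGVP's actual code: the data frame `submissions`, the helper column `eval_date`, and the use of dplyr's `slice_max()` are assumptions made for illustration, and the real pipeline may apply further criteria (review status, for example) that are not shown here. The toy values are copied from the printed tibbles.

```r
library(dplyr)

# Toy data: values copied from the console output above.
# `submissions` is a hypothetical stand-in for submission_merged_df.
submissions <- tibble(
  vcf_id = c("18-62368965-A-G", "18-62368965-A-G",
             "12-109596525-A-G", "12-109596525-A-G",
             "12-109596525-A-G", "12-109596525-A-G"),
  ClinicalSignificance = c("Likely benign", "Uncertain significance",
                           "Uncertain significance", "not provided",
                           "Pathogenic", "Likely pathogenic"),
  LastEvaluated = c("Jun 07, 2023", "Jul 12, 2022",
                    "Apr 08, 2018", NA,
                    "Aug 30, 2022", "Aug 05, 2021")
)

most_recent <- submissions %>%
  # Parse "Jun 07, 2023"-style strings; %b assumes an English locale.
  mutate(eval_date = as.Date(LastEvaluated, format = "%b %d, %Y")) %>%
  group_by(vcf_id) %>%
  # Keep the single latest submission per variant; rows with a missing
  # date sort last, so they are kept only if no dated row exists.
  slice_max(eval_date, n = 1, with_ties = FALSE) %>%
  ungroup()

print(most_recent)
```

Comparing the parsed `eval_date` rather than the raw `LastEvaluated` strings matters here: sorting the character column would order entries alphabetically by month name, not chronologically.
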
jungkim2 commented 1 year ago

Yes, it seems the recent update from issue 85 fixed this issue!