Closed jungkim2 closed 1 year ago
@jungkim2 I can't find these variants in the test VCFs we've been using. However, we've just updated the code to select a single ClinVar call when a variant has multiple submissions. I checked both variants, and it looks like we are now selecting the submission with the most recent date for each (submission_merged_df contains ALL submissions, and submission_final_df retains one submission per variant):
> submission_merged_df[submission_merged_df$vcf_id == "18-62368965-A-G",]
# A tibble: 2 × 48
VariationID ClinicalSignificance.x LastEvaluated.x Description SubmittedPhenotypeInfo ReportedPhenotypeInfo ReviewStatus.x CollectionMethod OriginCounts
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1361382 Likely benign Jun 07, 2023 "This alteration … MeSH:D030342 C0950123:Inborn gene… criteria prov… clinical testing germline:na
2 1361382 Uncertain significance Jul 12, 2022 "This sequence ch… MedGen:CN517202 CN517202:not provided criteria prov… clinical testing germline:na
# ℹ 39 more variables: Submitter <chr>, SCV <chr>, SubmittedGeneSymbol <chr>, ExplanationOfInterpretation <chr>, `#AlleleID` <dbl>, Type <chr>, Name <chr>,
# GeneID <dbl>, GeneSymbol <chr>, HGNC_ID <chr>, ClinicalSignificance.y <chr>, ClinSigSimple <dbl>, LastEvaluated.y <chr>, `RS# (dbSNP)` <dbl>,
# `nsv/esv (dbVar)` <chr>, RCVaccession <chr>, PhenotypeIDS <chr>, PhenotypeList <chr>, Origin <chr>, OriginSimple <chr>, Assembly <chr>,
# ChromosomeAccession <chr>, Chromosome <chr>, Start <dbl>, Stop <dbl>, ReferenceAllele <chr>, AlternateAllele <chr>, Cytogenetic <chr>, ReviewStatus.y <chr>,
# NumberSubmitters <dbl>, Guidelines <chr>, TestedInGTR <chr>, OtherIDs <chr>, SubmitterCategories <dbl>, PositionVCF <int>, ReferenceAlleleVCF <chr>,
# AlternateAlleleVCF <chr>, vcf_id <chr>, LastEvaluated <chr>
> submission_final_df[submission_final_df$vcf_id == "18-62368965-A-G",]
# A tibble: 1 × 10
VariationID ClinicalSignificance LastEvaluated Description SubmittedPhenotypeInfo ReportedPhenotypeInfo ReviewStatus SubmittedGeneSymbol GeneSymbol vcf_id
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1361382 Likely benign Jun 07, 2023 This alteration… MeSH:D030342 C0950123:Inborn gene… criteria pr… TNFRSF11A TNFRSF11A 18-62…
> submission_merged_df[submission_merged_df$vcf_id == "12-109596525-A-G",]
# A tibble: 4 × 48
VariationID ClinicalSignificance.x LastEvaluated.x Description SubmittedPhenotypeInfo ReportedPhenotypeInfo ReviewStatus.x CollectionMethod OriginCounts
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 97569 Uncertain significance Apr 08, 2018 The MVK c.1139A>G… Not Provided CN517202:not provided criteria prov… clinical testing germline:na
2 97569 not provided NA - Hyperimmunoglobulin D… C0398691:Hyperimmuno… no assertion … literature only not provide…
3 97569 Pathogenic Aug 30, 2022 This sequence cha… MedGen:C0398691;MedGe… C0398691:Hyperimmuno… criteria prov… clinical testing germline:na
4 97569 Likely pathogenic Aug 05, 2021 - MedGen:C0398691 C0398691:Hyperimmuno… no assertion … clinical testing germline:na
# ℹ 39 more variables: Submitter <chr>, SCV <chr>, SubmittedGeneSymbol <chr>, ExplanationOfInterpretation <chr>, `#AlleleID` <dbl>, Type <chr>, Name <chr>,
# GeneID <dbl>, GeneSymbol <chr>, HGNC_ID <chr>, ClinicalSignificance.y <chr>, ClinSigSimple <dbl>, LastEvaluated.y <chr>, `RS# (dbSNP)` <dbl>,
# `nsv/esv (dbVar)` <chr>, RCVaccession <chr>, PhenotypeIDS <chr>, PhenotypeList <chr>, Origin <chr>, OriginSimple <chr>, Assembly <chr>,
# ChromosomeAccession <chr>, Chromosome <chr>, Start <dbl>, Stop <dbl>, ReferenceAllele <chr>, AlternateAllele <chr>, Cytogenetic <chr>, ReviewStatus.y <chr>,
# NumberSubmitters <dbl>, Guidelines <chr>, TestedInGTR <chr>, OtherIDs <chr>, SubmitterCategories <dbl>, PositionVCF <int>, ReferenceAlleleVCF <chr>,
# AlternateAlleleVCF <chr>, vcf_id <chr>, LastEvaluated <chr>
> submission_final_df[submission_final_df$vcf_id == "12-109596525-A-G",]
# A tibble: 1 × 10
VariationID ClinicalSignificance LastEvaluated Description SubmittedPhenotypeInfo ReportedPhenotypeInfo ReviewStatus SubmittedGeneSymbol GeneSymbol vcf_id
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 97569 Pathogenic Aug 30, 2022 This sequence c… MedGen:C0398691;MedGe… C0398691:Hyperimmuno… criteria pr… MVK MVK 12-10…
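For anyone following along, here is a rough sketch of the selection rule described above: keep the most recently evaluated submission per variant. This is not the project's actual code. The helper name select_latest_submission and the toy data frame are made up for illustration, with values copied from the rows printed above. It also assumes LastEvaluated is always formatted like "Jun 07, 2023" and that the session uses an English locale, since %b month parsing is locale-dependent.

```r
# Toy data mirroring three of the columns shown above (values copied from the printed rows).
submissions <- data.frame(
  vcf_id = c(rep("18-62368965-A-G", 2), rep("12-109596525-A-G", 4)),
  ClinicalSignificance = c("Likely benign", "Uncertain significance",
                           "Uncertain significance", "not provided",
                           "Pathogenic", "Likely pathogenic"),
  LastEvaluated = c("Jun 07, 2023", "Jul 12, 2022",
                    "Apr 08, 2018", NA, "Aug 30, 2022", "Aug 05, 2021"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the most recently evaluated submission per variant.
select_latest_submission <- function(df) {
  # Parse "Mon DD, YYYY"; assumes an English locale for month abbreviations.
  df$eval_date <- as.Date(df$LastEvaluated, format = "%b %d, %Y")
  # Sort newest first within each variant; rows with a missing date sort last.
  ord <- order(df$vcf_id, -as.numeric(df$eval_date), na.last = TRUE)
  df <- df[ord, ]
  # Keep the first (newest) row for each vcf_id.
  latest <- df[!duplicated(df$vcf_id), ]
  rownames(latest) <- NULL
  latest
}

latest <- select_latest_submission(submissions)
print(latest[, c("vcf_id", "ClinicalSignificance", "LastEvaluated")])
```

A submission with no evaluation date can only be chosen here if it is the variant's sole submission. Whether the real pipeline treats missing dates the same way is an assumption on my part.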
Yes, it looks like the recent update from issue 85 fixed this!
Provide the command used or report the bug here
I still cannot figure out whether there is a rule for which variants come out wrong, but in some cases with 1NR, the most recent date is still not being selected as the final_call:
18-62368965-A-G 12-109596525-A-G
What version are you using?
Add error message here (if applicable)
Add Session info
Run sessionInfo() and post the output below