clingen-data-model / clinvar-ingest

Fix false-positive Classification detection in IncludedRecords #229

Closed by theferrit32 1 month ago

theferrit32 commented 2 months ago

In the new ClinVar XML, ClassifiedRecord.Classifications omits a classification type as a child when there were no submissions of that type. IncludedRecord.Classifications, by contrast, includes all three classification types as children and says there were no submissions for each of them. A ClassifiedRecord that does not have (for example) a SomaticClinicalImpact submission does not generate a VariationArchiveClassification with type SomaticClinicalImpact. By the same logic, an IncludedRecord with no submissions should not generate any VariationArchiveClassifications at all.

As written, the code generates three VariationArchiveClassifications for every IncludedRecord, one per classification type, and sets num_submissions=0, review_status="no ...", etc. on each of them.

This XML should yield no VariationArchiveClassification objects:

    <Classifications>
      <GermlineClassification NumberOfSubmissions="0" NumberOfSubmitters="0">
        <ReviewStatus>no classification for the single variant</ReviewStatus>
        <Description>no classification for the single variant</Description>
      </GermlineClassification>
      <SomaticClinicalImpact NumberOfSubmissions="0" NumberOfSubmitters="0">
        <ReviewStatus>no classification for the single variant</ReviewStatus>
        <Description>no classification for the single variant</Description>
      </SomaticClinicalImpact>
      <OncogenicityClassification NumberOfSubmissions="0" NumberOfSubmitters="0">
        <ReviewStatus>no classification for the single variant</ReviewStatus>
        <Description>no classification for the single variant</Description>
      </OncogenicityClassification>
    </Classifications>
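
A minimal sketch of the intended filtering, not the project's actual parsing code: it assumes the fragment is parsed with the standard library's xml.etree.ElementTree and that NumberOfSubmissions="0" is the signal to skip a child. The function name classifications_with_submissions and the constant names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The three classification element names that appear in the XML above.
CLASSIFICATION_TAGS = (
    "GermlineClassification",
    "SomaticClinicalImpact",
    "OncogenicityClassification",
)


def classifications_with_submissions(classifications_xml: str) -> list[str]:
    """Return the tags of classification children that should produce a
    VariationArchiveClassification (hypothetical helper for illustration)."""
    root = ET.fromstring(classifications_xml)
    kept = []
    for child in root:
        if child.tag not in CLASSIFICATION_TAGS:
            continue
        # Assumption: a child is a placeholder only when it explicitly
        # reports zero submissions; children without the attribute are kept.
        count = child.get("NumberOfSubmissions")
        if count is not None and int(count) == 0:
            continue
        kept.append(child.tag)
    return kept


# The IncludedRecord fragment from this issue.
INCLUDED_RECORD_XML = """\
<Classifications>
  <GermlineClassification NumberOfSubmissions="0" NumberOfSubmitters="0">
    <ReviewStatus>no classification for the single variant</ReviewStatus>
    <Description>no classification for the single variant</Description>
  </GermlineClassification>
  <SomaticClinicalImpact NumberOfSubmissions="0" NumberOfSubmitters="0">
    <ReviewStatus>no classification for the single variant</ReviewStatus>
    <Description>no classification for the single variant</Description>
  </SomaticClinicalImpact>
  <OncogenicityClassification NumberOfSubmissions="0" NumberOfSubmitters="0">
    <ReviewStatus>no classification for the single variant</ReviewStatus>
    <Description>no classification for the single variant</Description>
  </OncogenicityClassification>
</Classifications>
"""

if __name__ == "__main__":
    print(classifications_with_submissions(INCLUDED_RECORD_XML))
```

Filtering on the submission count rather than on the ReviewStatus text is a design choice in this sketch; it avoids depending on the exact wording of the placeholder strings.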