cBioPortal / icebox

very low priority issues

genie structural variant profile numbers seem off #465

Open tmazor opened 1 year ago

tmazor commented 1 year ago

I think there's something off with the structural variant profile in GENIE v13-public.

The cohort as a whole shows 15.4% of samples with structural variant data, which seems far too low. image

I filtered to just MSK & DFCI, where as far as I know almost 100% of samples should have structural variant data, but it's only 22%: image

ritikakundra commented 1 year ago

@tmazor we discussed this with Sage yesterday. Their case lists are not right. We are fixing it on our end as an immediate fix, and Sage is fixing it on theirs for future imports.

ritikakundra commented 1 year ago

@tmazor Just saw the import run. Here are the MSK/DFCI samples:

Screen Shot 2023-03-10 at 8 12 44 PM

And total SV and

Screen Shot 2023-03-10 at 8 12 27 PM

tmazor commented 1 year ago

Awesome! Those numbers seem much more reasonable. Although I'm a little surprised that there are more samples with structural variants than with copy number. Is that really true @ritikakundra ?

tmazor commented 1 year ago

@ritikakundra I'm looking at the BPC CRC public dataset and it seems to have the same issue:

image

ritikakundra commented 1 year ago

@tmazor we just fixed public genie to check the issue, but we have a global fix scheduled. All data files will get updated this week.

ritikakundra commented 1 year ago

> Awesome! Those numbers seem much more reasonable. Although I'm a little surprised that there are more samples with structural variants than with copy number. Is that really true @ritikakundra ?

Yeah, I was surprised myself. Going to check this a little more.

tmazor commented 1 year ago

> @tmazor we just fixed public genie to check the issue, but we have a global fix scheduled. All data files will get updated this week.

gotcha - sounds good!