Clinical-Genomics / scout

VCF visualization interface
https://clinical-genomics.github.io/scout
BSD 3-Clause "New" or "Revised" License

Can we add more information for CNVs uploaded to Scout? #4444

Open mathiasbio opened 6 months ago

mathiasbio commented 6 months ago

As an example, (I hope that this Scout-link will persist): https://scout-stage.scilifelab.se/cust087/F0049453/sv/variants/8e2113f40646b16473f9a12af4e57e0b

This is one of the variants that we're looking at during our validation of some myeloid-cases belonging to cust087.

As far as I can tell the only quality type / read-support type of information we have for these CNVs is this:

image

However, the VCF that we upload contains more information that could be added. I assume we haven't coded Scout to account for every type of variable, and maybe for good reason.

For instance, in the sample column for this case (tumor only) we have: GT:GQ:CN:CNQ 0/1:0:6:18

And in the INFO-field information like this: FOLD_CHANGE=5.73371;FOLD_CHANGE_LOG=2.51947;PROBES=18

I think this could be useful for the geneticist doing the interpretation. This is just what I found after looking at it for a minute, so I wonder if we could take a look together and see what information is present in the VCF and what might be useful to add to the Scout view.
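To make the example values above concrete, here is a minimal sketch of how the sample column and INFO string could be split into key/value pairs. It is plain Python written for this thread, not Scout's parser; the helper names `parse_format` and `parse_info` are hypothetical. The only inputs are the strings quoted above.

```python
import math


def parse_format(format_keys: str, sample_values: str) -> dict:
    """Pair colon-separated FORMAT keys with one sample's values."""
    return dict(zip(format_keys.split(":"), sample_values.split(":")))


def parse_info(info: str) -> dict:
    """Split a semicolon-separated INFO string into a dict.

    Keys without '=' are treated as flags and stored as True.
    """
    fields = {}
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


# Values copied from the example variant in this issue.
sample = parse_format("GT:GQ:CN:CNQ", "0/1:0:6:18")
info = parse_info("FOLD_CHANGE=5.73371;FOLD_CHANGE_LOG=2.51947;PROBES=18")

# For this variant, log2 of FOLD_CHANGE agrees with FOLD_CHANGE_LOG.
log2_fold_change = math.log2(float(info["FOLD_CHANGE"]))
```

For this one variant the numbers are consistent with `FOLD_CHANGE_LOG` being the base-2 logarithm of `FOLD_CHANGE`, but that should be checked against the caller's documentation before relying on it.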

dnil commented 6 months ago

Absolutely, happy to! This is exactly the way we try to work: when the pipeline has new info available, they make an issue just like this. PRs are also welcome, but issues are the most important.

I don't quite recognize the keys from CNVnator, which is the other depth-only caller we are used to. I guess they are more or less what they sound like. But do you have some more summary documentation, I guess from CNVkit in these cases?

dnil commented 2 months ago

Ping on this @mathiasbio! Any list of features you would like to see added? Or ought we primarily look into some specific callers, say CNVkit first, and see if we can find documentation mapping some values to existing ones, like different names for quality, ploidy etc?

mathiasbio commented 1 month ago

Sorry for being so slow to respond here! It seemed like a potentially difficult question. But let's see...

Another issue is that I'm not sure how much I trust many of these values from the CNV callers, and they don't seem to contain the same information. For DellyCNV, for instance, there's no fold-change value, but in CNVkit there is.

In DellyCNV there's the "CN" for "Integer copy-number" and "RDCN" for "Read-depth based copy-number estimate" but I don't know how trustworthy they are. I'm torn on whether it's good or bad to add more info to these variants. I feel like by uploading variants we're saying that we trust them at least somewhat, but I personally am not so convinced, and by not adding this info we might prevent someone from drawing the wrong conclusions 😬 I think for DellyCNV we're on the verge of maybe deciding to exclude this variant caller, so maybe we can ignore that for now as well and focus on CNVkit and Ascat.

Apparently there's no fold-change for Ascat either. There is, however:

TCN = Total copy number
MCN = Minor allele copy number

And for CNVkit there's a fold-change in the INFO field and sometimes, maybe when the fold-change is high enough, there's a CN value too. So at least all of these CNV callers have a CN field in common (though not for all variants), just not under the same name.

DellyCNV - CN
Ascat - MCN
CNVkit - CN
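One way to handle the different key names would be a small lookup from caller to its copy-number key. This is only a sketch: the mapping is copied as listed above (whether MCN or TCN is the better match for Ascat is left open), and `CN_KEY_BY_CALLER` and `get_copy_number` are hypothetical names, not existing Scout code.

```python
from typing import Optional

# Caller name -> key holding the copy number, as listed in this thread.
# Assumption: the Ascat entry follows the list above; TCN may be a closer match.
CN_KEY_BY_CALLER = {
    "dellycnv": "CN",
    "ascat": "MCN",
    "cnvkit": "CN",
}


def get_copy_number(caller: str, fields: dict) -> Optional[int]:
    """Return the copy number for a variant, or None if it is unavailable.

    `fields` is a dict of already-parsed VCF key/value pairs (strings).
    Not every variant carries the key, so missing values are expected.
    """
    key = CN_KEY_BY_CALLER.get(caller.lower())
    if key is None:
        return None
    raw = fields.get(key)
    if raw is None or raw == ".":
        return None
    try:
        return int(raw)
    except (TypeError, ValueError):
        return None
```

Returning `None` rather than a default keeps "no value reported" distinct from an actual copy number, which matters since the callers do not set the field for every variant.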

Maybe we need to sit down and look at this some more 🤔

dnil commented 2 weeks ago

I'm starting to feel this. We just had a case where it might have saved some time to have it available for an RD case as well, though it will be conflicting and complicated enough to keep an analyst busy even with the values. E.g. TIDDIT CN and COV at least, in addition to RR/DR, which I think we have in good order already. CNVnator RD and CN.