Clinical-Genomics / BALSAMIC

Bioinformatic Analysis pipeLine for SomAtic Mutations In Cancer
https://balsamic.readthedocs.io/
MIT License

Feedback from customer regarding the SV page #771

Closed: moahaegglund closed this issue 3 years ago

moahaegglund commented 3 years ago

This is feedback from a customer about what they would like displayed on the first page for SVs. There is also a question about how SVs are filtered against databases.


Here I summarize the columns I would like to see on the first info page for SVs:
- Effect: HIGH/MODERATE/MODIFIER/LOW for each variant from snpEff; also indicate which genes are part of a fusion
- Manta score: for tumor, SOMATICSCORE; for constitutional SVs, just SCORE

In addition:
- Do not cut the list to the first 30-40 genes for large SVs. At the very least, one wants to be able to see the genes when opening the respective variant; otherwise one never knows which genes were in the large SVs.
- When you filter using SweGen and local databases (it would be good to have some information somewhere about how these databases are built and which patients are included in them), do you filter on the exact start/stop position of the SV, or +/- a few bases?
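The first requested column could be derived from the snpEff `ANN` INFO field. The sketch below is illustrative only: the function name is hypothetical, and it assumes the standard snpEff layout (comma-separated annotations, pipe-separated subfields `Allele|Annotation|Annotation_Impact|Gene_Name|...`) and that fusions carry effect terms containing `gene_fusion` (e.g. `bidirectional_gene_fusion`). It reports the most severe impact and the gene names of any fusion annotations.

```python
from __future__ import annotations

# snpEff impact classes, from most to least severe.
IMPACT_RANK = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


def summarize_ann(info: str) -> tuple[str | None, list[str]]:
    """Hypothetical helper: return (worst impact, fusion gene names)
    from a VCF INFO string carrying a snpEff ANN field."""
    ann_value = None
    for field in info.split(";"):
        if field.startswith("ANN="):
            ann_value = field[len("ANN="):]
            break
    if ann_value is None:
        return None, []

    worst = None
    fusion_genes: list[str] = []
    # Annotations are comma-separated; subfields are pipe-separated.
    for entry in ann_value.split(","):
        parts = entry.split("|")
        if len(parts) < 4:
            continue  # skip malformed entries
        effects, impact, gene = parts[1], parts[2], parts[3]
        # Keep the impact with the lowest rank (i.e. the most severe).
        if impact in IMPACT_RANK and (worst is None or IMPACT_RANK[impact] < IMPACT_RANK[worst]):
            worst = impact
        # Assumption: fusion effects contain the term "gene_fusion".
        if "gene_fusion" in effects and gene and gene not in fusion_genes:
            fusion_genes.append(gene)
    return worst, fusion_genes
```

For a BND annotated as `bidirectional_gene_fusion` with gene name `EML4&ALK`, this returns `("HIGH", ["EML4&ALK"])`.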
dnil commented 3 years ago

Hi! Thank you! I think we need some more context to help best with this one. The first two points are partly pipeline-specific. Which pipeline and version produced the files you are loading? And if this is in Solna, could you share a customer number and case so we can look at everything?

Not truncating the gene list for large SVs is tricky, as some SVs may span thousands of genes. What we can revisit is showing all genes that appear in the default panel, or something similar.
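The panel-based idea could work roughly as follows. This is a hypothetical sketch, not Scout's implementation: the function name and the `max_other` cap are assumptions. Every gene in the SV that belongs to the panel is always shown, at most `max_other` of the remaining genes are listed, and the number of hidden genes is reported so the user knows the list is incomplete.

```python
def genes_to_display(sv_genes, panel_genes, max_other=30):
    """Hypothetical display helper for large SVs.

    Returns (panel genes in the SV, other genes shown, count of other genes hidden).
    """
    panel = set(panel_genes)
    # Panel genes are never truncated, whatever the SV size.
    in_panel = [g for g in sv_genes if g in panel]
    others = [g for g in sv_genes if g not in panel]
    shown_others = others[:max_other]
    hidden = len(others) - len(shown_others)
    return in_panel, shown_others, hidden
```

Keeping panel genes outside the cap means a clinically relevant gene is never hidden just because the SV is large.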

For the final point, there are really two questions: documentation, and how the frequency filter handles SV overlap/clustering.

We can try to document this a bit on the Scout side, but the filtering is done using annotations from the pipeline, so this is mostly an interaction issue with the pipeline: how to propagate what is in the pipeline's reference set to the end user, perhaps with a comment in a fixed field in the VCF header?
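One way to act on the VCF-header suggestion is a structured meta-information line per frequency database, which a viewer could then display. The key `SV_frequency_db` and its sub-fields are hypothetical, chosen for illustration; neither BALSAMIC nor Scout is known to use them. The sketch only follows the general VCF convention of `##key=<ID=...,Description="...">` with a quoted, escaped description.

```python
def frequency_db_header(db_id: str, version: str, description: str) -> str:
    """Build a hypothetical structured VCF meta-information line that
    documents one SV frequency database used for annotation."""
    # Escape backslashes and double quotes inside the quoted Description value.
    escaped = description.replace("\\", "\\\\").replace('"', '\\"')
    return f'##SV_frequency_db=<ID={db_id},Version={version},Description="{escaped}">'
```

The description field is where information on how the database was built and which cohort it contains could live.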

The local frequency display is based on loqusdb clustering, so it uses breakpoint matching with an imprecision proportional to variant size. The global frequencies displayed for individual variants from the VCF, e.g. SweGen and gnomAD, are again pipeline-dependent. If this is a pipeline that uses SVDB for SV annotation, as most of our in-house ones do, it uses a different clustering method.
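To illustrate what size-proportional breakpoint matching means, as opposed to an exact start/stop match, here is a minimal sketch. It is not loqusdb's or SVDB's actual algorithm; the `fraction` and `min_tolerance` parameters are assumptions for illustration. Two SVs of the same type on the same chromosome match when both breakpoints lie within a tolerance that grows with SV size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int
    end: int
    svtype: str


def breakpoints_match(a: SV, b: SV, fraction: float = 0.1, min_tolerance: int = 10) -> bool:
    """Illustrative size-proportional breakpoint matching (hypothetical parameters)."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    # The tolerance scales with the larger SV, but never drops below min_tolerance.
    size = max(a.end - a.start, b.end - b.start)
    tolerance = max(min_tolerance, int(fraction * size))
    return abs(a.start - b.start) <= tolerance and abs(a.end - b.end) <= tolerance
```

With these settings, two ~100 kb deletions whose breakpoints differ by 500 bases match, while two 100-base deletions offset by 50 bases do not.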

hassanfa commented 3 years ago

Some good points you got there Daniel 💯

dnil commented 3 years ago

Super! I didn't find any SOMATICSCORE on recent BALSAMIC runs, so I'll transfer this issue over to BALSAMIC for now, and open a new one for Scout regarding the display of gene symbols for large structural variants. We could perhaps provide a searchable list, or just show the genes from the given panels.

hassanfa commented 3 years ago

Manta's tumor-normal analysis has SOMATICSCORE in the header. Tumor-only mode is missing this value. Since this is a Manta issue and not a BALSAMIC one, I'm closing this.

Scout should parse INFO/SOMATICSCORE if it exists and display it when needed.

##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Somatic mutation">
##INFO=<ID=SOMATICSCORE,Number=1,Type=Integer,Description="Somatic variant quality score">
##INFO=<ID=SVINSLEN,Number=.,Type=Integer,Description="Length of insertion">
##INFO=<ID=SVINSSEQ,Number=.,Type=String,Description="Sequence of insertion">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
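Parsing INFO/SOMATICSCORE only when it is present could look like the sketch below. The function names are hypothetical. The fallback key `SCORE` comes from the customer's request for constitutional SVs and is not confirmed here to be a Manta INFO field. Flags such as `SOMATIC` have no value, and a missing value (`.`) is treated as absent, so tumor-only calls without SOMATICSCORE simply yield `None`.

```python
from __future__ import annotations


def parse_info(info: str) -> dict[str, str | bool]:
    """Split a VCF INFO string into a dict; flags map to True."""
    fields: dict[str, str | bool] = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        # Flags such as SOMATIC carry no "=value" part.
        fields[key] = value if sep else True
    return fields


def sv_score(info: str, keys=("SOMATICSCORE", "SCORE")) -> int | None:
    """Return the first available integer score among `keys`, else None.

    Assumption: SCORE is only a hypothetical fallback for constitutional SVs.
    """
    fields = parse_info(info)
    for key in keys:
        value = fields.get(key)
        if isinstance(value, str) and value != ".":
            return int(value)
    return None
```

Returning `None` rather than a default score lets a viewer hide the column for tumor-only runs instead of showing a misleading value.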