aeeckhou / shallowHRD

This method uses shallow Whole Genome Sequencing (sWGS) and the segmentation of a genomic profile to assess the Homologous Recombination Deficiency of a tumor based on the number of Large-scale Genomic Alterations (LGAs).

different LGA results between versions #7

Closed danyuewang closed 1 year ago

danyuewang commented 2 years ago

Hi, aeeckhou!

Thanks for creating this tool! I noticed that the latest version updated some of the algorithms, and I have some questions about the final results.

  1. The results from the old and new versions differ substantially for some samples, as shown below. The difference may come from the calculation of the CNA cut-off value. Could you describe the algorithm, and should we always use the latest version's result? Old version result: [image]

    New version result: [image]

  2. The QDNAseq and ControlFREEC segmentation inputs gave different results for some of our samples, all run with the recommended parameters (50 kb windows for QDNAseq, 40 kb windows for ControlFREEC). I also see that you now recommend the QDNAseq input. Could you help explain this?

QDNAseq input result: [image]

FREEC input result: [image]

Thanks in advance,

Kind regards, danyue

aeeckhou commented 1 year ago

Dear Danyue,

Sorry for the delayed answer. I did not get any email notification for your issue and did not think to check GitHub directly until now.

For question 1:

In the old version that you ran, the density plot, the first minimum detected from it, and therefore the CNA cut-off value are sometimes poorly or wrongly detected, which badly affects the subsequent optimization of the segmentation and the final results. This is what happens in your first graph: the CNA cut-off value is detected at 1.441. To mitigate this, I capped the CNA cut-off at 0.45 in this old version (CNA cut-off corrected in the table). Please note that this is far from ideal, but it can still help the user a little. Capping the CNA cut-off at 0.45 allowed you to detect some LGAs, but you still miss some.
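To make the capping concrete, here is a minimal Python sketch of the idea, not the actual shallowHRD code (which is written in R). It assumes the density is estimated over absolute differences between segment levels; the function names, the Gaussian kernel, the bandwidth and the grid step are all illustrative assumptions. Only the 0.45 cap comes from the description above.

```python
import math


def gaussian_kde(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `values` on `grid`."""
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)
    return [
        sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm
        for x in grid
    ]


def first_local_minimum(grid, density):
    """Return the grid position of the first interior local minimum, or None."""
    for i in range(1, len(density) - 1):
        if density[i] < density[i - 1] and density[i] <= density[i + 1]:
            return grid[i]
    return None


def cna_cutoff(differences, cap=0.45, bandwidth=0.03, step=0.005):
    """Return (raw_cutoff, corrected_cutoff) for a list of absolute differences.

    `bandwidth` and `step` are arbitrary illustrative values (assumptions);
    `cap` is the 0.45 limit applied in the old version.
    """
    n_points = int(max(differences) / step) + 1
    grid = [i * step for i in range(n_points)]
    density = gaussian_kde(differences, grid, bandwidth)
    raw = first_local_minimum(grid, density)
    if raw is None:
        return None, None
    return raw, min(raw, cap)


# Hypothetical inputs: noise-level differences plus real copy-number steps.
well_behaved = [0.01, 0.02, 0.03, 0.02, 0.01, 0.30, 0.31, 0.29, 0.32]
# Hypothetical inputs where the first minimum lands far too high.
badly_behaved = [0.0, 0.02, 0.03, 1.40, 1.50, 1.45, 2.90]

print(cna_cutoff(well_behaved))
print(cna_cutoff(badly_behaved))
```

In the second case the raw cut-off falls well above 0.45 and is clamped, which recovers some LGAs but cannot repair a density whose first minimum is in the wrong place.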

The new version of shallowHRD takes longer to run but was built to improve the overall optimization of the segmentation, including CNA cut-off detection, notably through random sampling and comparison between large segments and through a two-step optimization. Detection of the CNA cut-off is much more robust: in the newest version, your sample has a correct CNA cut-off. Other improvements include recalculating the value when merging several segments and corrections of some small mistakes. It is safe to say that the latest version is better than the old one. I would still advise looking closely at your final segmentation, including the "zoomed_plot" in the output, when the scale in plot A goes as high as yours does, [-5,5].

For question 2:

This sample has a flat profile, which could indicate either a low (here, too low) tumor content or a germline sample. The two genomic profiles you are looking at are actually quite similar; only the scale differs between them. Concerning the final segmentation, shallowHRD with ControlFREEC detects a very small minimum in the density, around 0.04, that most likely comes from small variations in the ControlFREEC segmentation rather than from actual biological variation and copy number changes. I would not interpret either of these segmentations and would classify both as "low cellularity or germline". As an indicator, MAX2 represents the distance between segments and can serve as a reasonable surrogate marker of cellularity (it is still wrong in about 15% of cases). It should be above 0.16 with QDNAseq. MAX2 DOES NOT ALWAYS WORK AND SHOULD ONLY BE USED TO HELP YOUR EYES.
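As a rough illustration of how the MAX2 indicator could be used as a first-pass flag, here is a small Python sketch. The function name and the returned labels are hypothetical; only the 0.16 threshold (stated for QDNAseq input) comes from the explanation above, and the flag is a prompt to inspect the plots, not a verdict.

```python
QDNASEQ_MAX2_THRESHOLD = 0.16  # value quoted above for QDNAseq input only


def max2_hint(max2, threshold=QDNASEQ_MAX2_THRESHOLD):
    """Return an advisory label from MAX2; always confirm by eye on the plots."""
    if max2 > threshold:
        return "MAX2 above threshold: check the segmentation plots"
    return "possible low cellularity or germline: check the segmentation plots"


# Hypothetical MAX2 values for two samples.
for sample, max2 in [("sample_A", 0.31), ("sample_B", 0.05)]:
    print(sample, max2_hint(max2))
```

Because MAX2 is wrong in roughly 15% of cases, both branches deliberately send the user back to the plots.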

The choice of QDNAseq is purely empirical: from a certain point on, I tested shallowHRD on many more samples, and I optimized it further using the segmentation output from QDNAseq.

I hope that helps. Sorry again for the delay; I completely missed your issue.

Best regards, Alexandre Eeckhoutte