aeeckhou / shallowHRD

This method uses shallow Whole Genome Sequencing (sWGS) and the segmentation of a genomic profile to assess the Homologous Recombination Deficiency of a tumor based on the number of Large-scale Genomic Alterations (LGAs).
30 stars 13 forks source link

Comment doesn't seem to match the code when merging small segments with larger ones #12

Closed lbeltrame closed 1 year ago

lbeltrame commented 1 year ago

I've found that one of the comments doesn't seem to match with what the code is doing (QDNAseq linked but the controlFREEC lines are the same).

https://github.com/aeeckhou/shallowHRD/blob/341f0fe0e645757b6596b34de5df5cd194703f9c/shallowHRD_hg19_1.13_QDNAseq_no_chrX.R#L1348

The code is quite hard to read, but the comment says

########## 4 - Inside segments end after but before next segment

However the condition doesn't look like that to me:

 && tmp_0.1_3mb[i,5] <= tmp_3mb[c+1,5]){    ########## 4 - Inside segments end after but before next segment

To me, it looks like the end (column 5) of the shorter segments needs to be smaller, or equal than the end of the longer segment (column 5, too). So it's not really "before" the next segment. It may happen by chance if the spacing between the segment and the subsequent one is larger than 3Mbp (given that it's the maximum the short segments can be), but at least potentially, the shorter segment being considered in tmp_0.1_3mb can overlap partially the next segment.

Am I reading the code wrong, or should it be && tmp_0.1_3mb[i,5] < tmp_3mb[c+1,4]), or is the comment referencing something else?

aeeckhou commented 1 year ago

Hello,

No you are reading it right ! It's been 4 years since I wrote this part but indeed I think it should be : && tmp_0.1_3mb[i,5] < tmp_3mb[c+1,4])

However I looked at many many sWGS, and didn't see mistakes like overlapping segments in the final output and so on.

I might be wrong here but from the top of my mind the reason it went unnoticed might lies in the input format of a ratio file. If the c+1 large segment exists and is "formed", then the beginning of this large segment is in the ratio file, hence all small segments before this c+1 segment are by nature not overlapping its beginning. Another possibility is that this is somehow corrected afterwards by "chance" and I just didn't see it.

Did you look at final outputs to see any mistakes that could be caused by this?

I finished my PhD in my former team where I developped shallowHRD 1.5 years ago. Since then they mooved forward and developped shallowHRD v.2, that should be published this year. I am trying still to answer question on this algorithm, but won't modify any code as it was for us robust and satisfying in the outputs. If you have mistakes in your output I'll mentionned this to my former team and discuss with them to modify the code accorddingly.

If on your own you see that your correction improves the output don't hesitate to post it under my comment aswell!

Best regards, Alexandre

aeeckhou commented 1 year ago

Hello Lucas,

Did you investigate more this issue ? Should I leave it opened ?

Best regards, Alexandre