aeeckhou / shallowHRD

This method uses shallow Whole Genome Sequencing (sWGS) and the segmentation of a genomic profile to assess the Homologous Recombination Deficiency of a tumor based on the number of Large-scale Genomic Alterations (LGAs).
30 stars 13 forks source link

X = X[which(X[,6] > ((quantile(X[,6])[[4]] - quantile(X[,6])[[2]])/2)),] #3

Closed axiba053 closed 3 years ago

axiba053 commented 3 years ago

Dear developers,

Thank you very much for the useful tool.I can use this tool normally, but I still have a question. I read the article "ShallowHRD: detection of homologous recombination deficiency from shallow whole genome sequencing", and saw the following description:

2.2.1 Workflow

i. CNA cut-off is detected and the profile segmentation is optimized as follows: Segments are defined as ‘large’ if Zi≥(Q1+Q3)/2, where Q1, Q3 are quartiles of Zi (Zi > 3 Mb) distribution.

I am a beginner in R language, I saw the source code X = X[which(X[,6] > ((quantile(X[,6])[[4]] - quantile(X[,6])[[2]])/2)),] at line 746 of the script file.

Should the '-' in source code be '+' ?

Looking forward to your reply !

aeeckhou commented 3 years ago

Hello Axiba,

You are actually right, there is a discrepancy between the two, thank you for pointing it out! The formula inside the script is however right and we let a typo in the article somehow...

The script is however correct and you can use it! Using (Q1+Q3)/2 would fix a minimal size of segment too important and therefore too few segments in some settings ! Of note in the upcoming version of shallowHRD that I'm developping, we let on the side this value based on distribution for another approach that I will detailed in the future, which improve a bit CNA cut-off distribution.

Best regards, Alexandre

axiba053 commented 3 years ago

@aeeckhou Thank you very much for your prompt reply! I'm actually looking for tunable parameters for the tool.

I have 3 known HRD positive cell lines and obtained WGS data(~1x) for them, but only one of the results was positive。

As suggested, I used the following process for analysis:

FASTQ files be aligned to a reference genome (using BWA-MEM) and supplementary & duplicate reads removed from the BAM files(using  samtools and GATK),
then the BAM file be processed by ControlFREEC, finally shallowHRD was used for analysis.

I will test more data and want to know what parameters can be adjusted or what else I can do?

Thanks!

HCC-1143 markdup_summary_plot NCI-1395 markdup_summary_plot NCI-2009 markdup_summary_plot

aeeckhou commented 3 years ago

Hello Axiba,

I didn't programm shallowHRD to be tuned actually but more to adapt to a wide variety of inputs!

The possible tuning that can be done is on the tool generating ratio.txt file for shallowHRD. Using QDNAseq for instance of controlFREEC and also changing the size of the window used. Enlarging windows for your specific profile might be a bit good ! However I'll be honest, changing those parameter might change a little bit your profiles for your sample NCI-1395.markdup but from my experience it will not most likely change much...

How did your characterize your HRD positive cell line and from what type of cancer is it coming ? HR needs to have a deleterious mutation of one allele of BRCA1, BRCA2 or RAD51C and the lost of the second allele or the methylation of the promoter for BRCA1 or RAD51C.

For instance your last profile NCI-2009.markdup can't be optimized more I think. Rarely we observed low level of LGAs for BRCA2 so it might be it ?... Personnaly just with the profile I wouldn't classify as HRD positive at least the NCI-2009.markdup. And I would increase the window size a bit for your case NCI-1395.markdup to look again a second time or/and try QDNAseq.

Keep me posted!

Best regards, Alexandre

axiba053 commented 3 years ago

Dear Alexandre:

I'm sorry to reply you so late. I will try the suggestions you gave and confirm the problems you mentioned. I still have a lot to learn about HRD. Thank you very much !

Best regards Axiba