Nik-Zainal-Group / signature.tools.lib

R package containing useful functions for mutational signature analysis

CNV Input file question #34

Closed disulfidebond closed 2 years ago

disulfidebond commented 2 years ago

Hello, I'm exploring tools for HRD detection and signature analysis for our lab. I tested HRDetect using hrDetect.R via the command line, and I was able to format our lab's data to match the required formatting for SNV, Indels, and SV. The required text tables and VCF files for HRDetect were created from VCF output files from DRAGEN and Manta. When I attempted to modify CNV output from DRAGEN to match the ASCAT format for HRDetect, it consistently failed with the error "task 1 failed - arguments imply differing number of rows". I verified that the CNV text file column headers were correct and that the rows had inferred tumor and normal copy number values for the corresponding genomic locations, but the error persisted.

When I substituted the provided test_hrdetect_1.cna.txt and test_hrdetect_2.cna.txt example files as the CNV input for HRDetect along with the other modified lab data, the error vanished and the HRDetect workflow completed without errors. Can we use output from a tool like Sequenza and adapt it to the ASCAT format that HRDetect requires? Or is the only option to use Affymetrix data from an array?

Thanks!

andreadega commented 2 years ago

Hi there,

Thanks for trying out our tools. In principle, it should be quite easy to build your copy number file so that it can be used in the hrDetect script. The command line script is just a wrapper for the R function HRDetect_pipeline, and according to our documentation, the text file for CNV should be a TAB separated file and contain a header in the first line with the following columns: 'seg_no', 'Chromosome', 'chromStart', 'chromEnd', 'total.copy.number.inNormal', 'minor.copy.number.inNormal', 'total.copy.number.inTumour', 'minor.copy.number.inTumour'.

The only reason I can think of for it not to work is that you might have formatted the CNV file as comma separated values (CSV), which is the typical ASCAT output. That would not work, because TAB separation is required.
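As a quick way to rule out the two problems mentioned above (wrong separator, wrong column names), here is a minimal Python sketch that inspects only the header line of a CNV table. The function name check_cnv_header and the wording of the messages are hypothetical and not part of signature.tools.lib; the column list is copied from the documentation quoted above.

```python
import io

# Column names as listed in the documentation quoted in this thread.
REQUIRED_COLUMNS = [
    "seg_no", "Chromosome", "chromStart", "chromEnd",
    "total.copy.number.inNormal", "minor.copy.number.inNormal",
    "total.copy.number.inTumour", "minor.copy.number.inTumour",
]


def check_cnv_header(handle):
    """Return a list of problems found in the first line of a CNV table.

    Hypothetical helper: it reads only the header line and does not
    validate the data rows.
    """
    header_line = handle.readline().rstrip("\r\n")
    problems = []
    # A header with commas but no TAB is most likely a CSV export.
    if "\t" not in header_line and "," in header_line:
        problems.append("header looks comma separated; TAB separated is required")
    columns = header_line.split("\t")
    missing = [name for name in REQUIRED_COLUMNS if name not in columns]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    return problems


# Example with in-memory files; a real call would pass open("sample.cna.txt").
tab_file = io.StringIO("\t".join(REQUIRED_COLUMNS) + "\n1\t1\t100\t200\t2\t1\t3\t1\n")
csv_file = io.StringIO(",".join(REQUIRED_COLUMNS) + "\n")
print(check_cnv_header(tab_file))
print(check_cnv_header(csv_file))
```

Because the check is an exact string match on column names, it would also flag a header that differs only in spelling.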

Affymetrix data from an array is not necessary. All you should need is to have the data formatted as above, which indicates where the segments start and end, as well as the segment tumour/normal copy numbers. The algorithm will just read the segments and count the LOH segments of a certain size to compute the HRD-LOH index.
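To make "count the LOH segments of a certain size" concrete, here is a rough Python sketch. It is not the package's implementation: the function name count_loh_segments is hypothetical, the 15 Mb default threshold is an assumption taken from the commonly cited HRD-LOH definition rather than from this package's source, and the sketch treats each segment independently (the real computation may merge adjacent segments or exclude whole-chromosome events).

```python
def count_loh_segments(segments, min_size=15_000_000):
    """Count segments that show LOH in the tumour and span at least min_size bases.

    Hypothetical illustration only. A segment is treated as LOH when the
    tumour minor copy number is 0 while the tumour total copy number is > 0.
    min_size defaults to 15 Mb, which is an assumption, not a value read
    from signature.tools.lib.
    """
    count = 0
    for seg in segments:
        length = seg["chromEnd"] - seg["chromStart"]
        is_loh = (
            seg["minor.copy.number.inTumour"] == 0
            and seg["total.copy.number.inTumour"] > 0
        )
        if is_loh and length >= min_size:
            count += 1
    return count


# Three made-up segments: long LOH, short LOH, long without LOH.
segments = [
    {"chromStart": 1, "chromEnd": 20_000_001,
     "total.copy.number.inTumour": 2, "minor.copy.number.inTumour": 0},
    {"chromStart": 1, "chromEnd": 1_000_001,
     "total.copy.number.inTumour": 1, "minor.copy.number.inTumour": 0},
    {"chromStart": 1, "chromEnd": 30_000_001,
     "total.copy.number.inTumour": 3, "minor.copy.number.inTumour": 1},
]
print(count_loh_segments(segments))
```

The point of the sketch is only that the calculation needs segment boundaries and tumour copy numbers, which any copy number caller can provide.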

Let me know if you still have problems with this. BW, Andrea

disulfidebond commented 2 years ago

Hi Andrea, thanks for the reply. I set up breakpoints in the code and found what was causing the error. The US and UK spellings differ (US: tumor, UK: tumour), and when the code attempted to read the CNV text file with these column header names:

seg_no Chromosome chromStart chromEnd total.copy.number.inNormal minor.copy.number.inNormal total.copy.number.inTumor minor.copy.number.inTumor

it threw the error:

Error in { :
  task 2 failed - "arguments imply differing number of rows: 154, 1, 0"
Calls: HRDetect_pipeline -> %dopar% -> <Anonymous>

After completing the dopar loop in the code for HRDetect.R, the value for the HRD score was null (formatting applied by me):

completed foreach loop
finished foreach read.table code block
finished foreach read.table code block, result hrd_list is
[[1]] NULL
finished if code block to compute HRD-LOH for samples, data_matrix is

            del.mh.prop  SNV3      SV3  SV5  hrd  SNV8
TESTSAMPLE1 0.09756098   43.13171  0    0    NA   160.8142

When I changed the spelling of the headers for the ascat format input CNV file to total.copy.number.inTumour and minor.copy.number.inTumour, the error disappeared and it worked correctly.
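For anyone hitting the same thing, the fix can be scripted. This is a minimal Python sketch that rewrites only the two US-spelled column names in a TAB separated header line and leaves every other column untouched. The function name normalise_header is hypothetical and not part of signature.tools.lib.

```python
# Map the US-spelled headers to the UK-spelled headers the tool expects.
US_TO_UK = {
    "total.copy.number.inTumor": "total.copy.number.inTumour",
    "minor.copy.number.inTumor": "minor.copy.number.inTumour",
}


def normalise_header(header_line):
    """Return the TAB separated header with US-spelled columns renamed.

    Hypothetical helper: only exact matches of whole column names are
    replaced, so data values and other columns are never altered.
    """
    columns = header_line.rstrip("\r\n").split("\t")
    return "\t".join(US_TO_UK.get(name, name) for name in columns)


us_header = "\t".join([
    "seg_no", "Chromosome", "chromStart", "chromEnd",
    "total.copy.number.inNormal", "minor.copy.number.inNormal",
    "total.copy.number.inTumor", "minor.copy.number.inTumor",
])
print(normalise_header(us_header))
```

Only the first line of the file needs to go through this function; the data rows can be copied as they are.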

Best, John

andreadega commented 2 years ago

Nice, glad it worked. Andrea

ruolin commented 2 years ago

Sorry to comment on a closed issue. I have some related questions, so I thought it would be nice to keep them in one thread. I was able to get ASCAT 3.0 to work, but the output is not quite in the format that HRDetect asks for. The $segments output contains sample, chr, startpos, endpos, nMajor and nMinor. I cannot find the copy number in normal samples anywhere. Should I just assume that the copy number in the normal samples is 2 (total) and 1 (minor)?

andreadega commented 2 years ago

In general yes, for the purpose of HRDetect you can set 2 total and 1 minor and it should work.
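A minimal Python sketch of that conversion, assuming the ASCAT segments for one sample have already been loaded as dictionaries keyed by the field names mentioned above (chr, startpos, endpos, nMajor, nMinor). The function names are hypothetical, and numbering seg_no from 1 is an assumption. The tumour total copy number is taken as nMajor + nMinor, and the normal values are fixed at 2 and 1 as suggested in the answer above.

```python
import csv
import io

HRDETECT_COLUMNS = [
    "seg_no", "Chromosome", "chromStart", "chromEnd",
    "total.copy.number.inNormal", "minor.copy.number.inNormal",
    "total.copy.number.inTumour", "minor.copy.number.inTumour",
]


def ascat_segments_to_cnv(segments):
    """Map ASCAT-style segment dicts to rows with the HRDetect CNV columns."""
    rows = []
    for seg_no, seg in enumerate(segments, start=1):  # numbering from 1 is an assumption
        n_major = int(seg["nMajor"])
        n_minor = int(seg["nMinor"])
        rows.append({
            "seg_no": seg_no,
            "Chromosome": seg["chr"],
            "chromStart": int(seg["startpos"]),
            "chromEnd": int(seg["endpos"]),
            "total.copy.number.inNormal": 2,  # assumed diploid normal
            "minor.copy.number.inNormal": 1,
            "total.copy.number.inTumour": n_major + n_minor,
            "minor.copy.number.inTumour": n_minor,
        })
    return rows


def write_cnv_table(rows, handle):
    """Write the rows as a TAB separated table with a header line."""
    writer = csv.DictWriter(handle, fieldnames=HRDETECT_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# One made-up segment; real input would come from the ASCAT segments table.
example = [{"sample": "S1", "chr": "1", "startpos": "1000",
            "endpos": "5000", "nMajor": "2", "nMinor": "0"}]
buffer = io.StringIO()
write_cnv_table(ascat_segments_to_cnv(example), buffer)
print(buffer.getvalue())
```

Writing to a real file instead of the in-memory buffer only requires passing an open file handle (opened with newline="") to write_cnv_table.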