Input Files Required

nicoleversetwo commented 3 years ago

Hi there, I'm just starting out with variant calling, although I'm familiar with other types of genome-wide data (methylome/transcriptome). This is a general question about the best way to use this pipeline.

For the input matrix, I already have 4 of the 6 required types of data, from COSMIC:

2) number of mutations of substitution signature 3 (SNV3)
3) number of mutations of rearrangement signature 3 (SV3)
4) number of mutations of rearrangement signature 5 (SV5)
6) number of mutations of substitution signature 8 (SNV8)

However, I am missing these columns:

1) proportion of deletions at microhomology (del.mh.prop)
5) HRD LOH index (hrd)
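To make the layout concrete, here is a minimal R sketch of a one-sample input matrix with the six columns in the order listed above. The sample name and counts are made up, and the column names are assumed to be the ones HRDetect_pipeline expects (worth confirming with `?HRDetect_pipeline`). The two values still to be estimated are left as `NA`.

```r
# Columns in the order listed above (assumed to match the pipeline's names):
# 1) del.mh.prop, 2) SNV3, 3) SV3, 4) SV5, 5) hrd, 6) SNV8
hrdetect_cols <- c("del.mh.prop", "SNV3", "SV3", "SV5", "hrd", "SNV8")

# One row per sample; "sample_A" and the counts below are hypothetical
input_matrix <- matrix(NA_real_, nrow = 1, ncol = length(hrdetect_cols),
                       dimnames = list("sample_A", hrdetect_cols))

# Fill in the four signature values that are already available
input_matrix["sample_A", c("SNV3", "SV3", "SV5", "SNV8")] <- c(1200, 15, 4, 300)

# List the columns that still need to be estimated
missing_cols <- colnames(input_matrix)[is.na(input_matrix["sample_A", ])]
print(missing_cols)
```

Here `missing_cols` is `c("del.mh.prop", "hrd")`, the two values the question is asking about.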

Is it more straightforward to just provide my VCFs (SNV and Indels) and let HRDetect_pipeline calculate the rest? I'm just trying to piece together the documentation from reading the details of the steps here: https://rdrr.io/github/pdiakumis/hrdetect/src/R/HRDetect.R

From what I understand, even with some of these values missing, the function should work as long as I provide all of the following inputs. Is that correct?

```r
sample_names <- unique(unlist(sapply(list( SNV_vcf_files, SNV_tab_files, SNV_catalogues, Indels_vcf_files, CNV_tab_files, SV_bedpe_files, SV_catalogues)
```

andreadega commented 3 years ago

Dear Nicole,

Thanks for using our tools. I would recommend that you check the latest documentation using ?HRDetect_pipeline in your local installation, and make sure you are using our latest version, installed from our GitHub. We update the code every now and then, and HRDetect_pipeline has been improved a few times with additional options.

The input for HRDetect_pipeline has been designed to allow some flexibility. If you are happy with your estimates of SNV3, SNV8, SV3 and SV5, then all you are missing, as you have pointed out, are del.mh.prop and HRD-LOH. The HRDetect_pipeline function can estimate these for you and add them to the input matrix. All you need to do is provide the indels VCF files (or indels tab files) for estimating del.mh.prop, and the copy number files for estimating HRD-LOH. The format of the copy number files is described in the documentation when typing ?HRDetect_pipeline.
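HRDetect_pipeline computes del.mh.prop itself from the indel files, so this is only an illustration of the definition "proportion of deletions at microhomology". It is a minimal R sketch with made-up per-sample deletion counts, not the package's own indel classification code; the column names `del_mh` and `del_other` are hypothetical.

```r
# Hypothetical per-sample deletion counts; in the real pipeline these would
# come from classifying the indels in each sample's VCF/tab file.
deletion_counts <- data.frame(
  sample    = c("sample_A", "sample_B"),
  del_mh    = c(120, 10),  # deletions with microhomology at the breakpoint
  del_other = c(80, 90),   # all remaining deletions
  stringsAsFactors = FALSE
)

# del.mh.prop = deletions at microhomology / all deletions
del_mh_prop <- with(deletion_counts, del_mh / (del_mh + del_other))
names(del_mh_prop) <- deletion_counts$sample
print(del_mh_prop)
```

With these counts, sample_A gets 120 / 200 = 0.6 and sample_B gets 10 / 100 = 0.1.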

In your example at the end, you are missing the copy number files, while you are specifying multiple SNV or SV files, which are not needed. If you want HRDetect_pipeline to compute the catalogues and estimate signatures for you, then just specify the VCF or tab files. If you already have the mutational catalogues, you don't need to provide the variants: the catalogues alone are enough to compute the signature exposures. If you already have the signature exposures, then just add them to the input matrix.
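The rule of thumb above (supply the most processed form of the data you already have) can be written as a small decision function. This is a hypothetical helper for illustration only, not part of signature.tools.lib; the argument names it mentions are the ones quoted in the question.

```r
# Hypothetical helper, not part of signature.tools.lib. For one mutation type
# (substitutions or rearrangements), return what to pass to HRDetect_pipeline,
# preferring exposures over catalogues over raw variant files.
choose_input <- function(have_exposures, have_catalogues, have_variants) {
  if (have_exposures) {
    "add the exposures as columns of the input matrix"
  } else if (have_catalogues) {
    "provide the catalogues (SNV_catalogues or SV_catalogues)"
  } else if (have_variants) {
    "provide the variant files (SNV_vcf_files/SNV_tab_files or SV_bedpe_files)"
  } else {
    "no input available for this mutation type"
  }
}

# Example: SV exposures already estimated, SNVs only available as VCFs
print(choose_input(have_exposures = TRUE,  have_catalogues = FALSE, have_variants = TRUE))
print(choose_input(have_exposures = FALSE, have_catalogues = FALSE, have_variants = TRUE))
```

The ordering reflects the reply: variant files are only needed when neither catalogues nor exposures exist yet.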

Let me know if you have other questions, Andrea

andreadega commented 2 years ago

Closed for inactivity.

Nik-Zainal-Group / signature.tools.lib

Input Files Required #32