getzlab / deTiN

DeTiN is designed to measure tumor-in-normal contamination and improve somatic variant detection sensitivity when using a contaminated matched control.
BSD 3-Clause "New" or "Revised" License

n_probs argument requirement in aSCNA segmentation file: #26

Closed: heyudou closed this issue 4 years ago.

heyudou commented 4 years ago

Hello!

Thanks for developing deTiN and providing a solution to this very important bioinformatics problem!

I am writing to ask about the n_probes column in the aSCNA segmentation file. It is not listed as one of the required inputs in the wiki (https://github.com/broadinstitute/deTiN/wiki/Description-of-inputs), but we can't seem to run the program without it. My questions are:

  1. We don't have information on the exome capture kit used for the sequencing files, but we have generated f and tau with ASCAT. Can we still run deTiN?
  2. If n_probes is absolutely required: ASCAT gives us the number of properly covered (heterozygous) SNP loci in each segment, which seems to correlate with n_probes. Can we use this value somehow?
  3. How can we use deTiN on WGS data?

amarotaylor commented 4 years ago

Hey, I'm no longer able to edit deTiN's code because I've moved institutions, but I can answer these questions, no problem.

If you don't have the number of probes, that's totally fine, and you should still be able to run deTiN.

You have two options: (1) you can "fake" this column by providing a large constant value for every segment, or (2) you can estimate it by dividing the length of each segment by 100, or by some other value appropriate for the average distance between your probes. Option (2) assumes even probe spacing throughout the genome, which is obviously incorrect, but it may be close enough.

Either workaround should let you run deTiN with your ASCAT output, which answers question 1. For question 2, you could simply scale the number of heterozygous SNP loci you mention, though I'm not sure what scaling makes sense. For question 3 (WGS data), we faked the probes using strategy (2): n_probes = length / 100.
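Both workarounds can be applied to an existing segmentation file before running deTiN. Below is a minimal Python sketch that adds an `n_probes` column, either as a constant (option 1) or as segment length divided by an assumed average probe spacing (option 2). It assumes a tab-separated file with `Start.bp` and `End.bp` columns in 1-based, inclusive coordinates; those column names, and the helpers `estimate_n_probes` and `add_n_probes_tsv`, are hypothetical and not part of deTiN, so check the header against the wiki's description of the aSCNA segmentation file.

```python
import csv
import io

# Assumed column names (hypothetical); adjust to match your segmentation file.
START_COL = "Start.bp"
END_COL = "End.bp"
N_PROBES_COL = "n_probes"


def estimate_n_probes(start, end, bp_per_probe=100, constant=None):
    """Return a stand-in n_probes value for one segment.

    Option 1: if `constant` is given, return it for every segment.
    Option 2: divide the segment length by an assumed average probe
    spacing, which pretends probes are evenly spaced across the genome.
    """
    if constant is not None:
        return constant
    # Assumes 1-based, inclusive start/end coordinates.
    length = int(end) - int(start) + 1
    # Never report zero probes for very short segments.
    return max(1, length // bp_per_probe)


def add_n_probes_tsv(text, bp_per_probe=100, constant=None):
    """Read a tab-separated segment table and return it with n_probes filled in."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    fieldnames = list(reader.fieldnames)
    if N_PROBES_COL not in fieldnames:
        fieldnames.append(N_PROBES_COL)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[N_PROBES_COL] = estimate_n_probes(
            row[START_COL], row[END_COL], bp_per_probe, constant
        )
        writer.writerow(row)
    return buf.getvalue()


# Usage: a one-segment toy table with made-up f and tau values.
seg = "Chromosome\tStart.bp\tEnd.bp\tf\ttau\n1\t1\t100000\t0.45\t2.1\n"
print(add_n_probes_tsv(seg))                     # option 2: 100000 / 100 = 1000
print(add_n_probes_tsv(seg, constant=1_000_000))  # option 1: arbitrary large constant
```

The default `bp_per_probe=100` mirrors the "length/100" rule used for WGS above; the constant 1,000,000 is only an illustration, not a value deTiN prescribes.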