etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

create reference from affected only and no bait files #788

Open kmarianski opened 1 year ago

kmarianski commented 1 year ago

Hi folks,

Please help me out here. I'm trying to call CNVs on my WES data, but I can't create a reference file. I only have affected samples, and I don't have any bait .bed or .cnn files from the company that sequenced the samples.

Thanks, Krzys

etal commented 1 year ago

It would be best to get the bait/target coordinates, or the name of the capture kit the provider used, if you can. With the kit name alone you could get the coordinates from the manufacturer's website (e.g. Illumina provides them).

Failing all that, you could try the script guess_baits.py to infer the coordinates: https://cnvkit.readthedocs.io/en/stable/scripts.html
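A rough sketch of what that invocation might look like, written as a small Python helper that only builds the command line and does not run it. The file names (`Sample1.bam`, `exons.bed`, `baits.bed`) and the helper name are hypothetical, and the `-t` (candidate regions) and `-o` (output) flags are given as I recall them, so check `guess_baits.py -h` before relying on this:

```python
import shlex


def guess_baits_cmd(bams, candidate_regions_bed, out_bed):
    """Build a guess_baits.py command line (hypothetical helper; nothing is executed)."""
    if not bams:
        raise ValueError("need at least one BAM file")
    # Assumed flags: -t = candidate regions to test (e.g. known exons),
    # -o = output BED of the regions that look captured.
    return ["guess_baits.py", *bams, "-t", candidate_regions_bed, "-o", out_bed]


# Placeholder file names, for illustration only.
cmd = guess_baits_cmd(["Sample1.bam", "Sample2.bam"], "exons.bed", "baits.bed")
print(shlex.join(cmd))
```

Passing several BAMs at once should give the script more read-depth evidence to separate captured regions from off-target ones.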

As a starting point, you can try the exome targets here (may need to lift over to hg38): https://github.com/etal/cnvkit/tree/master/data

Then, once you have some capture coordinates, you can use batch -n with no following files to run the default workflow with a "flat" reference. Details here: https://cnvkit.readthedocs.io/en/stable/pipeline.html#with-no-control-samples
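As a minimal sketch, the flat-reference run could be assembled like this. The helper only builds the argument list; all file and directory names are placeholders, and the helper name is made up for illustration. The key detail is that `-n` is immediately followed by another option, so no normal samples are passed:

```python
import shlex


def flat_reference_batch_cmd(bams, targets_bed, fasta, out_reference, out_dir):
    """Build a `cnvkit.py batch` command that uses a flat reference (nothing is executed)."""
    if not bams:
        raise ValueError("need at least one BAM file")
    return [
        "cnvkit.py", "batch", *bams,
        "-n",                       # no files after -n, so a flat reference is built
        "-t", targets_bed,          # capture coordinates (BED)
        "-f", fasta,                # reference genome FASTA
        "--output-reference", out_reference,
        "-d", out_dir,              # output directory
    ]


# Placeholder names, for illustration only.
batch_cmd = flat_reference_batch_cmd(
    ["Sample1.bam", "Sample2.bam"],
    "baits.bed", "hg38.fasta", "flat_reference.cnn", "results/",
)
print(shlex.join(batch_cmd))
```

To actually run it, the list could be handed to `subprocess.run(batch_cmd, check=True)` on a machine where CNVkit is installed.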

If these WES samples are from tumors, then a flat reference may be accurate enough to find some large-scale CNAs, but be cautious with the results.