etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

Why do I get an extra targets.bed? #737

Closed idoit4fun closed 2 years ago

idoit4fun commented 2 years ago

Hello,

When I run the batch command, I get different targets.bed files, which are annotated differently in their gene names. As a result, my pooled references differ slightly from each other.

My commands are below. I made the target and antitarget BED files first, and then tried to build a pooled reference with them.

    • cnvkit.py target my_baits.bed --annotate refFlat.txt --split -o my_targets.bed
    • cnvkit.py antitarget my_targets.bed -g data/access-5kb-mappable.hg19.bed -o my_antitargets.bed
    • cnvkit.py batch -n *Normal.bam --output-reference new_reference.cnn -t my_targets.bed -a my_antitargets.bed -f hg19.fasta -g data/access-5kb-mappable.hg19.bed

I realized that when I ran the batch command to build a new reference, it made another target BED file, named "my_targets.target.bed". The differences look like this: [image]

It seems like "my_targets.target.bed" is repeatedly annotated.

It is a minor issue, but I was just wondering.

Thank you in advance.

tetedange13 commented 2 years ago

Hi @idoit4fun ,

The -t / --targets parameter of the batch subcommand is always intended to receive a BED file containing the genomic coordinates of your capture probes (what can be called my_baits.bed, or vendor.bed). It uses that file to derive a my_baits.target.bed and (if not specified with the -a / --antitargets parameter) a corresponding my_baits.antitarget.bed.

This should explain why your batch command produced another file, my_targets.target.bed: CNVkit processed my_targets.bed as if it were a "my_baits.bed" and derived a my_targets.target.bed from it. And I guess the annotation is repeated because the target subcommand is not intended to run on an already "targeted" file.

The difference between "baits" and "targets" (in the CNVkit sense, targets are baits divided into bins), and which file to pass to which subcommand, can be a bit confusing.
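A quick way to see how the two files differ is to compare their labels interval by interval. The sketch below is only an illustration, not part of CNVkit: `compare_bed_labels` is a hypothetical helper name, and it assumes tab-delimited BED files with the gene label in column 4.

```bash
#!/usr/bin/env bash

# Hypothetical helper: compare the column-4 labels of two BED files.
# Intervals are matched on chromosome, start and end (columns 1-3).
compare_bed_labels() {
    local first="$1" second="$2"
    awk -F'\t' '
        # First file: remember the label of every interval.
        NR == FNR { label[$1 FS $2 FS $3] = $4; next }
        # Second file: classify each interval against the first file.
        {
            key = $1 FS $2 FS $3
            if (!(key in label))        only_in_second++
            else if (label[key] != $4)  relabeled++
            else                        same++
        }
        END {
            printf "same=%d relabeled=%d only_in_second=%d\n",
                   same, relabeled, only_in_second
        }
    ' "$first" "$second"
}
```

For example, `compare_bed_labels my_targets.bed my_targets.target.bed` prints one summary line. A large `relabeled` count with `only_in_second=0` would suggest that the bins are the same and only the annotation changed.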

The correct way to run CNVkit against your pooled reference is:

  1. Create your pooled reference once, giving only your "Normal.bam" files: `cnvkit.py batch --normal Normal.bam --targets my_baits.bed --annotate refFlat.txt --fasta hg19.fasta --access data/access-5kb-mappable.hg19.bed --output-dir results/` => It will write into `results/` everything you need: the annotated `my_baits.{anti,}target.bed` + a `reference.cnn` file

  2. Run CNVkit on all your "Tumor.bam" files against your newly created pooled reference, giving batch only your reference.cnn (it will manage to deduce the corresponding `my_baits.{anti,}target.bed` from it): `cnvkit.py batch Tumor.bam --reference results/reference.cnn --output-dir results/ --drop-low-coverage`
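The two steps can be wrapped in a small script so the reference is built once and then reused. This is a minimal sketch under stated assumptions: the function names and the `DRY_RUN` variable are hypothetical, the file names are the ones used in this thread, and only options already shown above are passed to `cnvkit.py`.

```bash
#!/usr/bin/env bash

# Print the command when DRY_RUN=1, otherwise execute it.
# (DRY_RUN is a hypothetical convenience switch, not a CNVkit feature.)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: build the pooled reference from the normal BAMs.
# Usage: build_reference <output_dir> <normal.bam>...
# Note that --targets receives the raw baits file, not a pre-binned one.
build_reference() {
    local outdir="$1"; shift
    run cnvkit.py batch --normal "$@" \
        --targets my_baits.bed --annotate refFlat.txt \
        --fasta hg19.fasta --access data/access-5kb-mappable.hg19.bed \
        --output-dir "$outdir"
}

# Step 2: call tumors against the reference written in step 1.
# Usage: call_tumors <output_dir> <tumor.bam>...
call_tumors() {
    local outdir="$1"; shift
    run cnvkit.py batch "$@" \
        --reference "$outdir/reference.cnn" \
        --output-dir "$outdir" --drop-low-coverage
}
```

After sourcing the script, `DRY_RUN=1 build_reference results *Normal.bam` prints the command without running it, which is a cheap way to confirm that no already-binned BED file is being passed to `--targets`.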

Hope this helps! Have a nice day, Felix.

idoit4fun commented 2 years ago

Thank you so much, @tetedange13. Have a great day.