dellytools / delly

DELLY2: Structural variant discovery by integrated paired-end and split-read analysis
BSD 3-Clause "New" or "Revised" License

CNV calling #223

Open oghzzang opened 3 years ago

oghzzang commented 3 years ago

Dear @tobiasrausch,

Hi. I'm Oh.

I ran delly for germline CNV calling.

I have a question about the genotypes in my called CNVs (c1.cnv.bcf) and in the merged calls (merged.bcf).

ref_fasta=${home_dir}/Reference/Homo_sapiens_assembly38.fasta
map_file=${home_dir}/Tools/delly/map/Homo_sapiens.GRCh38.dna.primary_assembly.fa.r101.s501.blacklist.gz

### call CNV
delly cnv \
    -o c1.cnv.bcf \
    -g ${ref_fasta} \
    -m ${map_file} \
    ${sample_name}_recal.bam 

### Merge CNV into a unified site list
code not shown

### Genotype CNVs for each samples
code not shown

### Merge genotype using bcftools
bcftools merge -m id -O b -o merged.bcf c1.geno.bcf c2.geno.bcf ... c100.geno.bcf 

### Filter for germline CNVs
delly classify -f germline -o filtered.bcf merged.bcf

Every CNV genotype in the output is "./.".

As far as I know, "./." means no call.

I only need "RDCN" anyway, so does this matter?

Many thanks.

Oh.

tobiasrausch commented 3 years ago

Yes, please use CN or RDCN. The long answer is:

For copy-number variants delly currently does not use the GT field, because GT is commonly used for hom. ALT (1/1), het. (0/1) and hom. REF (0/0). For copy-number variants I do not know the allelic distribution. For instance, if the total copy-number of a segment is 8, the allelic copy-numbers could be 4 and 4, or 8 and 0, or 1 and 7, ...

Because of that issue delly only outputs the total copy-number in FORMAT:CN and the copy-number likelihoods for each copy-number state (FORMAT:CNL).
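To make the ambiguity concrete, here is a minimal sketch that lists every way a total copy-number can be split across two alleles. The function name `allelic_splits` is made up for this illustration and is not part of delly.

```shell
# Print each unordered allelic copy-number pair a+b that sums to the given total.
# Hypothetical helper for illustration only; delly does not provide this.
allelic_splits() {
    total=$1
    a=0
    # Stop at total/2 so that e.g. 1+7 and 7+1 are not both listed.
    while [ "$a" -le $((total / 2)) ]; do
        printf '%d+%d\n' "$a" $((total - a))
        a=$((a + 1))
    done
}

allelic_splits 8
```

For a total of 8 this lists five candidate splits (0+8 through 4+4). Read depth alone cannot tell them apart, which is why only the total is reported.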

oghzzang commented 3 years ago

Dear @tobiasrausch,

Hi, I'm Oh.

Thanks for your reply.

Have a nice day!

Oh.

marissa97 commented 1 year ago

Hey, I ran the same commands. However, delly reports that all my samples have low coverage and that I need to increase the scanning window.

With "-w 50000" it works. However, all the variants are shown as "N".

Do these Ns stand in for long sequences, or do they represent actual "N" bases?

If it is the latter, how do I exclude these reference "N" bases?

And when I run

delly classify -f germline -o filtered.bcf c1.bcf

the result is empty. What did I do wrong?

Thank you.

tobiasrausch commented 1 year ago

I am sorry, I still need to fix the N reference nucleotides. That's on the ToDo list. Just rely on POS and INFO/END for the size of the CNV and FORMAT/CN shows the estimated copy-number. The classify subcommand requires a multi-sample BCF file.
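As a rough sketch of that advice, the snippet below derives a size from POS and END while ignoring the REF column entirely. It assumes single-sample, tab-separated rows of CHROM, POS, END, CN, such as `bcftools query -f '%CHROM\t%POS\t%INFO/END[\t%CN]\n'` might produce. The function name `cnv_sizes` and the example rows are invented, and whether the size should be END - POS or END - POS + 1 depends on the coordinate convention, which is an assumption here.

```shell
# Read tab-separated CHROM, POS, END, CN rows on stdin and append END - POS
# as a fifth column. Hypothetical helper; the REF allele ("N") is never used.
cnv_sizes() {
    awk 'BEGIN { FS = OFS = "\t" } { print $1, $2, $3, $4, $3 - $2 }'
}

# Made-up example rows, not real delly output.
printf 'chr1\t100000\t150000\t3\nchr2\t500000\t700000\t1\n' | cnv_sizes
```

The fourth column is the estimated copy-number taken from FORMAT/CN, carried through unchanged.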

marissa97 commented 1 year ago

Thank you for your response. I still have some questions. I want to find the germline CNVs in each sample individually. Is this possible? Why do we need multiple samples? I haven't found any tool that does this; some tools, such as DELLY, require multiple samples.
