BoevaLab / FREEC

Control-FREEC: Copy number and genotype annotation in whole genome and whole exome sequencing data

Error: your BED file with coordinates of targeted regions may contain duplicates Check chromosome 1 #111

Closed: nbiqgi closed this issue 2 years ago

nbiqgi commented 2 years ago

Whole Exome Sequencing data

If you still get this error after sorting the BED file and removing duplicate lines with

sort -k1,1V -k2,2n -k3,3n -u input.bed > input_sorted_nodups.bed

it is because the overlapping regions also need to be merged:

bedtools merge -i input_sorted_nodups.bed > input_sorted_nodups_merged.bed
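
For anyone wondering why removing duplicate lines is not enough: two regions can overlap without being identical lines, so `sort -u` leaves them in place. Below is a minimal sketch of the merging step in plain awk, assuming the input is already sorted by chromosome and start and that only the first three columns matter. The function name `merge_sorted_bed` is made up for this illustration and is not part of FREEC or bedtools. It also joins regions that merely touch, which I believe matches the default behaviour of `bedtools merge`.

```shell
# Hypothetical helper for illustration only (not part of FREEC or bedtools).
# Assumes the BED file is already sorted by chromosome, then by start.
# Merges overlapping or touching regions and keeps only columns 1-3.
merge_sorted_bed() {
    awk 'BEGIN { OFS = "\t" }
        # Same chromosome and the region starts inside (or right at the end of)
        # the current merged region: extend it if needed and move on.
        NR > 1 && $1 == chrom && $2 + 0 <= cur_end + 0 {
            if ($3 + 0 > cur_end + 0) cur_end = $3
            next
        }
        # Otherwise the previous merged region is finished: print it.
        NR > 1 { print chrom, cur_start, cur_end }
        # Start a new merged region from the current record.
        { chrom = $1; cur_start = $2; cur_end = $3 }
        END { if (NR > 0) print chrom, cur_start, cur_end }
    ' "$1"
}
```

Usage would be something like `merge_sorted_bed input_sorted_nodups.bed > input_sorted_nodups_merged.bed`. In practice `bedtools merge` is the safer choice; this is only meant to show what the step does.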

valeu commented 2 years ago

Could you please share the complete command-line output? Thank you!

valeu commented 2 years ago

Dear Natalia, unfortunately, you currently need to provide FREEC with a sorted .bed file (sorted first by chromosome and then by position). Your file is not fully sorted. For example, it contains:

chr1 498104 498409 ens|ENST00000599771,ens|ENST00000432964,ens|ENST00000423728,ens|ENST00000440038,ens|ENST00000601486 1 -
chr1 493690 495074 ref|LOC100132062,ref|LOC100133331,ref|LOC100132287,ref|NR_028325,ref|NR_028322,ref|NR_028327,ens|ENST00000419160,ens|ENST00000423728,ens|ENST00000440038,ens|ENST00000425496,ens|ENST00000455464,ens|ENST00000601814 1 -

where coordinates in the second line are smaller than coordinates in the first line.

If you sort your bed file, I hope FREEC will work.

Best, Valentina
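
To find out where a BED file breaks the ordering described above, a quick check along these lines may help. This is only a sketch: the function name `check_bed_sorted` is hypothetical, it looks only at the chromosome and start columns, and it stops at the first problem it finds. It accepts any chromosome order as long as each chromosome forms one contiguous block, which is an assumption on my part about what is sufficient.

```shell
# Hypothetical helper for illustration only (not part of FREEC).
# Reports the first record whose start is smaller than the previous start on
# the same chromosome, or a chromosome that shows up in more than one block.
# Returns 0 if no problem was found, 1 otherwise.
check_bed_sorted() {
    awk '
        # A new chromosome block begins on this line.
        $1 != chrom {
            if ($1 in seen) {
                printf "line %d: %s appears in more than one block\n", NR, $1
                bad = 1
                exit
            }
            seen[$1] = 1; chrom = $1; prev_start = $2
            next
        }
        # Same chromosome, but the start position went backwards.
        $2 + 0 < prev_start + 0 {
            printf "line %d: start %d is smaller than previous start %d on %s\n", NR, $2, prev_start, $1
            bad = 1
            exit
        }
        { prev_start = $2 }
        END { exit bad }
    ' "$1"
}
```

Run on a file containing the two chr1 records quoted above, in that order, it would flag the second record, because 493690 is smaller than 498104.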