0xTCG / biser

A fast tool for detecting and decomposing segmental duplications in genome assemblies
MIT License
43 stars, 0 forks

coordinate overlaps between pairs #18

Closed: hrrsjeong closed this issue 1 year ago

hrrsjeong commented 2 years ago

I found SD pairs whose two copies overlap in genomic coordinates. Some pairs even have exactly the same coordinates for both copies (a self-alignment). Examples:

h2tg000001l     3733143 3748938 h2tg000001l     3743164 3751506 HG00733_hap1.masked:HG00733_hap1.masked 54.2    +       +       13607   13796   232M3D287M17I1M20N46M1D191M29I25N456M9I273M6I16M2I99M3I105M20N4D24S174M1I93M1D45M6D1M7S84M9I7S6I22N125M28S1M28N7M24S24N31M1D13M1I43M2I116M3D183M3D19S19N68M101I39S663I19S458I40S268I23S463I43S199I20S155I30S42I183S399I7S51I23S499I23S103I22S115I29S142I24S150I22S199I22S415I197M2I104M19S19N90M22S22N6M26S1M28D31M317N2M10I312S37M1I67M23I2M23N64M1I29M11D21N4D1M34S227M29S29N150M1I18M68D65S66M7D7S13M5I38M26S1M26N139M20S20N11M1I10M2D315N546M4I210M5D7N8D1M20S238M25S2M23N2D125M3D106M716I20S408I22S126I28S7I24S393I19S369I19S90I22S66I317S107I1M23S104M21S21N232M29S29N149M1D50M23I2M23N136M1I26S27N139M20D20S23M327N4D315S123M24I24N399M4D142M       X=2.9;ID=51.3
h2tg000001l     1336074 1337493 h2tg000001l     1336074 1337493 HG00733_hap1.masked:HG00733_hap1.masked 45.7    +       -       1282    1367    73M1I125M18I31M7D63M13I26M107S25M107N74M17I43M5D58M16I47M8I64M8D49M16D42M23S15M5I22M23N13M7S10M17D15M7N105M13D66M7I30M18D126M1D75M     X=33.3;ID=12.4
inumanag commented 1 year ago

Yes, this is an issue, but unfortunately I won't be able to fix it anytime soon (it's not a major issue in our pipelines). You could add a simple post-processing step to filter these out.
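A minimal sketch of such a post-processing filter follows. It assumes the first six whitespace-separated columns of each output line are chrom1, start1, end1, chrom2, start2, end2, as in the examples above, and that the coordinates are half-open (BED-style). If they are actually closed intervals, change `<` to `<=`. The function names are hypothetical. This is not part of biser. A pair is dropped when both copies lie on the same contig and their intervals intersect. Identical coordinates (self-alignments) are a special case of this and are dropped as well, whatever the strands.

```python
from typing import Iterable, Iterator


def pair_overlaps(fields: list[str]) -> bool:
    """Return True if the two copies of an SD pair overlap on the same contig.

    Assumption: columns 1-6 are chrom1, start1, end1, chrom2, start2, end2,
    with half-open coordinates. Identical intervals count as overlapping.
    """
    chrom1, start1, end1 = fields[0], int(fields[1]), int(fields[2])
    chrom2, start2, end2 = fields[3], int(fields[4]), int(fields[5])
    return chrom1 == chrom2 and start1 < end2 and start2 < end1


def filter_overlapping_pairs(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines whose two SD copies do not overlap."""
    for line in lines:
        fields = line.split()  # splits on tabs or runs of spaces
        # Pass through comments, blank lines, and lines too short to parse.
        if line.startswith("#") or len(fields) < 6:
            yield line
            continue
        if not pair_overlaps(fields):
            yield line
```

For example, `sys.stdout.writelines(filter_overlapping_pairs(open("biser.out")))` would print the filtered calls (the file name is only illustrative). Because the filter streams line by line, it works on large outputs without loading them into memory.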