0xTCG / biser

A fast tool for detecting and decomposing segmental duplications in genome assemblies
MIT License

core region length in the elem file #13

Closed hrrsjeong closed 2 years ago

hrrsjeong commented 2 years ago

Hello! I found that many elements with the same element ID have different (core) lengths. Below, I have copied the first 10 lines of the elem file. As you can see, although the 2nd and 3rd lines share the same core index, 1000_0, their lengths are very different. Also, I was wondering why the 6th column (core length) is not equal to the 4th column minus the 3rd column, plus 1.

HG00733_hap1.masked     h2tg000076l     3740761 3741639 1000_0  636     637     +       CORE
HG00733_hap1.masked     h2tg000085l     7729687 7730381 1000_0  357     140     -       CORE
HG00733_hap1.masked     h2tg000179l     1853817 1854097 1000_0  280     126     +       CORE
HG00733_hap1.masked     h2tg000076l     3741641 3742551 1000_1  450     450     +
HG00733_hap1.masked     h2tg000179l     1854190 1854642 1000_1  377     87      +
HG00733_hap1.masked     h2tg000017l     99364420        99366012        1002_0  1289    1286    -       CORE
HG00733_hap1.masked     h2tg000085l     5082532 5085049 1002_0  1217    158     +       CORE
HG00733_hap1.masked     h2tg000004l     85363675        85365239        1002_1  826     458     +
HG00733_hap1.masked     h2tg000085l     4941770 4943670 1002_1  935     937     +
HG00733_hap1.masked     h2tg000004l     85386073        85396709        1002_2  4768    1830    +

inumanag commented 2 years ago

Hi @hrrsjeong

Yeah, cores can have different lengths (e.g., 1000_0 in this case) due to the gaps/dissimilarities between core copies.
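To see how much the copies of one element differ, the rows can be grouped by element ID. This is only a rough sketch, not part of biser: it assumes whitespace-separated columns laid out as in the excerpt above (start in column 3, end in column 4, element ID in column 5), and `spans_by_element` is a hypothetical helper name.

```python
from collections import defaultdict

def spans_by_element(lines):
    """Map each element ID (column 5) to the genomic spans (end - start) of its copies.

    Assumes whitespace-separated rows: name, contig, start, end, element ID, ...
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        start, end = int(fields[2]), int(fields[3])
        groups[fields[4]].append(end - start)
    return dict(groups)

# Three rows copied from the excerpt in this issue.
sample = [
    "HG00733_hap1.masked h2tg000076l 3740761 3741639 1000_0 636 637 + CORE",
    "HG00733_hap1.masked h2tg000085l 7729687 7730381 1000_0 357 140 - CORE",
    "HG00733_hap1.masked h2tg000179l 1853817 1854097 1000_0 280 126 + CORE",
]

for element_id, spans in spans_by_element(sample).items():
    print(element_id, spans, "spread:", max(spans) - min(spans))
```

A large spread within one ID is expected when the copies contain gaps, so it does not by itself indicate a problem.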

The sixth column should be the same... that is certainly a bug. Do you mind sharing the file that causes this problem (email is also OK)?
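A quick way to list the affected rows is sketched below. It is not part of biser, and it makes two assumptions: the columns are whitespace-separated as in the excerpt above, and the coordinate convention is unknown, so both `end - start` and `end - start + 1` are accepted as matching. `find_length_mismatches` is a hypothetical helper name.

```python
def find_length_mismatches(lines):
    """Return (line_no, element_id, span, reported) for rows whose 6th column
    equals neither end - start nor end - start + 1.

    Assumes whitespace-separated rows: name, contig, start, end, element ID,
    reported length, another numeric column, strand, optional CORE tag.
    """
    mismatches = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        start, end = int(fields[2]), int(fields[3])
        reported = int(fields[5])
        span = end - start
        # Coordinate convention is an assumption, so accept both variants.
        if reported not in (span, span + 1):
            mismatches.append((line_no, fields[4], span, reported))
    return mismatches

# Two rows copied from the excerpt in this issue.
rows = [
    "HG00733_hap1.masked h2tg000076l 3740761 3741639 1000_0 636 637 + CORE",
    "HG00733_hap1.masked h2tg000179l 1853817 1854097 1000_0 280 126 + CORE",
]

for mismatch in find_length_mismatches(rows):
    print(mismatch)
```

On these two rows only the first is flagged; the second row's 6th column (280) equals its end minus start.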

inumanag commented 2 years ago

Fixed in 1.3. Thank you for the report!