davidebolo1993 / VISOR

VarIant SimulatOR for short, long and linked reads
GNU Lesser General Public License v3.0
41 stars 11 forks source link

HACk - Index error for reciprocal translocation #15

Closed cdelahaye closed 3 years ago

cdelahaye commented 3 years ago

Hello, First, thank you for your great work with this very interesting and helpful package ! I think I have spotted a little bug in HACk module for "reciprocal translocation".

An example of the error (I ran it with very first bases of E. coli genome): NC_000913.3 8 15 reciprocal translocation h1:NC_000913.3:20:forward:forward 0
Here is what I get (I put spaces between translocated regions for easier reading): AGCTTTT CATTCTGA CTGCA ACGGGCAA TATGTCTCTGTGTGGATTAAAAAAAGAGTGTCT AGCTTTT ACGGGCAA CTGCA CATTCTGA _ATGTCTCTGTGTGGATTAAAAAAAGAGTGTCT The translocation goes well but there is a missing base (noted _) after the second translocation.

I could fix this with few modifications in the source code, in VISOR/HACk/HACk.py:

line 674: lastbase=int(column5[2])+(x.end-x.start) -> lastbase=int(column5[2])+(x.end-x.start+1) line 714: d[column5[0]][column5[1]] = [(firstbase+1, lastbase+1, newtype, transeq,randomseq)] -> d[column5[0]][column5[1]] = [(firstbase+1, lastbase, newtype, transeq,randomseq)] line 718: d[column5[0]][column5[1]].append((firstbase+1, lastbase+1, newtype, transeq,randomseq)) -> d[column5[0]][column5[1]].append((firstbase+1, lastbase, newtype, transeq,randomseq))

davidebolo1993 commented 3 years ago

Hi @cdelahaye,

thanks for reporting. Can you link the E. coli genome used and share your bed with me?

Thanks,

Davide

cdelahaye commented 3 years ago

Sure ! The genome can be found here, I just used it for quick try of your scripts, thus I only used a part of it, and very short bed file.

I forgot to precise in my first message that the first sequence I put is the original genome, and the second one is the HACk output.

hack_issue.zip

davidebolo1993 commented 3 years ago

Hi @cdelahaye,

Thanks a lot. I (correctly) incremented the index in the final dictionary without extracting the entire corresponding sequence though. The quick-fix you proposed works fine (tested with your example). I've just pushed the changes to the master branch.

Thanks,

Davide

davidebolo1993 commented 3 years ago

BTW, let me know if you notice any other bug during your tests.

Thanks,

Davide