lindenb closed 9 years ago
Hi. Those two different extensions are confusing, you're right. We have modified them to be _pl\1 and \1, and we've committed the change. The difference between them is quite subtle and you can ignore it for the most part. Imagine you have both parts of the mate pair in a structure like:
read MATE1linkerMATE2
In this case the program would generate two sequences:
read\1 MATE1
read\2 MATE2
Now imagine that because of a sequencing error or a problem during the ligation the linker is not complete:
read MATE1linkeMATE2
In this case the program would generate two sequences:
read\1 MATE1
read\2 ATE2
As you can see, the splitter suspects that there's a problem in the ligation site between the linker and the second half, and tries to be conservative by removing the few nucleotides that would complete the linker length, even if those nucleotides do not match the linker sequence. In this case we introduce the _pl to mean that the linker match was only partial (partial linker).

There are orphans because in some reads the linker was found at the very beginning or end, or because it wasn't found at all. In those cases you don't get two fragments, but just one. If you want to get the orphans in a different file you could use pair_matcher after you've finished split_matepairs.

Finally, and just to cover all cases, you could also find some _mlc. Those appear when the linker is found more than once.
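To make the cases above concrete, here is a minimal Python sketch of the naming and trimming rules as I described them. It is only an illustration, not the actual split_matepairs code: the function name `split_on_linker`, its parameters, and the choice to leave orphan names unchanged are all assumptions, and the linker search itself (which would normally be done by an aligner) is left out. The _mlc case is not covered.

```python
def split_on_linker(name, seq, linker, match_start, match_len):
    """Return a list of (name, sequence) records for one read.

    Hypothetical helper, not the real split_matepairs implementation.
    match_start / match_len say where the linker was found and how much of
    it matched; match_start is None when no linker was found.
    """
    if match_start is None:
        # No linker at all: the read stays whole, as a single orphan.
        return [(name, seq)]

    partial = match_len < len(linker)

    # Be conservative: always remove a full linker length, even if the
    # last few nucleotides did not actually match the linker.
    left = seq[:match_start]
    right = seq[match_start + len(linker):]

    fragments = [frag for frag in (left, right) if frag]
    if len(fragments) < 2:
        # Linker at the very beginning or end: only one fragment (orphan).
        # Keeping the original name here is an assumption.
        return [(name, frag) for frag in fragments]

    tag = "_pl" if partial else ""  # _pl marks a partial linker match
    return [(f"{name}{tag}\\1", left), (f"{name}{tag}\\2", right)]


for record in split_on_linker("read", "MATE1linkeMATE2", "linker", 5, 5):
    print(*record)
```

With the partial linker example from above, the second fragment comes out as ATE2 because six nucleotides (the full linker length) are removed, not just the five that matched.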
Thanks !
Hi, I have been asked to use split_matepairs to split sequences produced by an ion_torrent. When I look at the reads, here are the names I see:
Why are there two extensions for the names:
_pl.part1
or \1?
Why are there some orphan reads? Thanks for your help.
P.