Hello
Thanks very much for developing TAMA. I'm trying to understand the behaviour of merging duplicate transcript groups. I ran tama_merge.py on a set of 15 transcriptomes from 15 samples, using the following command:
python tama_merge.py -f file_list.txt -p /path/to/output/project -m 5 -a 50 -z 100
I'm intentionally running it without the option -d merge_dup because I wanted to understand what the behaviour of merging duplicate groups looks like. I'm using permissive 5' and 3' parameters (-a and -z) and allowing splice junction mismatches (-m) because I wanted to replicate the default behaviour of isoseq collapse, which I use to collapse transcripts within a single sample. Anyway, as expected, TAMA stopped at this stage:
My genes are named with the Isoseq convention of PB.# and transcripts as PB.#.#. I can see, given my parameters, why each of these transcripts belongs to its respective group. If we think of transcripts as nodes in a graph structure, then any two nodes in each transcript group are connected, either directly or indirectly (I'm not sure if this understanding is correct).
But where I'm puzzled is this. Both transcript groups share the transcripts PB.13.195 and PB.21.187. This suggests that both transcript groups are really one and the same, since both graphs (or transcript groups) share the nodes PB.13.195 and PB.21.187.
So my question is: why are these two transcript groups not considered one group by default? What prevents TAMA from merging all of these from the outset? And how were these two transcript groups formed separately in the first place?
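To make my expectation concrete, here is a minimal Python sketch of the graph view I described. It is not TAMA's code and makes no claim about how tama_merge.py builds groups internally. The function name merge_overlapping_groups and the transcript IDs PB.13.1 and PB.21.2 are made up for illustration; only PB.13.195 and PB.21.187 come from my data.

```python
from collections import defaultdict


def merge_overlapping_groups(groups):
    """Merge groups of transcript IDs that share at least one ID.

    Treats each transcript as a node and each group as a set of connected
    nodes, then returns the connected components (union-find).
    """
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root
        # with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for group in groups:
        members = list(group)
        for transcript_id in members:
            find(transcript_id)
        # Link every member of the group to the first member.
        for other in members[1:]:
            union(members[0], other)

    components = defaultdict(set)
    for transcript_id in list(parent):
        components[find(transcript_id)].add(transcript_id)
    return sorted(sorted(component) for component in components.values())


# Hypothetical groups: PB.13.1 and PB.21.2 are invented placeholder IDs.
group_1 = ["PB.13.1", "PB.13.195", "PB.21.187"]
group_2 = ["PB.13.195", "PB.21.187", "PB.21.2"]

merged = merge_overlapping_groups([group_1, group_2])
print(merged)
```

Under this view the two groups collapse into a single component because they share nodes, which is why I expected them to be reported as one group.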
Thank you very much