decodebiology / interproscan

Automatically exported from code.google.com/p/interproscan

ips5rc4 combines identical sequences in tsv-output #17

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
When annotating multiple identical sequences with different IDs, the domain 
hits are reported once, under a single ID formed by concatenating the input IDs with "|".

Intuitively, IPS should report the hits separately for each input ID.

See the attached files for a simple example.

Original issue reported on code.google.com by holgerbr...@gmail.com on 11 Mar 2013 at 1:12


GoogleCodeExporter commented 9 years ago
The problem also extends to more than two occurrences of the same 
sequence. In such cases the concatenated ID in the output TSV just gets 
longer. Example:
comp192256_c0_seq11_2_chunk10|comp192256_c0_seq21_3_chunk9|comp192256_c0_seq28_2_chunk9|comp192256_c0_seq22_3_chunk9|comp192256_c0_seq2_3_chunk2|comp192256_c0_seq10_3_chunk2|comp192256_c0_seq6_2_chunk10

Original comment by holgerbr...@gmail.com on 11 Mar 2013 at 2:06
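
For readers stuck with output from an affected release, the concatenated IDs can be expanded back into one row per input ID with a small post-processing step. The following is only a sketch, not part of InterProScan: the function name `split_concatenated_ids` is hypothetical, and it assumes that the ID is the first tab-separated column and that "|" never occurs inside a genuine input ID (IDs such as UniProt-style `sp|...|...` would be split incorrectly).

```python
def split_concatenated_ids(lines, sep="|"):
    """Yield one TSV row per input ID for rows whose first column
    holds several IDs joined by `sep`.

    Assumption: `sep` does not occur inside any real sequence ID.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Separate the ID column from the rest of the row.
        first, tab, rest = line.partition("\t")
        for seq_id in first.split(sep):
            # Repeat the remaining columns unchanged for every ID.
            yield seq_id + tab + rest


if __name__ == "__main__":
    # Placeholder rows; the IDs and columns are made up for illustration.
    demo = ["seqA|seqB\tPfam\tPF00001\n", "seqC\tPfam\tPF00002\n"]
    for row in split_concatenated_ids(demo):
        print(row)
```

Rows whose ID column contains no separator pass through unchanged, so the function can be applied to a whole TSV file line by line.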

GoogleCodeExporter commented 9 years ago
Hi Holger,
There is a reason why we decided to change that. I just have to remind myself 
what it was, and we will get back to you as soon as possible.
Thanks,
Maxim 

Original comment by Maxim.Sc...@gmail.com on 12 Mar 2013 at 10:37

GoogleCodeExporter commented 9 years ago
Hi Holger

We have now changed the code so that it behaves the way you suggested. The next 
InterProScan release will report each ID separately.

Thanks,
Craig

Original comment by newboycr...@gmail.com on 21 Mar 2013 at 5:21

GoogleCodeExporter commented 9 years ago
Fixed.

Original comment by Maxim.Sc...@gmail.com on 26 Mar 2013 at 5:08