abpatel2288 / cdhit

Automatically exported from code.google.com/p/cdhit
GNU General Public License v2.0

-s2 and -S2 options ignored? #6

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run cd-hit-2d or cd-hit-est-2d with options -s2 1.0 and -S2 999999, where a 
sequence in -i is a subsequence of a sequence in -i2.

What is the expected output? What do you see instead?
The sequence in -i2 should be clustered with the subsequence in -i.  Instead, 
the program behaves as if -s2 and -S2 were left at their defaults, which 
require that -i2 sequences be no longer than -i sequences.

What version of the product are you using? On what operating system?
Version: cd-hit-v4.6.1-2012-08-27
OS: Linux 64-bit 2.6.18-238.el5

Please provide any additional information below.
I've reduced it to a single sequence in -i and, in -i2, a single subsequence of 
that sequence.  I have tried both cd-hit-2d and cd-hit-est-2d.  They both 
seem to ignore the -s2 and -S2 options.
Thanks.

Original issue reported on code.google.com by vanhemer...@gmail.com on 4 Sep 2012 at 9:17

GoogleCodeExporter commented 9 years ago
They are not ignored. However, I did find and fix a minor bug that is unrelated 
to these two options but does affect cd-hit-2d and cd-hit-est-2d. I am not sure 
whether that bug caused the problem for you, but it now works fine for me with 
test datasets similar to what you described.

The fix has been pushed to the repository; you can check it out and see if it 
works for you.

Original comment by phooli...@gmail.com on 13 Sep 2012 at 9:48

GoogleCodeExporter commented 9 years ago
Thanks for your reply.  I cloned the source trunk and compiled it, then reran 
my test.  I get the same result.  Am I doing something wrong or 
misunderstanding the options?  Attached are my test nucleotide sequences (one 
is a subsequence of the other).  Here are my commands/results:

Works as expected:

cd-hit-est-2d -r 1 -i cdk.fa -i2 cdk_subseq.fa -o test.out -c .95 -d 99 -p 1 -g 1 -s2 0.0 -S2 99999;cat test.out.clstr

>Cluster 0
0       1267nt, >gi|300863097|ref|NM_000077.4|... *
1       840nt, >gi|300863097|ref|NM_000077.4|subseq... at 1:840:1:840/+/100.00%

If I switch -i and -i2, I expect the same result, but it does not cluster:

cd-hit-est-2d -r 1 -i2 cdk.fa -i cdk_subseq.fa -o test.out -c .95 -d 99 -p 1 -g 1 -s2 0.0 -S2 99999;cat test.out.clstr

>Cluster 0
0       840nt, >gi|300863097|ref|NM_000077.4|subseq... *
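
The length check that -s2 and -S2 are expected to control can be sketched roughly as below. This is only an illustration in Python, not CD-HIT's actual code: the function name is hypothetical, and requiring both conditions at once is an assumption (for the values used in this thread, requiring either one would give the same answers). The lengths are taken from the cluster output above.

```python
def passes_length_filter(len_db1, len_db2, s2=1.0, S2=0):
    """Hypothetical sketch: may a db2 (-i2) sequence of length len_db2 join
    a cluster whose db1 (-i) sequence has length len_db1?

    Assumed semantics, not taken from the CD-HIT source:
      -s2: the db1 sequence must be at least s2 * the db2 length
      -S2: the db2 sequence may be at most S2 residues longer than db1
    """
    ratio_ok = len_db1 >= s2 * len_db2
    diff_ok = (len_db2 - len_db1) <= S2
    return ratio_ok and diff_ok


# cdk.fa is 1267 nt, cdk_subseq.fa is 840 nt.
print(passes_length_filter(1267, 840))                    # -i cdk.fa, defaults
print(passes_length_filter(840, 1267))                    # -i cdk_subseq.fa, defaults
print(passes_length_filter(840, 1267, s2=0.0, S2=99999))  # -i cdk_subseq.fa, relaxed
```

Under these assumptions the second call is rejected and the third is accepted, so the swapped run with -s2 0.0 -S2 99999 would be expected to cluster the 1267 nt sequence. The output above instead matches the default behavior.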

Thanks very much for your help!

Original comment by vanhemer...@gmail.com on 14 Sep 2012 at 3:17

GoogleCodeExporter commented 9 years ago
Thank you for providing test datasets for reproducing the bug. With these 
datasets, I tracked down the bug that caused the problem, and it turns out 
these two options were not properly handled. It is now fixed and pushed to the 
repository. Thank you again.

Original comment by phooli...@gmail.com on 14 Sep 2012 at 11:12