arthurstar / cdhit

Automatically exported from code.google.com/p/cdhit
GNU General Public License v2.0

cd-hit-2d produces wrong results if db2 is not sorted by decreasing length #15

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Create a set of reference sequences (db1).
2. Pick a few FASTA records from the references and save them into a separate 
FASTA file. This file will serve as db2 (= unclustered).
3. Add a shorter FASTA record at the top of db2.
4. Run cd-hit-2d with the references as db1 and this file as db2.
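
The db2 file described in the steps above could be built along these lines. This is only a sketch: the sequences are made up for illustration, and `build_db2` and `to_fasta` are hypothetical helper names, not part of CD-HIT.

```python
import random

def build_db2(references, short_record, n_picked=3, seed=0):
    """Return db2 records: the shorter record first, then a few picked references.

    `references` is a list of (header, sequence) tuples.
    """
    rng = random.Random(seed)
    picked = rng.sample(references, n_picked)
    # Putting the shorter record first means db2 is NOT sorted by decreasing length.
    return [short_record] + picked

def to_fasta(records, width=60):
    """Format (header, sequence) tuples as FASTA text, wrapping sequence lines."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"

# Made-up sequences, purely for illustration.
references = [
    ("ref1", "MKT" * 40),
    ("ref2", "GAV" * 50),
    ("ref3", "LIP" * 60),
    ("ref4", "QWE" * 70),
]
short_record = ("short_unrelated", "ACDEFGHIKL")

db2 = build_db2(references, short_record, n_picked=3)
db2_text = to_fasta(db2)  # write this text to the db2 FASTA file
```

Writing `to_fasta(references)` to one file and `db2_text` to another should give the two inputs for the run in which the problem was observed.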

Expected output: all sequences (except perhaps the unrelated first record) 
are added to the existing clusters. Actual output: none of them is added.

CD-HIT version 4.6 (built on Nov 28 2012) on Linux.

Possible fix (see attached patch)
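
The patch itself is not reproduced here. Independently of it, the issue title suggests a possible workaround: sort db2 by decreasing sequence length before passing it to cd-hit-2d. The sketch below assumes that this ordering avoids the problem, which has not been verified here. `parse_fasta` and `sort_by_decreasing_length` are hypothetical helper names, not part of CD-HIT.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def sort_by_decreasing_length(records):
    """Return records ordered longest first; ties keep their original order."""
    return sorted(records, key=lambda rec: len(rec[1]), reverse=True)

# Made-up db2 content with the shortest record on top, as in the report.
fasta_text = ">short\nACDE\n>long\nACDEFGHIKLMN\nPQRS\n>mid\nACDEFGH\n"

records = parse_fasta(fasta_text)
sorted_records = sort_by_decreasing_length(records)
headers = [header for header, _ in sorted_records]
```

With the sample input, `headers` comes out as `long`, `mid`, `short`. The sorted records would then be written back to a FASTA file and used as db2.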

Original issue reported on code.google.com by logghe.m...@gmail.com on 15 May 2013 at 2:49

Attachments: