ctSkennerton / crass

The CRISPR assembler
http://ctskennerton.github.io/crass
GNU General Public License v3.0

The specificity needs to be improved #13

Closed ctSkennerton closed 12 years ago

ctSkennerton commented 13 years ago

Currently, when crass is run on metagenomic samples, a large number of false-positive groups are created. Some of these groups appear to be SINEs, for example:

GCAACTGGGGCAACTGGGGCAACTGGGGCAACTGGGGCAACTGG

which has an obvious internal repeating structure (the 9-mer GCAACTGGG repeated in tandem) and is definitely not a CRISPR direct repeat (DR). Other groups have few reads and few nodes in the spacer graph. It should be possible to filter out these groups by removing NodeManagers and DR groups with low numbers of reads and nodes.
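
One way to flag candidates like the sequence above is to look for a short internal period. The following is a minimal sketch, not crass code: the function names `smallest_period` and `looks_like_tandem_repeat` and the `min_copies` threshold are hypothetical and chosen only for illustration.

```python
def smallest_period(seq: str) -> int:
    """Return the smallest p such that seq[i] == seq[i + p] for every valid i.

    A sequence with no internal repetition has a period equal to its length.
    """
    n = len(seq)
    for p in range(1, n):
        if all(seq[i] == seq[i + p] for i in range(n - p)):
            return p
    return n


def looks_like_tandem_repeat(seq: str, min_copies: int = 3) -> bool:
    """Flag a candidate repeat that is just a short unit copied in tandem.

    min_copies is an assumed threshold, not a value taken from crass.
    """
    return len(seq) >= min_copies * smallest_period(seq)


candidate = "GCAACTGGGGCAACTGGGGCAACTGGGGCAACTGGGGCAACTGG"
print(smallest_period(candidate), looks_like_tandem_repeat(candidate))
```

This check is quadratic in the sequence length, which should be acceptable for sequences the length of a direct repeat. It only catches exact tandem copies, so a real filter would probably need to tolerate mismatches.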

ctSkennerton commented 13 years ago

It doesn't look like my solution worked. We may need to walk the spacer graph and remove groups that have only a small number of connected nodes.
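
A rough sketch of what such a walk could look like, assuming the spacer graph can be viewed as an undirected adjacency map in which every node appears as a key. The names `connected_components` and `prune_small_components` and the `min_nodes` cutoff are hypothetical and do not reflect the actual NodeManager interface.

```python
from collections import deque


def connected_components(adjacency):
    """Breadth-first walk that groups nodes into connected components.

    adjacency maps each node to an iterable of its neighbours (undirected).
    """
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def prune_small_components(adjacency, min_nodes=3):
    """Drop every component with fewer than min_nodes nodes (assumed cutoff)."""
    keep = set()
    for component in connected_components(adjacency):
        if len(component) >= min_nodes:
            keep |= component
    return {
        node: [n for n in neighbours if n in keep]
        for node, neighbours in adjacency.items()
        if node in keep
    }


# Toy spacer graph: a three-node chain and an isolated two-node pair.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "x": ["y"], "y": ["x"]}
pruned = prune_small_components(graph, min_nodes=3)
print(sorted(pruned))
```

In this toy example only the three-node chain should survive. A suitable cutoff would have to be chosen from real data.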