BIO-DIKU / SeqScan

Pattern matching in biological sequences
GNU General Public License v2.0

Protein data #63

Closed by maasha 8 years ago

maasha commented 8 years ago

I have added a protein dataset for benchmarking. It consists of all putative protein sequences in the fungus Aureobasidium pullulans, which include a couple of hydrophobin proteins that can be detected with the following pattern:

[^C]{25,158} C [^C]{5,9} CC [^C]{4,44} C [^C]{7,23} C [^C]{5,7} CC [^C]{6,18} C [^C]{2,13}
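For anyone who wants to sanity-check hits independently of SeqScan, here is a minimal Python sketch of the same pattern. It assumes the spaces only separate pattern units, so that removing them gives an equivalent Python regular expression; that is a reading of the pattern, not confirmed SeqScan behaviour. The names `find_hydrophobin` and `make_example` are hypothetical helpers, and the example sequence is synthetic, not taken from the dataset.

```python
import re

# The pattern from this comment, copied verbatim.
PATTERN = ("[^C]{25,158} C [^C]{5,9} CC [^C]{4,44} C [^C]{7,23} C "
           "[^C]{5,7} CC [^C]{6,18} C [^C]{2,13}")

# Assumption: spaces are only unit separators, so stripping them
# yields an equivalent Python regular expression.
HYDROPHOBIN_RE = re.compile(PATTERN.replace(" ", ""))


def find_hydrophobin(seq):
    """Return the (start, end) span of the first match in seq, or None."""
    match = HYDROPHOBIN_RE.search(seq)
    return match.span() if match else None


def make_example():
    """Build a synthetic sequence (not real data) that uses the minimum
    spacer length between each cysteine unit of the pattern."""
    return ("A" * 25 + "C" + "A" * 5 + "CC" + "A" * 4 + "C"
            + "A" * 7 + "C" + "A" * 5 + "CC" + "A" * 6 + "C" + "A" * 2)


if __name__ == "__main__":
    print(find_hydrophobin(make_example()))
```

Python's `re` engine backtracks, so this is only meant for spot-checking individual sequences, not as a benchmark baseline.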

We will use this dataset for testing and benchmarking, so it is safe to merge this branch for now.

maasha commented 8 years ago

fixes #49