Protein data - Githubissues

I have added a protein dataset for benchmarking. It consists of all putative protein sequences from the fungus Aureobasidium pullulans, including a couple of hydrophobin proteins that can be detected with the following pattern:

[^C]{25,158} C [^C]{5,9} CC [^C]{4,44} C [^C]{7,23} C [^C]{5,7} CC [^C]{6,18} C [^C]{2,13}

We will use this dataset for testing and benchmarking, so it is safe to merge this branch for now.
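
For a quick sanity check outside SeqScan, the pattern quoted above can be approximated with an ordinary regular expression. This is only a sketch: it assumes the spaces in the pattern merely separate consecutive elements, that a match may occur anywhere within a sequence, and that sequences are plain uppercase one-letter amino-acid strings. The names `HYDROPHOBIN_RE` and `find_hydrophobin_candidates` are hypothetical and not part of SeqScan.

```python
import re

# Pattern copied verbatim from the issue text.
SEQSCAN_PATTERN = (
    "[^C]{25,158} C [^C]{5,9} CC [^C]{4,44} C [^C]{7,23} C "
    "[^C]{5,7} CC [^C]{6,18} C [^C]{2,13}"
)

# Assumption: spaces only separate pattern elements, so dropping them
# yields an equivalent Python regular expression.
HYDROPHOBIN_RE = re.compile(SEQSCAN_PATTERN.replace(" ", ""))


def find_hydrophobin_candidates(sequences):
    """Return the names of sequences containing a match for the pattern.

    `sequences` maps a sequence name to its amino-acid string.
    Assumption: a match anywhere in the sequence counts as a hit.
    """
    return [
        name
        for name, seq in sequences.items()
        if HYDROPHOBIN_RE.search(seq)
    ]
```

Calling `find_hydrophobin_candidates` with a dictionary of name-to-sequence pairs (for example, parsed from the dataset's FASTA file) gives a list of candidate names that can be compared against SeqScan's own hits. Any difference would suggest that one of the assumptions above does not hold.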

BIO-DIKU / SeqScan

Protein data #63