BorgwardtLab / proteinshake

Protein structure datasets for machine learning.
https://proteinshake.ai
BSD 3-Clause "New" or "Revised" License

cd-hit drops some sequences #141

Closed timkucera closed 1 year ago

timkucera commented 1 year ago

cd-hit drops only a few sequences (25 out of 30k), but we should look into this. The temporary fix is to assign those sequences to a -1 cluster; we might also filter them out.
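A minimal sketch of that workaround, not the actual proteinshake code. It assumes the cd-hit output has already been parsed into a dict mapping sequence ID to cluster index; the function names `assign_clusters` and `filter_unclustered` and the example IDs are hypothetical.

```python
from typing import Dict, List, Sequence


def assign_clusters(
    ids: Sequence[str], clustered: Dict[str, int], missing: int = -1
) -> List[int]:
    """Return one cluster label per ID, in input order.

    IDs that are absent from `clustered` (i.e. dropped by the clustering
    step) receive the sentinel label `missing`.
    """
    return [clustered.get(seq_id, missing) for seq_id in ids]


def filter_unclustered(
    ids: Sequence[str], labels: Sequence[int], missing: int = -1
) -> List[str]:
    """Alternative: keep only the IDs that received a real cluster label."""
    return [seq_id for seq_id, label in zip(ids, labels) if label != missing]


# Hypothetical example: "seq_b" is missing from the parsed clustering output.
ids = ["seq_a", "seq_b", "seq_c"]
clustered = {"seq_a": 0, "seq_c": 1}

labels = assign_clusters(ids, clustered)
kept = filter_unclustered(ids, labels)
```

Keeping the -1 label preserves the one-label-per-sequence alignment with the dataset, while filtering shrinks the dataset instead; which is preferable depends on how downstream splits treat the sentinel.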

timkucera commented 1 year ago

Moved to the release repo.