adaptyvbio / ProteinFlow

Versatile computational pipeline for processing protein structure data for deep learning applications.
https://adaptyvbio.github.io/ProteinFlow/
BSD 3-Clause "New" or "Revised" License

classes_to_exclude is not filtering proteins correctly #133

Closed: ardagoreci closed this issue 5 months ago

ardagoreci commented 5 months ago

Hi @elkoz,

I noticed while visualizing the proteins that setting the classes_to_exclude parameter does not properly filter out the specified classes. The code runs without errors, but it still returns proteins that are homomeric or heteromeric. It might be removing some of the homomers/heteromers, but I haven't tested whether it does.

Here is the code that I used for filtering and a few of the resulting proteins:

[Screenshots: the filtering code and a few of the resulting proteins (Screen Shot 2024-01-31 at 01 08 55, Screen Shot 2024-01-31 at 01 08 21)]

elkoz commented 5 months ago

Just in case, also keep in mind that this filters at the biounit level. It is possible (and pretty common) that, e.g., a heteromeric protein has a single-chain biounit; such biounits won't be filtered out by this.
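
To make the biounit-level point concrete, here is a minimal sketch in plain Python. It is not ProteinFlow's actual implementation: the helper names `classify_biounit` and `filter_biounits`, the entry IDs, and the sequences are all hypothetical, and it assumes a simplified rule where a biounit is a homomer only if all of its chain sequences are exactly identical (a real pipeline may use a similarity threshold instead). The class labels mirror the ones discussed in this thread.

```python
from typing import Dict, Iterable

# A biounit is modeled here as a mapping: chain ID -> amino acid sequence.
Biounit = Dict[str, str]


def classify_biounit(chains: Biounit) -> str:
    """Classify a single biounit by looking only at the chains it contains."""
    if len(chains) == 1:
        return "single_chain"
    # Simplifying assumption: identical sequences in every chain means homomer.
    if len(set(chains.values())) == 1:
        return "homomer"
    return "heteromer"


def filter_biounits(
    biounits: Dict[str, Biounit], classes_to_exclude: Iterable[str]
) -> Dict[str, Biounit]:
    """Drop every biounit whose own class is in classes_to_exclude."""
    excluded = set(classes_to_exclude)
    return {
        name: chains
        for name, chains in biounits.items()
        if classify_biounit(chains) not in excluded
    }


# Hypothetical data: entry "hypA" is a heteromeric protein with two biounits,
# one of which contains only a single chain.
biounits = {
    "hypA-1": {"A": "MKTAYIAK", "B": "GSHMLEDP"},  # heteromer biounit
    "hypA-2": {"A": "MKTAYIAK"},                   # single-chain biounit
    "hypB-1": {"A": "MKTAYIAK", "B": "MKTAYIAK"},  # homomer biounit
}

kept = filter_biounits(biounits, ["homomer", "heteromer"])
print(sorted(kept))  # ['hypA-2']
```

In this sketch "hypA-2" survives the filter even though it comes from a heteromeric protein, because the decision is made per biounit rather than per protein.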