a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License

`ProteinGraphDataset` fails if a single graph construction fails. #345

Open kamurani opened 1 year ago


When using the `ProteinGraphDataset` class, if a single graph fails to construct in `construct_graphs_mp`, that graph is passed to the transformation functions as `None`, which often causes them to fail.

If the `None` graphs are filtered out of `data_list`, the loop that saves the torch `Data` objects raises a `list index out of range` error instead. That loop takes its filenames from the original PDB list (a list of UniProt IDs) and does not know which IDs failed graph construction, so the filtered `data_list` is shorter than the list of names it is paired against.
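The failure mode can be reproduced in plain Python. This is a minimal sketch, not graphein's code: `save_naive`, `ids` and the dict "graphs" are hypothetical stand-ins, and the assumption is that the save loop pairs the i-th ID with `data_list[i]` by position. In this sketch the graph built for `C` is saved under the name `B` before the loop overruns the shortened list.

```python
from typing import Any, List, Optional, Tuple


def save_naive(
    ids: List[str],
    data_list: List[Optional[Any]],
    saved: List[Tuple[str, Any]],
) -> None:
    """Hypothetical stand-in for the save loop: pairs ids[i] with data_list[i]."""
    # Workaround from the issue: drop failed (None) constructions first.
    data_list = [d for d in data_list if d is not None]
    for i, uniprot_id in enumerate(ids):
        # Positions come from the ORIGINAL id list, so after a None is removed
        # every later graph is paired with the wrong name, and the final index
        # runs past the end of the shortened data_list (IndexError).
        saved.append((uniprot_id, data_list[i]))


if __name__ == "__main__":
    saved: List[Tuple[str, Any]] = []
    try:
        save_naive(["A", "B", "C"], [{"id": "A"}, None, {"id": "C"}], saved)
    except IndexError as err:
        print("IndexError:", err)
    print(saved)  # the graph for "C" was stored under the name "B"
```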

Expected behavior

`ProteinGraphDataset` should handle failed graph constructions robustly and record which graphs succeeded and which failed, so that the returned dataset can still be used with valid indices and the user knows which of the requested samples could not be built.
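One way to get this behaviour is to pair each ID with its result *before* filtering, so names can never drift out of alignment. The sketch below is hypothetical and not graphein's API: `split_results` is an illustrative helper, and it assumes the list of constructed graphs comes back in the same order as the input IDs (which the positional save loop already relies on). Failed IDs are collected in a list, and transforms only ever see real graphs.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


def split_results(
    ids: Sequence[str],
    graphs: Sequence[Optional[Any]],
    transform: Optional[Callable[[Any], Any]] = None,
) -> Tuple[Dict[str, Any], List[str]]:
    """Hypothetical helper: separate successful and failed constructions by ID."""
    if len(ids) != len(graphs):
        raise ValueError("ids and graphs must have the same length")
    successful: Dict[str, Any] = {}
    failed: List[str] = []
    for uid, graph in zip(ids, graphs):
        if graph is None:
            # Record the failure instead of passing None downstream.
            failed.append(uid)
            continue
        # Transformation functions only ever receive real graphs.
        successful[uid] = transform(graph) if transform is not None else graph
    return successful, failed


if __name__ == "__main__":
    ok, failed = split_results(["P1", "P2", "P3"], [{"n": 1}, None, {"n": 3}])
    print(list(ok))  # ['P1', 'P3']
    print(failed)    # ['P2']
    # A dataset index i can then map to list(ok)[i], which always names a valid graph.
```

The save loop would then iterate over `ok.items()` and use each key as the filename, so a failure only shrinks the dataset instead of shifting or breaking its indices.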