dauparas / ProteinMPNN

Code for the ProteinMPNN paper
MIT License

Design complexes with unknown chains (proposed fix included) #86

Open SimonCrouzet opened 6 months ago

SimonCrouzet commented 6 months ago

Hello,

Thank you for your work on ProteinMPNN!

I encountered a bug when running the pipeline on a sample containing several short unknown chains (e.g. chain 'P' with sequence 'XXXX'). Those chains were not among the chains I wanted to design, but I still had to remove them from my chain_id_dict and fixed_positions_dict to avoid a later failure (the pipeline was trying to compute a score from an empty sequence). After removing them, however, I hit another bug in protein_mpnn_utils.py: at line 381, omit_AA_mask[i,] = omit_AA_mask_pad raised an error because of a shape mismatch ((S, 21) against (S, 56), with S the length of the full sequence).
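The mismatch can be reproduced in isolation with plain NumPy. This is a minimal sketch, not ProteinMPNN code: the sizes l and L_max are made-up values chosen for illustration, and the zero array only stands in for the concatenated per-residue mask (one row per residue, 21 amino-acid columns).

```python
import numpy as np

# Illustrative sizes (assumed, not taken from a real run):
# l residues in this protein, L_max residues in the longest protein of the batch.
l, L_max = 30, 65

# Stand-in for np.concatenate(omit_AA_mask_list, 0): shape (l, 21).
mask = np.zeros((l, 21))

# With pad_width [[0, L_max - l]], NumPy applies the single (before, after)
# pair to EVERY axis, so the amino-acid axis also grows by L_max - l.
bad = np.pad(mask, [[0, L_max - l]], 'constant', constant_values=(0.0,))
print(bad.shape)  # (65, 56)

# Stand-in for the batch array whose row i is assigned at line 381.
omit_AA_mask = np.zeros((1, L_max, 21))
raised = False
try:
    omit_AA_mask[0,] = bad  # (65, 56) cannot be broadcast into (65, 21)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```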

I traced the bug to line 378, omit_AA_mask_pad = np.pad(np.concatenate(omit_AA_mask_list,0), [[0, L_max-l]], 'constant', constant_values=(0.0, )). Because a single [before, after] pair is applied to every axis, this call padded both dimensions of the 2D array by L_max - l, whereas only the first (residue) dimension should grow and the second (amino-acid) dimension must stay constant. I therefore changed the line to omit_AA_mask_pad = np.pad(np.concatenate(omit_AA_mask_list,0), [[0, L_max-l],[0, 0]], 'constant', constant_values=(0.0, )).
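The corrected padding can be checked with a small standalone helper. pad_omit_mask is a hypothetical name used only for this sketch, and it assumes l is the number of rows in the concatenated mask; it mirrors the fixed pad_width [[0, L_max-l],[0, 0]] rather than reproducing the surrounding protein_mpnn_utils.py code.

```python
import numpy as np

def pad_omit_mask(mask_rows, L_max):
    """Hypothetical helper: pad an (l, 21) per-residue mask to (L_max, 21).

    Only the residue axis (axis 0) is padded; [0, 0] leaves the
    amino-acid axis (axis 1) at its original width.
    """
    mask = np.concatenate(mask_rows, 0)  # stack per-chain masks row-wise
    l = mask.shape[0]                    # assumed: residues in this protein
    return np.pad(mask, [[0, L_max - l], [0, 0]], 'constant',
                  constant_values=(0.0,))

# Two chains of 10 and 20 residues, padded to a batch length of 65.
out = pad_omit_mask([np.ones((10, 21)), np.ones((20, 21))], 65)
print(out.shape)  # (65, 21)

omit_AA_mask = np.zeros((1, 65, 21))
omit_AA_mask[0,] = out  # shapes now match, so the assignment succeeds
```

Spelling out [0, 0] for the second axis is what keeps the column count fixed; the padded rows are zeros, so they add no amino-acid restrictions for the padding positions.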

I created a pull request, see #87

I hope this report is useful.

Best,