dauparas / ProteinMPNN

Code for the ProteinMPNN paper
MIT License

bias_by_res_dict format for multi-chain design #117

Open johnnytam100 opened 2 weeks ago

Hi @dauparas! ProteinMPNN is awesome!

Would you mind sharing the bias_by_res_dict format for multi-chain design?

I want to design chains A and C while keeping chain B fixed, so my bias_by_res_dict looks like this:

{
  "protein_name_1": {
    "A": [
      [0.0, 0.0, 0.0, 0.0, 0.0, 100.5, 0.0, 0.0, 0.0, 100.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 100.5, 0.0, 0.0, 0.0, 100.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      ...
    ],
    "C": [
      [0.0, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5],
      [0.0, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5, -100.5],
      ...
    ]
  }
}

However, I got this error:

Generating sequences...
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-140-a3bec7e6a502> in <cell line: 89>()
    563                 S_sample_list = []
    564                 batch_clones = [copy.deepcopy(protein) for i in range(BATCH_COPIES)]
--> 565                 X, S, mask, lengths, chain_M, chain_encoding_all, chain_list_list, visible_list_list, masked_list_list, masked_chain_length_list_list, chain_M_pos, omit_AA_mask, residue_idx, dihedral_mask, tied_pos_list_of_lists_list, pssm_coef, pssm_bias, pssm_log_odds_all, bias_by_res_all, tied_beta = tied_featurize(batch_clones, device, chain_id_dict, fixed_positions_dict, omit_AA_dict, tied_positions_dict, pssm_dict, bias_by_res_dict)
    566                 pssm_log_odds_mask = (pssm_log_odds_all > pssm_threshold).float() #1.0 for true, 0.0 for false
    567                 name_ = batch_clones[0]['name']

/content/ProteinMPNN/protein_mpnn_utils.py in tied_featurize(batch, device, chain_dict, fixed_position_dict, omit_AA_dict, tied_positions_dict, pssm_dict, bias_by_res_dict, ca_only)
    368         pssm_log_odds_ = np.concatenate(pssm_log_odds_list,0) #[L,], 1.0 for places that need to be predicted
    369 
--> 370         bias_by_res_ = np.concatenate(bias_by_res_list, 0)  #[L,21], 0.0 for places where AA frequencies don't need to be tweaked
    371 
    372         l = len(all_sequence)

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (215,) + inhomogeneous part.

I therefore suspect that the error is related to multi-chain design. For example, should I concatenate the per-residue lists into a single list spanning the combined length of chains A and C? The dict is keyed by chain ID (e.g. A and C), though, so I am not sure what shape is expected.
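For reference, here is a minimal sketch of how I understand the per-chain layout, based only on the `#[L,21]` comment in the traceback: one row of 21 numbers per residue, with one list of rows per chain. The helper names (`make_bias_by_res_dict`, `find_shape_problems`), the column order in `ALPHABET`, and the choice to include the fixed chain B with all-zero rows are my assumptions, not something confirmed by the repository. As far as I can tell, NumPy raises this "inhomogeneous shape" message when a nested list has rows of unequal length, so the check below looks for rows that are not exactly 21 entries long and for chains whose row count does not match the chain length.

```python
import numpy as np

# Assumed column order: 20 amino acids plus X, giving the 21 columns of [L, 21].
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"


def make_bias_by_res_dict(name, chain_lengths, chain_biases):
    """Build {name: {chain: rows}} with one 21-float row per residue (hypothetical helper)."""
    per_chain = {}
    for chain, length in chain_lengths.items():
        bias = np.zeros((length, len(ALPHABET)))
        for aa, value in chain_biases.get(chain, {}).items():
            bias[:, ALPHABET.index(aa)] = value  # same bias at every position of the chain
        per_chain[chain] = bias.tolist()
    return {name: per_chain}


def find_shape_problems(bias_dict, chain_lengths):
    """Return descriptions of chains or rows whose size is not what is expected."""
    problems = []
    for name, chains in bias_dict.items():
        for chain, rows in chains.items():
            expected = chain_lengths.get(chain)
            if len(rows) != expected:
                problems.append(f"{name}/{chain}: {len(rows)} rows, expected {expected}")
            for i, row in enumerate(rows):
                if not isinstance(row, list) or len(row) != len(ALPHABET):
                    problems.append(f"{name}/{chain}: row {i} is not a list of {len(ALPHABET)} numbers")
    return problems


# Illustrative lengths only; replace with the real residue count of each chain.
chain_lengths = {"A": 3, "B": 2, "C": 4}
chain_biases = {
    "A": {"G": 100.5, "L": 100.5},                      # columns 5 and 9, as in my dict above
    "C": {aa: -100.5 for aa in ALPHABET if aa != "A"},  # column 0 stays at 0.0
}
bias_by_res_dict = make_bias_by_res_dict("protein_name_1", chain_lengths, chain_biases)

problems = find_shape_problems(bias_by_res_dict, chain_lengths)
# Mimic the failing step: stack the per-chain blocks along the residue axis.
stacked = np.concatenate([bias_by_res_dict["protein_name_1"][c] for c in "ABC"], 0)
print(problems, stacked.shape)
```

If `find_shape_problems` reports anything for the real dict, that row or chain would be the first place I would look. Whether the fixed chain B needs its own entry is something I could not determine from the traceback alone.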