KosinskiLab / AlphaPulldown

https://doi.org/10.1093/bioinformatics/btac749
GNU General Public License v3.0

'ranking_confidence' is not in the result.pkl file after the update #367

Closed DimaMolod closed 2 weeks ago

DimaMolod commented 2 weeks ago

I think 'ranking_confidence' is missing from the result.pkl file after this line: https://github.com/KosinskiLab/AlphaPulldown/blob/main/alphapulldown/folding_backend/alphafold_backend.py#L542 (that is, after recalculate_confidence).

result_pkl.keys()
Out[4]: dict_keys(['distogram', 'experimentally_resolved', 'masked_msa', 'num_recycles', 'predicted_aligned_error', 'predicted_lddt', 'structure_module', 'plddt'])
DimaMolod commented 2 weeks ago

There are also no 'ptm' and 'iptm' scores. This is how it used to be: dict_keys(['distogram', 'experimentally_resolved', 'masked_msa', 'predicted_aligned_error', 'predicted_lddt', 'structure_module', 'plddt', 'aligned_confidence_probs', 'max_predicted_aligned_error', 'ptm', 'iptm', 'ranking_confidence'])
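A quick way to compare a given result.pkl against the old key set is sketched below. This is a minimal sketch, not part of AlphaPulldown: `EXPECTED_KEYS`, `missing_keys`, and `check_result_pkl` are hypothetical names, and the expected set is simply the "way it was" list quoted above. Loading a real result.pkl may also require numpy (and possibly jax) to be installed.

```python
import pickle

# Keys reported in this thread as present before the update (assumption:
# this is the complete set that downstream code relies on).
EXPECTED_KEYS = frozenset([
    "distogram", "experimentally_resolved", "masked_msa",
    "predicted_aligned_error", "predicted_lddt", "structure_module",
    "plddt", "aligned_confidence_probs", "max_predicted_aligned_error",
    "ptm", "iptm", "ranking_confidence",
])


def missing_keys(result, expected=EXPECTED_KEYS):
    """Return the expected keys that are absent from a result dict, sorted."""
    return sorted(set(expected) - set(result))


def check_result_pkl(path):
    """Load a pickled result dict and report which expected keys are missing."""
    with open(path, "rb") as fh:
        result = pickle.load(fh)
    return missing_keys(result)
```

Calling `check_result_pkl("result.pkl")` (the path is a placeholder) returns an empty list when nothing is missing.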

dingquanyu commented 2 weeks ago

> there is also no 'ptm' and 'iptm' scores, the way it was: dict_keys(['distogram', 'experimentally_resolved', 'masked_msa', 'predicted_aligned_error', 'predicted_lddt', 'structure_module', 'plddt', 'aligned_confidence_probs', 'max_predicted_aligned_error', 'ptm', 'iptm', 'ranking_confidence'])

I see. It seems 'seqs' is no longer among the keys. I think create_notebook.py still has to be updated.

dingquanyu commented 2 weeks ago

Actually, I reran it and 'seqs' is among the keys: ['distogram', 'experimentally_resolved', 'masked_msa', 'predicted_aligned_error', 'predicted_lddt', 'structure_module', 'plddt', 'seqs', 'unrelaxed_protein', 'aligned_confidence_probs', 'max_predicted_aligned_error', 'ranking_confidence']

DimaMolod commented 2 weeks ago

> Actually, I rerun it again and 'seqs' is among the keys ['distogram', 'experimentally_resolved', 'masked_msa', 'predicted_aligned_error', 'predicted_lddt', 'structure_module', 'plddt', 'seqs', 'unrelaxed_protein', 'aligned_confidence_probs', 'max_predicted_aligned_error', 'ranking_confidence']

Hmm, I just checked a random pkl from a prediction made today using this branch and there is no 'seqs' (must be some recent fix?): ['distogram', 'experimentally_resolved', 'masked_msa', 'num_recycles', 'predicted_aligned_error', 'predicted_lddt', 'structure_module', 'plddt']

DimaMolod commented 2 weeks ago

But I didn't update the alphafold submodule, so if you introduced some changes there, that could be the reason.

dingquanyu commented 2 weeks ago

Interesting. Perhaps it's better to update create_notebook.py so that it parses sequences directly from the PDB files anyway.
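One way that PDB-based parsing could look, using only the fixed-column ATOM record layout and no external library, is sketched here. This is an illustration under stated assumptions, not the code in create_notebook.py: `sequences_from_pdb_lines` and `THREE_TO_ONE` are hypothetical names, only the first model is read, HETATM records are ignored, and non-standard residues are reported as 'X'.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequences_from_pdb_lines(lines):
    """Return {chain_id: sequence} built from the CA atoms of ATOM records."""
    residues = {}
    seen = set()
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # read only the first model
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":
            continue  # one CA atom per residue
        chain = line[21]
        res_id = (chain, line[22:27])  # residue number plus insertion code
        if res_id in seen:
            continue  # skip alternate locations of the same residue
        seen.add(res_id)
        residues.setdefault(chain, []).append(THREE_TO_ONE.get(line[17:20], "X"))
    return {chain: "".join(codes) for chain, codes in residues.items()}
```

With a file on disk this would be called as `sequences_from_pdb_lines(open(path))`, where `path` is a placeholder. Residues missing from the coordinates are, by construction, also missing from the returned sequence.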

DimaMolod commented 2 weeks ago

Yes, you can do this, but we need ptm and iptm (as well as the rest of the missing keys) in result.pkl anyway.

dingquanyu commented 2 weeks ago

Ah, I found where it came from. It's not from the recalculation of the metrics but rather from here: https://github.com/KosinskiLab/AlphaPulldown/blob/b791e1b65cff7583cd28c9fcb08e188a46db077e/alphapulldown/folding_backend/alphafold_backend.py#L397-L403. The pickle was dumped before the dictionary was updated with the missing keys.
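The ordering problem can be reproduced in isolation. The sketch below is not the AlphaPulldown code: `dump_then_update` and `update_then_dump` are hypothetical helpers that only contrast the two orders, with `metrics` standing in for the keys added later (ptm, iptm, and so on).

```python
import io
import pickle


def dump_then_update(result, metrics, fh):
    """Order described in this thread: the file is written first."""
    pickle.dump(result, fh, protocol=4)
    result.update(metrics)  # too late, the file was already written


def update_then_dump(result, metrics, fh):
    """Alternative order: add the derived keys, then write the file."""
    result.update(metrics)
    pickle.dump(result, fh, protocol=4)


# Compare what each order leaves in the serialized data.
buggy, fixed = io.BytesIO(), io.BytesIO()
dump_then_update({"plddt": 90.0}, {"ptm": 0.5, "iptm": 0.4}, buggy)
update_then_dump({"plddt": 90.0}, {"ptm": 0.5, "iptm": 0.4}, fixed)
print(sorted(pickle.loads(buggy.getvalue())))
print(sorted(pickle.loads(fixed.getvalue())))
```

In both cases the in-memory dictionary ends up complete, which is why the problem only shows when the file is read back.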

dingquanyu commented 2 weeks ago

Plus, the calculations of iptm, iptm+ptm, etc. were moved to post-processing after your restructuring.
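For reference, the combined iptm+ptm score in AlphaFold-Multimer is the weighted sum 0.8 * iptm + 0.2 * ptm. A minimal sketch of that step follows; `multimer_ranking_confidence` is a hypothetical name, and whether the post-processing here computes it in exactly this way is an assumption.

```python
def multimer_ranking_confidence(ptm, iptm):
    """Weighted iptm+ptm score as defined for AlphaFold-Multimer."""
    return 0.8 * iptm + 0.2 * ptm
```

Both inputs are expected to lie in [0, 1], so the result does as well.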

DimaMolod commented 2 weeks ago

Good catch!