Closed mavericb closed 2 months ago
Parameters encoded in the current file name:
Parameter | Value | Explanation |
---|---|---|
run | 1_0 | Run or iteration number |
T | 0.075 | Temperature, likely used in a simulation or optimization process |
seed | 111 | Seed for random number generation, ensures reproducibility |
num_res | 44 | Number of residues in the protein or complex |
num_ligand_res | 0 | Number of ligand residues (in this case, none) |
use_ligand_context | True | Indicates whether ligand context was considered in the analysis |
ligand_cutoff_distance | 8.0 | Cutoff distance (in Angstroms) for ligand interactions |
batch_size | 1 | Size of the batch used in processing |
number_of_batches | 5 | Total number of batches processed |
model_path | . | Model path (in this case, the current directory) |
model_params | ligandmpnn_v_32_010_25.pt | Name of the model parameters file |
unrelaxed | - | Indicates the structure has not undergone relaxation |
alphafold2_multimer_v3 | - | Version of AlphaFold used (multimer v3) |
model | 1 | Specific model number used |
seed | 000 | Another seed, likely used in a different phase of the process |
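For anyone who wants to pull these values out programmatically, here is a minimal sketch in Python. It assumes the fields are joined by a double underscore (`__`), as they appear in the file names in this issue, and that the ColabFold part starts at `_unrelaxed`, `_scores`, `_coverage`, or `.a3m`. The function name `parse_run_prefix` and the `KNOWN_KEYS` list are hypothetical helpers, not part of ColabFold, LigandMPNN, or AFM-LIS. Everything after `model_path_` is kept as one value, since the name alone does not say how it should be split further.

```python
import re

# Keys as they appear in the file names in this issue (assumed list, not exhaustive).
KNOWN_KEYS = [
    "run", "T", "seed", "num_res", "num_ligand_res", "use_ligand_context",
    "ligand_cutoff_distance", "batch_size", "number_of_batches", "model_path",
]


def parse_run_prefix(file_name: str) -> dict:
    """Split the design-run prefix of a file name into key/value strings."""
    # Drop the ColabFold suffix so only the design-run prefix remains.
    prefix = re.split(r"_(?:unrelaxed|scores|coverage)|\.a3m$", file_name)[0]
    params = {}
    for field in prefix.split("__"):
        for key in KNOWN_KEYS:
            if field.startswith(key + "_"):
                # Everything after "<key>_" is treated as the value.
                params[key] = field[len(key) + 1:]
                break
    return params


example = (
    "run_1_0__T_0.075__seed_111__num_res_44__num_ligand_res_0"
    "__use_ligand_context_True__ligand_cutoff_distance_8.0__batch_size_1"
    "__number_of_batches_5__model_path_._model_params_ligandmpnn_v_32_010_25.pt"
    "_unrelaxed_alphafold2_multimer_v3_model_1_seed_000.pdb"
)
params = parse_run_prefix(example)
```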
Not only is the separator ___ never present, but also "rank" is never present, and the code always skips the file since:
    if 'rank' not in file_name:
        if print_results:
            print(f"Skipping {file_name} as it does not contain 'rank' in the file name.")
        return None
I am sorry for the issue related to the file naming convention.
I am not familiar with LigandMPNN-derived file names, but as a workaround you can temporarily rename the files so that they are compatible with the LIS calculation. Once the calculation is done, you can convert them back to their original names.
The LIS calculation can be run at the folder level regardless of the progress of the ColabFold prediction. While AlphaFold-Multimer is running, the folder can contain a mix of ranked files (finished) and temporary files (not finished). The current code is designed to process only finished predictions, which have "rank" in the file name.
Here is a temporary approach you can use:
1. Create a CSV or TSV file. It should list the original names of your JSON and PDB files, with a column for the new, calculation-compatible names (containing "rank").
2. Write a bash script that renames the files based on the CSV or TSV file (you can use ChatGPT to generate a custom script).
3. Revert the file names once the calculation is done.
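The rename and revert steps could look roughly like this. It is a sketch in Python rather than bash, and it assumes a CSV with two columns named `original_name` and `new_name`. Those column names, the function `rename_from_mapping`, and the paths in the usage comment are all made up for illustration.

```python
import csv
from pathlib import Path


def rename_from_mapping(folder, mapping_csv, revert=False):
    """Rename files in `folder` according to a two-column CSV mapping.

    With revert=True the mapping is applied backwards, restoring the
    original names after the calculation has finished.
    """
    renamed = []
    with open(mapping_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            src, dst = row["original_name"], row["new_name"]
            if revert:
                src, dst = dst, src
            src_path = Path(folder) / src
            dst_path = Path(folder) / dst
            # Skip missing sources and never overwrite an existing file.
            if not src_path.exists() or dst_path.exists():
                continue
            src_path.rename(dst_path)
            renamed.append((src, dst))
    return renamed


# Usage (hypothetical paths):
# rename_from_mapping("predictions", "mapping.csv")               # before LIS
# rename_from_mapping("predictions", "mapping.csv", revert=True)  # afterwards
```

Keeping the mapping in a file rather than deriving the new names on the fly means the revert step does not depend on being able to parse the temporary names.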
Hey there, thanks so much for your answer!! I renamed the files using the convention suggested.
target___candidate_1_rank_001.pdb
target___candidate_1_rank_002.pdb
target___candidate_1_rank_003.pdb
target___candidate_1_rank_004.pdb
target___candidate_1_rank_005.pdb
In particular, I created a script that ranks the models by the average pLDDT read from the files. However, nothing is generated, even though I don't get any errors. I think the problem is that the naming convention is still not compatible. In fact, I get:
Debug: protein_2_temp: .pdb
Debug: protein_1: target, protein_2: candidate_1_rank_003, pae_file_name: renamed+target___candidate_1_rank_003_pae.png
Debug: Rank: Not Available
Do you have an example of a correct filename that I can transform my files into?
Thanks so much for your time and help
Try these.
protein_1___protein_2_unrelaxed_rank_001_alphafold2_multimer_v3_model_3_seed_000.pdb
protein_1___protein_2_scores_rank_001_alphafold2_multimer_v3_model_5_seed_000.json
protein_1___protein_3_unrelaxed_rank_001_alphafold2_multimer_v3_model_5_seed_000.pdb
protein_1___protein_3_scores_rank_001_alphafold2_multimer_v3_model_1_seed_000.json
protein_1___protein_4_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb
protein_1___protein_4_scores_rank_001_alphafold2_multimer_v3_model_4_seed_000.json
protein_1___protein_5_unrelaxed_rank_001_alphafold2_multimer_v3_model_4_seed_000.pdb
protein_1___protein_5_scores_rank_001_alphafold2_multimer_v3_model_3_seed_000.json
protein_1___protein_6_unrelaxed_rank_001_alphafold2_multimer_v3_model_3_seed_000.pdb
protein_1___protein_6_scores_rank_001_alphafold2_multimer_v3_model_3_seed_000.json
protein_1___protein_7_unrelaxed_rank_001_alphafold2_multimer_v3_model_3_seed_000.pdb
protein_1___protein_7_scores_rank_001_alphafold2_multimer_v3_model_3_seed_000.json
protein_1___protein_8_unrelaxed_rank_001_alphafold2_multimer_v3_model_3_seed_000.pdb
protein_1___protein_8_scores_rank_001_alphafold2_multimer_v3_model_3_seed_000.json
protein_1___protein_9_unrelaxed_rank_001_alphafold2_multimer_v3_model_3_seed_000.pdb
protein_1___protein_9_scores_rank_001_alphafold2_multimer_v3_model_3_seed_000.json
protein_1___protein_10_unrelaxed_rank_001_alphafold2_multimer_v3_model_3_seed_000.pdb
protein_1___protein_10_scores_rank_001_alphafold2_multimer_v3_model_1_seed_000.json
protein_1___protein_11_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb
protein_1___protein_11_scores_rank_001_alphafold2_multimer_v3_model_3_seed_000.json
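To check a folder of renamed files before running the notebook, something like the sketch below may help. It assumes the layout `<protein_1>___<protein_2>_<unrelaxed|scores>_rank_<NNN>_alphafold2_multimer_v3_model_<N>_seed_<NNN>.<pdb|json>`, which is inferred from the default `name_separator` of `"___"` and the `'rank'` check in the AFM-LIS code quoted in this issue, together with the usual ColabFold suffix. The regular expression and the function `check_name` are illustrative only. This is not the parser AFM-LIS uses, so a name that passes here is not guaranteed to be accepted by the notebook.

```python
import re

# Assumed layout; not taken from the AFM-LIS source.
PATTERN = re.compile(
    r"^(?P<protein_1>.+?)___(?P<protein_2>.+?)"
    r"_(?P<kind>unrelaxed|scores)_rank_(?P<rank>\d+)"
    r"_alphafold2_multimer_v3_model_(?P<model>\d+)_seed_(?P<seed>\d+)"
    r"\.(?P<ext>pdb|json)$"
)


def check_name(file_name: str):
    """Return the parsed fields if the name fits the assumed layout, else None."""
    match = PATTERN.match(file_name)
    return match.groupdict() if match else None


good = check_name(
    "target___candidate_1_unrelaxed_rank_001_alphafold2_multimer_v3_model_3_seed_000.pdb"
)
bad = check_name("target___candidate_1_rank_003.pdb")
```

Under this assumption, a name such as `target___candidate_1_rank_003.pdb` fails because it lacks the `_unrelaxed_` marker and the model and seed suffix. That would be consistent with the debug output above, where `protein_2` absorbed `_rank_003`.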
Thank you for your answer. We ended up writing a custom filtering script based on pLDDT, pAE, and RMSD. We'll try again in the future. Thanks for your work!
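For readers in the same situation, a filter of this kind might start from something like the sketch below. It is not the script mentioned above. It assumes the ColabFold `*_scores_*.json` files contain a `plddt` list (per residue) and a `pae` matrix (list of rows). The thresholds and the function names `summarize_scores` and `passes_filter` are placeholders, and RMSD is left out because it needs a reference structure.

```python
import json
from statistics import mean


def summarize_scores(scores_path):
    """Compute mean pLDDT and mean pAE from one scores JSON file."""
    with open(scores_path) as handle:
        scores = json.load(handle)
    plddt = scores["plddt"]  # assumed key: per-residue pLDDT values
    pae_values = [v for row in scores["pae"] for v in row]  # assumed key: pAE matrix
    return {"mean_plddt": mean(plddt), "mean_pae": mean(pae_values)}


def passes_filter(summary, min_plddt=70.0, max_pae=15.0):
    """Keep a model only if both averages are within the placeholder cutoffs."""
    return summary["mean_plddt"] >= min_plddt and summary["mean_pae"] <= max_pae
```

Because this works directly on the JSON contents, it does not depend on the file naming convention at all.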
Issue Description
We are encountering significant challenges in managing files generated by ColabFold. The current naming convention does not seem to be compatible with AFM-LIS code.
Current File Naming Convention
Here are examples of the current file names generated by ColabFold:
cite.bibtex
config.json
log.txt
run_1_0__T_0.075__seed_111__num_res_44__num_ligand_res_0__use_ligand_context_True__ligand_cutoff_distance_8.0__batch_size_1__number_of_batches_5__model_path_._model_params_ligandmpnn_v_32_010_25.pt.a3m
run_1_0__T_0.075__seed_111__num_res_44__num_ligand_res_0__use_ligand_context_True__ligand_cutoff_distance_8.0__batch_size_1__number_of_batches_5__model_path_._model_params_ligandmpnn_v_32_010_25.pt_coverage.png
run_1_0__T_0.075__seed_111__num_res_44__num_ligand_res_0__use_ligand_context_True__ligand_cutoff_distance_8.0__batch_size_1__number_of_batches_5__model_path_._model_params_ligandmpnn_v_32_010_25.pt_scores_alphafold2_multimer_v3_model_1_seed_000.json
run_1_0__T_0.075__seed_111__num_res_44__num_ligand_res_0__use_ligand_context_True__ligand_cutoff_distance_8.0__batch_size_1__number_of_batches_5__model_path_._model_params_ligandmpnn_v_32_010_25.pt_scores_alphafold2_multimer_v3_model_2_seed_000.json
run_1_0__T_0.075__seed_111__num_res_44__num_ligand_res_0__use_ligand_context_True__ligand_cutoff_distance_8.0__batch_size_1__number_of_batches_5__model_path_._model_params_ligandmpnn_v_32_010_25.pt_scores_alphafold2_multimer_v3_model_3_seed_000.json
run_1_0__T_0.075__seed_111__num_res_44__num_ligand_res_0__use_ligand_context_True__ligand_cutoff_distance_8.0__batch_size_1__number_of_batches_5__model_path_._model_params_ligandmpnn_v_32_010_25.pt_scores_alphafold2_multimer_v3_model_4_seed_000.json
run_1_0__T_0.075__seed_111__num_res_44__num_ligand_res_0__use_ligand_context_True__ligand_cutoff_distance_8.0__batch_size_1__number_of_batches_5__model_path_._model_params_ligandmpnn_v_32_010_25.pt_scores_alphafold2_multimer_v3_model_5_seed_000.json
run_1_0__T_0.075__seed_111__num_res_44__num_ligand_res_0__use_ligand_context_True__ligand_cutoff_distance_8.0__batch_size_1__number_of_batches_5__model_path_._model_params_ligandmpnn_v_32_010_25.pt_unrelaxed_alphafold2_multimer_v3_model_1_seed_000.pdb
run_1_0__T_0.075__seed_111__num_res_44__num_ligand_res_0__use_ligand_context_True__ligand_cutoff_distance_8.0__batch_size_1__number_of_batches_5__model_path_._model_params_ligandmpnn_v_32_010_25.pt_unrelaxed_alphafold2_multimer_v3_model_2_seed_000.pdb
run_1_0__T_0.075__seed_111__num_res_44__num_ligand_res_0__use_ligand_context_True__ligand_cutoff_distance_8.0__batch_size_1__number_of_batches_5__model_path_._model_params_ligandmpnn_v_32_010_25.pt_unrelaxed_alphafold2_multimer_v3_model_3_seed_000.pdb
run_1_0__T_0.075__seed_111__num_res_44__num_ligand_res_0__use_ligand_context_True__ligand_cutoff_distance_8.0__batch_size_1__number_of_batches_5__model_path_._model_params_ligandmpnn_v_32_010_25.pt_unrelaxed_alphafold2_multimer_v3_model_4_seed_000.pdb
run_1_0__T_0.075__seed_111__num_res_44__num_ligand_res_0__use_ligand_context_True__ligand_cutoff_distance_8.0__batch_size_1__number_of_batches_5__model_path_._model_params_ligandmpnn_v_32_010_25.pt_unrelaxed_alphafold2_multimer_v3_model_5_seed_000.pdb
AFM-LIS code:
`
def calculate_pae(pdb_file_path: str, print_results: bool = True, pae_cutoff: float = 12.0, name_separator: str = "___"):
    parser = PDB.PDBParser()
    file_name = pdb_file_path.split("/")[-1]
    data_folder = pdb_file_path.split("/")[-2]
`
https://github.com/flyark/AFM-LIS/blob/main/alphafold_interaction_scores_github_20240421.ipynb