YuzheWangPKU / DiffPepBuilder

Official repository for Target-Specific De Novo Peptide Binder Design with DiffPepBuilder
MIT License

Training data processing #6

Closed: sun-heqi closed this issue 2 months ago

sun-heqi commented 2 months ago

Hi, great work, and thanks for uploading the raw data! I followed the instructions to preprocess the raw data, but I get the following errors. Do these errors matter for the training process?

Thank you very much!

Files will be written to data/complex_dataset
Failed to parse BC_6RH7_2.0_fixed_22_33_B12_helix with error Specified ligand chain not found for BC_6RH7_2.0_fixed_22_33_B12_helix.pdb
Failed to parse BC_6FPG_1.8_fixed_29_39_B11_helix with error Specified ligand chain not found for BC_6FPG_1.8_fixed_29_39_B11_helix.pdb
Failed to parse CG_6UT5_2.44_fixed_195_202_G8_loop with error Specified ligand chain not found for CG_6UT5_2.44_fixed_195_202_G8_loop.pdb
Failed to parse CD_2MIN_2.03_fixed_497_509_D13_helix with error Specified ligand chain not found for CD_2MIN_2.03_fixed_497_509_D13_helix.pdb
Failed to parse BC_5ZQ5_2.49_fixed_784_794_C11_helix with error Specified ligand chain not found for BC_5ZQ5_2.49_fixed_784_794_C11_helix.pdb
Failed to parse BC_5AEW_1.88_fixed_97_107_B11_loop with error Specified ligand chain not found for BC_5AEW_1.88_fixed_97_107_B11_loop.pdb
Failed to parse BC_3BOW_2.4_fixed_670_677_C8_loop with error Specified ligand chain not found for BC_3BOW_2.4_fixed_670_677_C8_loop.pdb
Failed to parse BC_2JEB_2.4_fixed_239_246_C8_loop with error Specified ligand chain not found for BC_2JEB_2.4_fixed_239_246_C8_loop.pdb
Failed to parse BC_4HB2_1.8_fixed_94_105_B12_helix with error Specified ligand chain not found for BC_4HB2_1.8_fixed_94_105_B12_helix.pdb
Failed to parse CD_8EN3_2.1_fixed_108_115_D8_loop with error Specified ligand chain not found for CD_8EN3_2.1_fixed_108_115_D8_loop.pdb
Failed to parse BC_2OZL_1.9_fixed_299_309_B11_loop with error Specified ligand chain not found for BC_2OZL_1.9_fixed_299_309_B11_loop.pdb
Failed to parse FG_6UT5_2.44_fixed_234_241_F8_loop with error Specified ligand chain not found for FG_6UT5_2.44_fixed_234_241_F8_loop.pdb
Failed to parse BD_6F6P_2.45_fixed_57_68_D12_helix with error Specified ligand chain not found for BD_6F6P_2.45_fixed_57_68_D12_helix.pdb
Failed to parse BC_6X6U_1.94_fixed_296_303_C8_loop with error Specified ligand chain not found for BC_6X6U_1.94_fixed_296_303_C8_loop.pdb
Failed to parse BC_2A69_2.5_fixed_981_988_B8_loop with error Specified ligand chain not found for BC_2A69_2.5_fixed_981_988_B8_loop.pdb
Failed to parse BC_3D9A_1.2_fixed_122_129_B8_loop with error Specified ligand chain not found for BC_3D9A_1.2_fixed_122_129_B8_loop.pdb
Failed to parse CD_6GZC_2.0_fixed_99_110_C12_helix with error Specified ligand chain not found for CD_6GZC_2.0_fixed_99_110_C12_helix.pdb
Failed to parse CD_4ZI3_2.0_fixed_1_9_D9_loop with error Specified ligand chain not found for CD_4ZI3_2.0_fixed_1_9_D9_loop.pdb
Failed to parse CD_7B2H_2.12_fixed_362_369_C8_loop with error Specified ligand chain not found for CD_7B2H_2.12_fixed_362_369_C8_loop.pdb
Failed to parse BC_2AGZ_1.6_fixed_72_79_B8_loop with error Specified ligand chain not found for BC_2AGZ_1.6_fixed_72_79_B8_loop.pdb
Failed to parse BD_2ZFO_1.95_fixed_68_80_B13_helix with error Specified ligand chain not found for BD_2ZFO_1.95_fixed_68_80_B13_helix.pdb
Failed to parse CE_3M2U_1.4_fixed_160_167_C8_loop with error Specified ligand chain not found for CE_3M2U_1.4_fixed_160_167_C8_loop.pdb
Failed to parse CE_5CGH_2.5_fixed_132_142_C11_helix with error Specified ligand chain not found for CE_5CGH_2.5_fixed_132_142_C11_helix.pdb
Failed to parse BC_1FXK_2.3_fixed_76_83_C8_loop with error Specified ligand chain not found for BC_1FXK_2.3_fixed_76_83_C8_loop.pdb
Failed to parse CE_2XSH_2.29_fixed_78_88_C11_helix with error Specified ligand chain not found for CE_2XSH_2.29_fixed_78_88_C11_helix.pdb
Failed to parse BC_1UC5_2.3_fixed_49_72_B24_helix with error Specified ligand chain not found for BC_1UC5_2.3_fixed_49_72_B24_helix.pdb
Failed to parse CE_5EXD_2.5_fixed_78_94_C17_helix with error Specified ligand chain not found for CE_5EXD_2.5_fixed_78_94_C17_helix.pdb
Failed to parse BC_7XRL_1.75_fixed_240_247_C8_loop with error Specified ligand chain not found for BC_7XRL_1.75_fixed_240_247_C8_loop.pdb
......
YuzheWangPKU commented 2 months ago

Apologies for the inconvenience. We have identified a minor bug in process_dataset.py and have addressed it in the latest commit. Please try the updated version, and feel free to reach out if the issue persists.

sun-heqi commented 2 months ago

Thanks, Yuzhe. I just downloaded and ran the latest process_dataset.py but still got the same error messages. Do I need to update anything else?

YuzheWangPKU commented 2 months ago

Hi Heqi, I just ran the preprocessing and have likely identified where the problem lies. Please remove the files with the suffix _processed.pdb in /data/PepPC-F_raw_data before running preprocessing a second time. These files were generated by the previous preprocessing run and will not be recognized by our script. Alternatively, you can delete /data/PepPC-F_raw_data and run the whole pipeline again to see if it works.
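
For anyone who prefers to script the cleanup, here is a minimal sketch rather than anything shipped with the repository. The helper name remove_processed_pdbs, the dry_run flag, and the relative path data/PepPC-F_raw_data are assumptions made for illustration. It also assumes the leftover files may sit in subdirectories, so it searches recursively. By default it only lists the matching files and deletes nothing.

```python
from pathlib import Path


def remove_processed_pdbs(raw_dir, suffix="_processed.pdb", dry_run=True):
    """List files under raw_dir whose names end with suffix; delete them unless dry_run.

    Hypothetical helper, not part of DiffPepBuilder. Returns the sorted list of
    matching paths so the caller can inspect what was (or would be) removed.
    """
    root = Path(raw_dir)
    if not root.is_dir():
        # Nothing to clean if the directory does not exist.
        return []
    # Recursive search is an assumption; the files may all sit at the top level.
    stale = sorted(p for p in root.rglob("*" + suffix) if p.is_file())
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale


if __name__ == "__main__":
    # Path assumed relative to the repository root; adjust to your checkout.
    for path in remove_processed_pdbs("data/PepPC-F_raw_data", dry_run=True):
        print(path)
```

Run it once as is to review the list, then call it with dry_run=False to actually delete the files before rerunning process_dataset.py.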

sun-heqi commented 2 months ago

The problem has been solved, thanks a lot!