HaotianZhangAI4Science / Delete

Delete: Directly optimizing lead in protein pockets, including linker design, fragment elaboration, scaffold hopping and side-chain decoration
MIT License
22 stars, 2 forks

How to preprocess a custom training dataset? #11

Open zh2417 opened 2 months ago

zh2417 commented 2 months ago

Hello, I want to train a model on my own dataset, but I'm not sure how to preprocess the input data. The function I have is:

```python
def surfdata_prepare(ply_file, frag_kept_sdf):
    '''
    use the sdf_file as the center
    '''
    protein_dict = read_ply(ply_file)
    keep_frag_mol = read_sdf(frag_kept_sdf)[0]
    ligand_dict = parse_rdmol(keep_frag_mol)
    data = ProteinLigandData.from_protein_ligand_dicts(
        protein_dict = torchify_dict(protein_dict),
        ligand_dict = torchify_dict(ligand_dict)
    )
    return data
```

This function is from Delete.py. Can it be used to construct a training dataset, or is surfdata_prepare insufficient for preprocessing the data for training?

HaotianZhangAI4Science commented 1 month ago

Hi,

Basically, you need a list of (ligand.sdf, protein.ply) pairs, one pair per training example. I will prepare a demo of the dataset construction for you.

Best, Odin
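
Until that demo is available, here is a minimal sketch of how such a pair list might be collected. It is not the repository's own preprocessing code. The helper name `collect_pairs` and the directory layout (one sub-directory per complex, each holding exactly one `.sdf` ligand file and one `.ply` surface file) are assumptions made for illustration only.

```python
from pathlib import Path
from typing import List, Tuple


def collect_pairs(root: str) -> List[Tuple[Path, Path]]:
    """Return (ligand.sdf, protein.ply) path pairs found under `root`.

    Hypothetical layout (an assumption, not the Delete repo's convention):
    one sub-directory per complex, each containing exactly one .sdf file
    and exactly one .ply file.
    """
    pairs: List[Tuple[Path, Path]] = []
    for complex_dir in sorted(Path(root).iterdir()):
        if not complex_dir.is_dir():
            continue
        sdf_files = sorted(complex_dir.glob("*.sdf"))
        ply_files = sorted(complex_dir.glob("*.ply"))
        # Skip incomplete or ambiguous entries rather than guessing a match.
        if len(sdf_files) != 1 or len(ply_files) != 1:
            continue
        pairs.append((sdf_files[0], ply_files[0]))
    return pairs
```

Each resulting pair has the same two inputs that `surfdata_prepare(ply_file, frag_kept_sdf)` takes, though this thread does not confirm whether that function alone is sufficient for building training data.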