gnina / libmolgrid

Comprehensive library for fast, GPU accelerated molecular gridding for deep learning workflows
https://gnina.github.io/libmolgrid/
Apache License 2.0

Taking care of each region when creating a gninatype #109

Open drorhunvural opened 1 year ago

drorhunvural commented 1 year ago

Hi,

I'm converting a PDB file to a gninatypes file, using a process similar to the gninatype function in the link:

import os
import pathlib
import struct

import molgrid
import numpy as np

def gninatype(file):
    # Write a temporary one-line .types file that points molgrid at the PDB
    types_path = file.replace('.pdb', '.types')
    with open(types_path, 'w') as f:
        f.write(file)
    # Atom typer driven by the 'gninamap' file located next to this script
    atom_map = molgrid.FileMappedGninaTyper(f'{pathlib.Path(os.path.realpath(__file__)).resolve().parent}/gninamap')
    dataloader = molgrid.ExampleProvider(atom_map, shuffle=False, default_batch_size=1)
    dataloader.populate(types_path)
    example = dataloader.next()
    coords = example.coord_sets[0].coords.tonumpy()
    types = np.int_(example.coord_sets[0].type_index.tonumpy())
    # One record per atom: x, y, z as floats, followed by the integer type index
    out_path = file.replace('.pdb', '.gninatypes')
    with open(out_path, 'wb') as fout:
        for i in range(coords.shape[0]):
            fout.write(struct.pack('fffi', coords[i][0], coords[i][1], coords[i][2], types[i]))
    # Clean up the temporary .types file
    os.remove(types_path)
    return out_path
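
For reference, the binary layout written above can be checked by reading the file back. This is a minimal sketch, assuming every record is exactly what `struct.pack('fffi', ...)` produced (three floats and one int, 16 bytes in native byte order). `read_gninatypes` is a hypothetical helper name used only for illustration, not a molgrid function.

```python
import struct

# Assumed record layout: x, y, z as 32-bit floats, then the type index as a 32-bit int
RECORD = struct.Struct('fffi')

def read_gninatypes(path):
    """Return a list of (x, y, z, type_index) tuples, one per atom in the file."""
    with open(path, 'rb') as f:
        data = f.read()
    # iter_unpack raises struct.error if the file size is not a multiple of the record size
    return list(RECORD.iter_unpack(data))
```

Each tuple corresponds to one atom of the structure, so the number of tuples should equal the number of atoms that were typed from the PDB.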

Are the features in gninamap (28 different features) applied to each row of x, y, z coordinates (that is, to each pocket)?

To ask my question more clearly: for example, I have a 1a4h.pdb file and I am generating 1a4h.gninatypes with the gninatype function above.

I have a data file like the one below:

18.5426 -3.5417 -4.3501 1a4h.gninatypes
16.4473 -2.0545 -9.2645 1a4h.gninatypes
11.5426 -5.5317 -7.3222 1a4h.gninatypes
17.5426 -6.5419 -1.6552 1a4h.gninatypes
...

The characteristics of each region are important to me. Are the individual features of every region (each row in the dataset) retained with a single gninatypes file? Or do I need to set up a structure like the one below?

18.5426 -3.5417 -4.3501 1a4h_pocket1.gninatypes
16.4473 -2.0545 -9.2645 1a4h_pocket2.gninatypes
11.5426 -5.5317 -7.3222 1a4h_pocket3.gninatypes
17.5426 -6.5419 -1.6552 1a4h_pocket4.gninatypes

If you advise me to set up the second dataset structure, how do I do it?

dkoes commented 1 year ago

It's up to you what data you put in the gninatypes file. Typically we store the entire structure. If ExampleProvider is being populated with a list of PDBs, it will provide all the coordinates that are in the PDB (after all, at no point have you defined the binding site for it to prune around).
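
If per-pocket files are wanted anyway, one possible way to do the pruning is to filter an existing whole-structure gninatypes file by distance from each pocket center. The following is only a sketch under stated assumptions: the file uses the `'fffi'` record layout written by the function in the question, `write_pocket_gninatypes` is a hypothetical helper (not part of molgrid), and the 10 Å default radius is an arbitrary illustrative choice.

```python
import math
import struct

# Assumed record layout: x, y, z as 32-bit floats, then the type index as a 32-bit int
RECORD = struct.Struct('fffi')

def write_pocket_gninatypes(src_path, dst_path, center, radius=10.0):
    """Copy the atoms within `radius` of `center` from src_path to dst_path.

    Returns the number of atoms kept. The radius default is an assumption,
    not a value recommended by the library.
    """
    with open(src_path, 'rb') as f:
        atoms = list(RECORD.iter_unpack(f.read()))
    # Keep an atom if its Euclidean distance to the pocket center is within the radius
    kept = [a for a in atoms if math.dist(a[:3], center) <= radius]
    with open(dst_path, 'wb') as f:
        for atom in kept:
            f.write(RECORD.pack(*atom))
    return len(kept)
```

For the first row of the example data file, a call might look like `write_pocket_gninatypes('1a4h.gninatypes', '1a4h_pocket1.gninatypes', (18.5426, -3.5417, -4.3501))`. The type index of each kept atom is copied unchanged, so the per-atom typing is the same as in the whole-structure file.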