gnina / libmolgrid

Comprehensive library for fast, GPU-accelerated molecular gridding for deep learning workflows
https://gnina.github.io/libmolgrid/
Apache License 2.0

File Format Question #96

Closed: drorhunvural closed this issue 2 years ago

drorhunvural commented 2 years ago

My question is simple, but I couldn't figure it out.

There is a file called small.types in which you keep gninatypes and coordinate information.

What does the first column of the small.types file mean? It only ever contains 0 or 1.

I'm asking because I have a types file in the following format (5 columns: 1 label, 2 x, 3 y, 4 z, 5 gninatypes file name):

3 -6.294230132301325 1.8449302493024933 16.732595805958056 1cny_protein_nowat.gninatypes
3 -4.56334078598594 0.8068705773884981 11.723845568744958 1cny_protein_nowat.gninatypes
3 17.665444159862048 24.895828067150177 31.40846698280671 5mxx_protein_nowat.gninatypes
3 10.071925527781376 23.38455951301644 34.62545564046107 5mxx_protein_nowat.gninatypes
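To make the column layout concrete, here is a minimal sketch that splits one line of a types file like the one above into its fields. It assumes whitespace-separated columns in the order label, x, y, z, file name. The `parse_types_line` helper and `TypesRecord` tuple are hypothetical illustrations, not part of libmolgrid.

```python
from typing import NamedTuple


class TypesRecord(NamedTuple):
    """One line in the hypothetical 5-column layout: label, x, y, z, file name."""
    label: float
    x: float
    y: float
    z: float
    filename: str


def parse_types_line(line: str) -> TypesRecord:
    """Split a whitespace-separated types line into label, coordinates, and file name."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
    # The first four columns are numeric: label, then x, y, z.
    label, x, y, z = (float(v) for v in fields[:4])
    return TypesRecord(label, x, y, z, fields[4])


example = "3 -6.294230132301325 1.8449302493024933 16.732595805958056 1cny_protein_nowat.gninatypes"
record = parse_types_line(example)
print(record.label, record.filename)  # 3.0 1cny_protein_nowat.gninatypes
```

With this layout, every line carries 3 in the first column rather than a 0/1 value.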

I can't populate eptrain when using a file in the format above.

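import molgrid
eptrain = molgrid.ExampleProvider(data_root=allgninatypes, balanced=True, shuffle=True, stratify_receptor=True)  # allgninatypes: directory holding the .gninatypes files
eptrain.populate(fname)  # fname: path to the types file shown above
print(eptrain.size())  # prints 0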

The code above raises no error, but eptrain has a size of 0, so nothing gets populated.

dkoes commented 2 years ago

By default, the first column is a binary classification label (0 or 1), and balancing makes sure each batch has equal numbers of each class (it reads the label at the "labelpos" position, which is 0 by default). Most likely, if you disable balancing you will get the expected result.
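The idea behind class balancing can be illustrated with a small toy model. This is not libmolgrid's implementation: it assumes examples are split into positives (nonzero value at `labelpos`) and negatives (zero value), and that a balanced batch draws the same number from each group. The names `split_by_label` and `balanced_batch` are hypothetical. In this toy model, a file with no 0 labels leaves the negative group empty, so no balanced batch can be formed.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy model of class balancing; NOT libmolgrid's implementation.
Example = Tuple[float, ...]  # the label values of one example


def split_by_label(examples: Sequence[Example], labelpos: int = 0):
    """Assumption: nonzero label at labelpos = positive, zero = negative."""
    positives = [ex for ex in examples if ex[labelpos] != 0]
    negatives = [ex for ex in examples if ex[labelpos] == 0]
    return positives, negatives


def balanced_batch(examples: Sequence[Example], batch_size: int, labelpos: int = 0) -> List[Example]:
    """Return batch_size // 2 examples from each class, or [] if either class is empty."""
    positives, negatives = split_by_label(examples, labelpos)
    if not positives or not negatives:
        return []  # cannot balance: one class has no examples
    batch: List[Example] = []
    for i in range(batch_size // 2):
        # Alternate classes, cycling through a class if it has fewer examples.
        batch.append(positives[i % len(positives)])
        batch.append(negatives[i % len(negatives)])
    return batch


mixed = [(1,), (0,), (1,), (0,), (1,)]
only_threes = [(3,), (3,), (3,)]
print(balanced_batch(mixed, 4))        # [(1,), (0,), (1,), (0,)]
print(balanced_batch(only_threes, 4))  # []
```

In libmolgrid itself, the suggestion above corresponds to passing balanced=False to molgrid.ExampleProvider.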

drorhunvural commented 2 years ago

Great answer!