LPDI-EPFL / masif

MaSIF: Molecular surface interaction fingerprints. Geometric deep learning to decipher patterns in molecular surfaces.
Apache License 2.0

Meaning of mask in precomputed numpy arrays #19

Closed stebliankin closed 3 years ago

stebliankin commented 3 years ago

Hi,

I am exploring the numpy arrays produced by the precomputation script: $masif_source/data_preparation/04-masif_precompute.py masif_ppi_search

Could you please explain the meaning of p1_mask.npy and p2_mask.npy? I understand that the mask applies to rho and theta, but I don't understand what it represents. In which cases is the value zero? Why do we skip some of the neighbors in the patch?

Thank you so much for all your effort!

pablogainza commented 3 years ago

Hi! Sorry for the late reply!

So the radius of the patches is capped at a maximum of R angstroms (we use either R=9 or R=12). The reason we set a maximum radius is speed when computing the Dijkstra nearest neighbors.

It just so happens that in most cases a patch contains slightly more than 100 vertices (for R=9) or 200 vertices (for R=12). This makes it convenient to give the tensors of our networks a fixed number of vertices.

However, there are some cases (deep pockets or protruding knobs) where the patch has fewer vertices (e.g., a patch with R=12 that has only 150 vertices). In these cases the mask is set to 0 for vertices 150 to 199 so that they are ignored.
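
To make the padding idea concrete, here is a minimal sketch in NumPy. It is not the code from the precomputation script: the helper `pad_patch`, the constant `MAX_VERTICES`, and the random input values are all hypothetical and only illustrate how a 150-vertex patch could be padded to 200 entries with a matching 0/1 mask.

```python
import numpy as np

MAX_VERTICES = 200  # assumed fixed patch size for R=12 (hypothetical constant)

def pad_patch(rho, theta, max_vertices=MAX_VERTICES):
    """Pad (or truncate) per-vertex polar coordinates to a fixed length.

    Returns the padded rho, padded theta, and a mask that is 1 for real
    vertices and 0 for padding. Hypothetical helper, for illustration only.
    """
    n = min(len(rho), max_vertices)
    rho_padded = np.zeros(max_vertices, dtype=np.float32)
    theta_padded = np.zeros(max_vertices, dtype=np.float32)
    mask = np.zeros(max_vertices, dtype=np.float32)
    rho_padded[:n] = rho[:n]
    theta_padded[:n] = theta[:n]
    mask[:n] = 1.0  # entries n..max_vertices-1 stay 0 and are ignored
    return rho_padded, theta_padded, mask

# A patch that has only 150 vertices within the 12 angstrom radius (made-up values).
rng = np.random.default_rng(0)
rho = rng.uniform(0.0, 12.0, size=150)
theta = rng.uniform(0.0, 2 * np.pi, size=150)

rho_p, theta_p, mask = pad_patch(rho, theta)

# Example of using the mask: average a per-vertex feature over real vertices only.
features = rng.normal(size=MAX_VERTICES)
masked_mean = (features * mask).sum() / mask.sum()
```

Multiplying by the mask before summing means the padded slots 150 to 199 contribute nothing, so the result matches the mean over the 150 real vertices.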