FreyrS / dMaSIF


Confused about some properties of the data #7

Closed YifanDengWHU closed 3 years ago

YifanDengWHU commented 3 years ago

Hi, thanks for your great work! However, I am quite confused about some of the data properties. For example, we can extract _atomcoords and _atomtypes from the PDB files, so what is the difference between xyz in the PLY files and _atomcoords from the PDB files? As for the faces/triangles, I assume every index triple forms one triangle of the surface, right? I am also curious about the normals in the PLY files. In the paper, it seems that they should be computed by a sampling algorithm, so how do we get them straight from the PLY files?

Sorry, I am not familiar with protein surface representations, and I hope you can answer my questions. Thanks!

bbjy commented 3 years ago

I have similar questions to @YifanDengWHU. As the paper states, dMaSIF only requires "the sole input raw 3D coordinates and chemical types of their atoms, without using any precomputed mesh structure or features". However, load_protein_pair needs many precomputed features from the "surface_data/raw/01-benchmark_surfaces/" data folder. Why? Are these data for the other baseline methods?

Maybe I don't understand it right. Looking forward to your reply. Thanks!

FreyrS commented 3 years ago

Hi both!

The .ply files come from our previous work, MaSIF (https://doi.org/10.1038/s41592-019-0666-6, https://github.com/LPDI-EPFL/masif). They are only used for benchmarking against that previous method. The atom_coords give you the coordinates of the atoms, while the xyz and the triangles in the .ply files give you the surfaces that were precomputed by the previous method.
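
To make the distinction concrete, here is a minimal sketch in plain Python; it is not taken from the dMaSIF or MaSIF code. It assumes a mesh stored as a vertex list (the xyz entries) plus a list of index triples (the triangles). Each triple indexes three vertices that form one triangle, and the vertices are points on the surface rather than atom centers. It also shows one common way to derive per-vertex normals from such a mesh, namely area-weighted averaging of the face normals; whether the precomputed files were produced this way is not confirmed here. The tetrahedron data and the helper names are hypothetical.

```python
import math

# Toy stand-in for a PLY surface: a tetrahedron (hypothetical data).
vertices = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
]
# Each face is a triple of indices into `vertices`, wound counter-clockwise
# when seen from outside, so the cross product below points outward.
faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]


def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def vertex_normals(vertices, faces):
    """Area-weighted vertex normals: sum the (unnormalized) face normals
    of all triangles touching a vertex, then normalize the sum."""
    sums = [[0.0, 0.0, 0.0] for _ in vertices]
    for i, j, k in faces:
        # The cross product's length is twice the triangle area,
        # which gives larger triangles more weight.
        n = cross(sub(vertices[j], vertices[i]), sub(vertices[k], vertices[i]))
        for idx in (i, j, k):
            for d in range(3):
                sums[idx][d] += n[d]
    normals = []
    for s in sums:
        # Fall back to 1.0 for vertices that belong to no face.
        length = math.sqrt(s[0] ** 2 + s[1] ** 2 + s[2] ** 2) or 1.0
        normals.append(tuple(c / length for c in s))
    return normals


normals = vertex_normals(vertices, faces)
```

In this picture, an atom list would be a separate array of 3D points with one chemical type per atom and no faces at all.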

BJWiley233 commented 4 months ago

I am also confused. The code linked below indicates that you also pull the chemical features (charge, hbond, hphob) from Gainza's PLY files for both the training and test datasets.

https://github.com/FreyrS/dMaSIF/blob/0dcc26c3c218a39d5fe26beb2e788b95fb028896/data_preprocessing/convert_ply2npy.py#L28-L48
https://github.com/FreyrS/dMaSIF/blob/0dcc26c3c218a39d5fe26beb2e788b95fb028896/data.py#L151

I am wondering where in the code the y-axis of the figure from the paper is predicted. I see that the dMaSIF class has an AtomNet_MP for the chemical features, but I don't understand how that class's underlying embedding classes actually produce a prediction for charge (aka potential, aka electrostatics).

image

BJWiley233 commented 3 weeks ago

FYI, adding a note about a bug with the triangles when using the meshes. I am assuming you didn't run the analysis for the paper using meshes? The method extract_single never adds the triangles/faces attribute, with dimension (3,T): https://github.com/FreyrS/dMaSIF/blob/0dcc26c3c218a39d5fe26beb2e788b95fb028896/data_iteration.py#L218

So when you run the forward embedding with args.use_mesh=True, the code errors in the calls to curvature and load_mesh.

The only problem is that you can't reuse the batch variable built from the length of the x,y,z coordinates, because a triangular mesh has more faces than vertices, so you probably need a separate batch variable for the faces. I will try it out.
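
A minimal sketch of that idea in plain Python, not the repository's implementation: when several meshes are concatenated into one batch, the vertices and the faces each get their own batch vector, and the face indices are shifted by the number of vertices that precede them. The names collate_meshes, xyz_batch and faces_batch are hypothetical, and the faces are kept as a list of T triples rather than a (3,T) tensor for readability.

```python
def collate_meshes(meshes):
    """Concatenate meshes given as (vertices, faces) pairs.

    vertices: list of (x, y, z) tuples
    faces:    list of (i, j, k) index triples into that mesh's vertices
    Returns the stacked vertices, a per-vertex batch vector, the stacked
    faces re-indexed into the stacked vertices, and a per-face batch vector.
    """
    xyz, xyz_batch = [], []
    faces, faces_batch = [], []
    offset = 0  # number of vertices already placed in the batch
    for b, (verts, tris) in enumerate(meshes):
        xyz.extend(verts)
        xyz_batch.extend([b] * len(verts))
        # Shift indices so they still point at this mesh's own vertices.
        faces.extend(tuple(idx + offset for idx in tri) for tri in tris)
        faces_batch.extend([b] * len(tris))
        offset += len(verts)
    return xyz, xyz_batch, faces, faces_batch


# Hypothetical toy input: a tetrahedron (4 vertices, 4 faces)
# and a single triangle (3 vertices, 1 face).
tetra = (
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)],
)
triangle = (
    [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0), (5.0, 1.0, 0.0)],
    [(0, 1, 2)],
)

xyz, xyz_batch, faces, faces_batch = collate_meshes([tetra, triangle])
```

The two batch vectors have different lengths (one entry per vertex versus one per face), which is why a single vector sized by the x,y,z coordinates cannot serve both.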