LPDI-EPFL / masif

MaSIF: Molecular surface interaction fingerprints. Geometric deep learning to decipher patterns in molecular surfaces.
Apache License 2.0
582 stars, 154 forks

MaSIF v2 source code #23

Closed by agamemnonc 3 years ago

agamemnonc commented 3 years ago

Hi there,

Have you released, or are you planning to release, the source code from the recent bioRxiv paper?

Thanks

chaitjo commented 3 years ago

+1 I am also very eager to access the codebase for your new model!

FreyrS commented 3 years ago

Hi!

Thanks for the interest in our new method! We are not quite ready to release the source code yet, but we aim to do so as soon as possible. I'll let you know when we publish it online.

chaitjo commented 3 years ago

Awesome, looking forward to it @FreyrS, cool work!

Quick Q. while I have you here: for dMaSIF, compared to PointNet++ and DGCNN, could you explain why your new convolution is so much faster and uses less memory per sample, as shown in Figs. 11 and 12? Could you highlight which design choice leads to this? For example, is PN++ so much slower because it performs multi-scale operations while dMaSIF and DGCNN do not? (Especially considering that both of these baselines also use KeOps speedups...)

FreyrS commented 3 years ago

Hey!

I'm going to copy an answer from my co-author Jean Feydy who is one of the main developers for KeOps:

"If I am not mistaken, this is because our implementation keeps the number of channels very low, and does not store any of the intermediate results such as the «activation maps» in K-NN neighbourhoods.

Using KeOps for the «full convolutions» implements «checkpointing» out of the box: you get a very small memory footprint, at the cost of a slight slow-down in the backward step.

I must stress that even if we use KeOps in the K-NN query for the DGCNN convolutions, we still have to pay a significant time and memory cost to build the point neighbourhoods as (N,K,D) arrays using scattered memory accesses (an operation that really does not stream well on GPUs).

As discussed briefly in the dMaSIF paper, and in more detail in the KeOps doc and NeurIPS paper, the typical size of problems that we encounter in protein sciences / shape analysis (= batches of clouds of 2k-20k 3D points each) really hits a sweet spot for KeOps.

It's a setting where the data is large enough to make GPUs worthwhile, but also small enough to make brute-force methods more efficient than graph-based implementations.

This is not an accident, of course: the first motivation for the library has always been to accelerate this range of computations, for applications to computational anatomy and biomedical imaging."
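To make the memory argument above concrete, here is a minimal pure-Python sketch of the general idea. It is not the dMaSIF or KeOps code, and the function names `gaussian_conv_dense` and `gaussian_conv_streamed`, the Gaussian window, and the `sigma` parameter are all assumptions chosen for illustration. Both functions compute the same brute-force convolution over a point cloud. The first stores every pairwise kernel value before reducing, while the second computes each value on the fly, accumulates it, and discards it, so nothing of size N x N is ever kept.

```python
import math
import random


def _sq_dist(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def gaussian_conv_dense(points, feats, sigma):
    """Materialise the full (N, N) kernel matrix, then reduce it.

    Memory for the intermediate result grows as O(N^2).
    """
    n, dim = len(points), len(feats[0])
    kernel = [
        [math.exp(-_sq_dist(points[i], points[j]) / (2 * sigma ** 2))
         for j in range(n)]
        for i in range(n)
    ]
    return [
        [sum(kernel[i][j] * feats[j][c] for j in range(n)) for c in range(dim)]
        for i in range(n)
    ]


def gaussian_conv_streamed(points, feats, sigma):
    """Same reduction, but each kernel value is used once and discarded.

    Only the (N, dim) output is stored. A backward pass would have to
    recompute the kernel values, which is the checkpointing-style trade-off.
    """
    n, dim = len(points), len(feats[0])
    out = []
    for i in range(n):
        acc = [0.0] * dim
        for j in range(n):
            w = math.exp(-_sq_dist(points[i], points[j]) / (2 * sigma ** 2))
            for c in range(dim):
                acc[c] += w * feats[j][c]
        out.append(acc)
    return out


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical toy sizes; real protein surfaces have thousands of points.
    pts = [[random.random() for _ in range(3)] for _ in range(50)]
    fts = [[random.random() for _ in range(4)] for _ in range(50)]
    dense = gaussian_conv_dense(pts, fts, sigma=0.5)
    streamed = gaussian_conv_streamed(pts, fts, sigma=0.5)
    same = all(
        math.isclose(x, y, rel_tol=1e-9)
        for row_d, row_s in zip(dense, streamed)
        for x, y in zip(row_d, row_s)
    )
    print("results match:", same)
```

The streamed variant only mirrors the memory behaviour described in the quote; the speed benefits on a GPU come from compiled, tiled kernels, which this plain Python loop does not attempt to reproduce.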

agamemnonc commented 3 years ago

@FreyrS thanks for releasing the code!

For everyone else on this thread: https://github.com/FreyrS/dMaSIF

FreyrS commented 3 years ago

@agamemnonc thanks a lot for updating this issue! I had completely forgotten about it, sorry about that!