Lot of variation in masif_ppi_search outputs?

av1659 commented 4 years ago

Hi there - I always get very different answers when doing the pdl1 benchmark with the neural network and I am not sure why.

What is the source of the variation? Shouldn't the neural network weights for pdl1_benchmark_nn.py be preloaded?

Otherwise a completely random model would be initialized with nn_model = ScoreNN(), which doesn't seem right. Thanks!

pablogainza commented 4 years ago

There is a stochastic step in the alignment process: a random sample consensus (RANSAC) algorithm (take a look at Supplementary Figure 6).

Basically you use the fingerprints to establish correspondences between the patches, but not all of those correspondences are correct. To find 'correct' correspondences, the RANSAC algorithm randomly chooses subsets of 3 correspondences and tests the alignment each subset implies. After running for a predetermined number of iterations, it chooses the best alignment.
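To make the source of the run-to-run variation concrete, here is a minimal NumPy sketch of RANSAC over point correspondences. It is not the MaSIF code: the names `kabsch` and `ransac_align`, the inlier threshold, and the inlier-count score are all assumptions made for illustration. The only random part is the choice of the 3-correspondence subsets, so fixing the seed makes the output repeatable.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that R @ src[i] + t ~ dst[i]."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def ransac_align(src, dst, n_iter=200, inlier_dist=1.0, seed=None):
    """Hypothetical RANSAC loop: src[i] <-> dst[i] are putative correspondences.

    Returns (R, t, n_inliers) for the sampled alignment with the most inliers.
    """
    rng = np.random.default_rng(seed)
    best_R, best_t, best_count = np.eye(3), np.zeros(3), -1
    for _ in range(n_iter):
        # Stochastic step: pick 3 correspondences at random.
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = kabsch(src[idx], dst[idx])
        # Score the candidate by how many correspondences it explains.
        residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
        count = int((residuals < inlier_dist).sum())
        if count > best_count:
            best_R, best_t, best_count = R, t, count
    return best_R, best_t, best_count
```

With noisy correspondences and a limited iteration budget, different seeds can sample different subsets and settle on different "best" alignments, which then changes everything scored downstream of the alignment.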

Indeed, this is a non-differentiable element of the pipeline, dependent on a set of hand-crafted parameters (e.g. the number of iterations). It should really be replaced by a non-stochastic, fully differentiable network, IMO.

For the masif-search results in the paper we took the median result after a bunch of runs.
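If you want to do something similar on your side, a rough sketch of taking the median over repeated runs could look like the following. Everything here is hypothetical: `run_once` stands in for whatever wraps a single benchmark run and returns one score, `noisy_benchmark` is a toy placeholder, and the number of runs is an arbitrary choice rather than the number used for the paper.

```python
import random
import statistics


def median_over_runs(run_once, n_runs=10, base_seed=0):
    """Call run_once(seed=...) n_runs times and return (median, all_scores)."""
    scores = [run_once(seed=base_seed + i) for i in range(n_runs)]
    return statistics.median(scores), scores


def noisy_benchmark(seed):
    """Toy stand-in for one stochastic benchmark run (not the real benchmark)."""
    rng = random.Random(seed)
    return 0.8 + rng.uniform(-0.1, 0.1)


median_score, all_scores = median_over_runs(noisy_benchmark, n_runs=5)
print(f"median={median_score:.3f} min={min(all_scores):.3f} max={max(all_scores):.3f}")
```

Reporting the spread alongside the median makes it easier to see how much of a difference between two runs is just sampling noise.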

av1659 commented 4 years ago

Ah, thank you for the explanation!

LPDI-EPFL / masif
