Open damiansm opened 5 years ago
The random walk (RW) takes all paths through the network into account, so removing links will change the scores in ways that are hard to predict. A better strategy might be to restrict the display of hits to those with high-quality evidence: leave the random walk as is, but multiply the score used by Exomiser by a fudge factor that drops off quickly for STRING scores below 0.7, for instance.
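To make the fudge-factor idea concrete, here is a minimal sketch. It assumes an exponential drop-off below the threshold; the function names, the `steepness` value and the shape of the curve are all illustrative choices, not anything that exists in Exomiser today.

```python
import math


def evidence_weight(string_score, threshold=0.7, steepness=20.0):
    """Hypothetical fudge factor in (0, 1].

    Returns 1.0 for STRING scores at or above the threshold and decays
    exponentially below it. Both parameters are assumptions to be tuned.
    """
    if string_score >= threshold:
        return 1.0
    return math.exp(-steepness * (threshold - string_score))


def adjusted_ppi_score(rw_score, string_score):
    """Leave the random-walk score untouched and only rescale it."""
    return rw_score * evidence_weight(string_score)
```

With these assumed parameters a hit at STRING 0.6 keeps roughly 14% of its RW score and a hit at 0.4 keeps well under 1%, while anything at 0.7 or above is unchanged.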
With our current implementation, the scores for PPI hits to direct StringDB interactors don't end up much above those for quite distant interactors in the network. Whether a hit is direct or distant is not clear without manual investigation on the StringDB site, and from the practical perspective of selecting interesting, novel disease gene candidates, it is difficult to persuade anyone unless the hit is a direct interaction with experimental evidence.
One practical solution would be to show only direct, high-quality PPI hits (STRING score > 0.7 and/or experimental evidence), and only for genes that are not associated with disease and whose interactor does have human phenotype evidence.
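As a rough sketch of that display filter: the `Interaction` record, the set arguments and the function name below are hypothetical, and "and/or" is read here as "either criterion is enough", which is an assumption that may need tightening.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """Hypothetical record for one direct STRING edge."""
    gene: str             # candidate gene from the exome
    interactor: str       # its direct STRING neighbour
    string_score: float   # combined STRING score scaled to 0..1
    experimental: bool    # True if STRING reports experimental evidence


def displayable_hits(interactions, disease_genes, phenotype_genes, min_score=0.7):
    """Keep direct, high-quality hits for genes with no known disease link.

    disease_genes: genes already associated with a disease (excluded).
    phenotype_genes: genes with human phenotype evidence (required for
    the interactor).
    """
    return [
        i for i in interactions
        if i.gene not in disease_genes
        and i.interactor in phenotype_genes
        # Assumption: a high score OR experimental evidence qualifies.
        and (i.string_score > min_score or i.experimental)
    ]
```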
We could create a subset of our existing RW matrix and leave the scoring as is, or just have a simple lookup table for each gene and down-weight the phenotype score of the interactor by a constant value, e.g. 0.6, possibly adjusted for the number of direct interactors.
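The lookup-table variant could look something like the sketch below. Everything here is illustrative: the helper names are made up, the 0.6 weight is just the example value from above, and the degree adjustment (dividing by `log2(1 + degree)` of the interactor) is only one assumed way to account for the number of direct interactors.

```python
import math


def build_direct_lookup(edges):
    """Build gene -> set of direct interactors from (gene_a, gene_b) pairs."""
    lookup = {}
    for a, b in edges:
        lookup.setdefault(a, set()).add(b)
        lookup.setdefault(b, set()).add(a)
    return lookup


def ppi_score(gene, lookup, phenotype_scores, weight=0.6, adjust_for_degree=False):
    """Best down-weighted phenotype score among the gene's direct interactors.

    phenotype_scores: interactor -> phenotype score (missing means 0.0).
    adjust_for_degree: if True, penalise hub interactors (assumed scheme).
    """
    best = 0.0
    for interactor in lookup.get(gene, ()):
        score = phenotype_scores.get(interactor, 0.0) * weight
        if adjust_for_degree:
            # Degree is at least 1 here, so the divisor is at least 1.0.
            score /= math.log2(1 + len(lookup[interactor]))
        best = max(best, score)
    return best
```

For example, a gene whose best direct interactor has a phenotype score of 0.9 would get 0.9 * 0.6 = 0.54 without the degree adjustment.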