dhimmel / rephetio

Miscellaneous Content for Project Rephetio to repurpose drugs
https://think-lab.github.io/p/rephetio/

Neo4j Online Meetup 2017-11-30 Materials #3

Open dhimmel opened 6 years ago

dhimmel commented 6 years ago

How Project Rephetio used Neo4j to predict drug repurposing

Thursday, November 30, 2017 on YouTube. Below is the event description from Meetup:

This meetup will explore Hetionet (https://neo4j.het.io), a public Neo4j database that encodes biomedical knowledge. Hetionet v1.0 contains 47,031 nodes of 11 types and 2,250,197 relationships of 24 types.

Project Rephetio applied Hetionet to predict new uses for existing compounds, an act called drug repurposing. We'll discuss the Cypher implementation of the algorithms used for relationship prediction on hetnets (networks with multiple node and relationship types).

We'll take questions live during the session, but if you have any beforehand, be sure to post them in the #neo4j-online-meetup channel of the Neo4j Users Slack.

We'll be hosting this session on YouTube live.

Time

09:00 PST (UTC - 8 hours), 12:00 EST (UTC - 5 hours), 17:00 UTC, 18:00 CET (UTC + 1 hour)

About The Speaker

Daniel Himmelstein, a data scientist at the University of Pennsylvania, will lead the meetup.

Previously, Daniel has discussed Project Rephetio at GraphConnect 2016 and on the Graphistania podcast.

In addition, an introductory GraphGist on the project won the Open/Government Data category of the 2016 GraphGist Challenge.

dhimmel commented 6 years ago

Meetup Outline

This meetup will go over how we used Neo4j in our study titled Project Rephetio:

rephetio-head

Project Rephetio is also available on Thinklab and as a Manubot manuscript. This project had two parts:

  1. Creating Hetionet, a hetnet of biomedical knowledge
  2. Predicting new uses for existing compounds (drugs)

Hetionet

Project Rephetio

Advanced Cypher

We'll go over computing degree-weighted path counts (DWPCs) in Cypher (discussion) through a series of steps.

Trails

Path count from Bupropion to nicotine dependence for the Compound–binds–Gene–participates–Pathway–participates–Gene–associates–Disease metapath:

MATCH path = (n0:Compound)-[:BINDS_CbG]-(n1)-[:PARTICIPATES_GpPW]-
  (n2)-[:PARTICIPATES_GpPW]-(n3)-[:ASSOCIATES_DaG]-(n4:Disease)
WHERE n0.name = 'Bupropion'
  AND n4.name = 'nicotine dependence'
RETURN path

Note how relationship types are uniquely named for optimized querying, e.g. GpPW.

Modify the RETURN statement to provide a table:

RETURN extract(node IN nodes(path) | node.name)

Or just return the path/trail count:

RETURN count(path) AS PC
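
To make the trail count concrete outside of Neo4j, here is a minimal Python sketch on a toy hetnet. The node names (C1, G1, P1, D1, ...) and edges are made up for illustration and are not Hetionet data, and `find_trails` is a hypothetical helper. It follows the same four relationship types as the query and, like a Cypher MATCH pattern, never reuses a relationship within one match, so the back-and-forth walk C1 - G1 - P1 - G1 - D1 is not counted.

```python
from collections import defaultdict

# Toy hetnet (hypothetical data, not taken from Hetionet).
# Each edge is (relationship_type, node_a, node_b) and is treated as
# undirected, mirroring the undirected `-[:TYPE]-` patterns in the query.
EDGES = [
    ("BINDS_CbG", "C1", "G1"),
    ("BINDS_CbG", "C1", "G2"),
    ("PARTICIPATES_GpPW", "G1", "P1"),
    ("PARTICIPATES_GpPW", "G2", "P1"),
    ("PARTICIPATES_GpPW", "G3", "P1"),
    ("ASSOCIATES_DaG", "D1", "G1"),
    ("ASSOCIATES_DaG", "D1", "G3"),
]

# Relationship types to follow, in order (the CbGpPWpGaD metapath).
METAPATH = ["BINDS_CbG", "PARTICIPATES_GpPW", "PARTICIPATES_GpPW", "ASSOCIATES_DaG"]


def build_adjacency(edges):
    """Map (node, relationship_type) to a list of (neighbor, edge_id)."""
    adjacency = defaultdict(list)
    for edge_id, (rel_type, a, b) in enumerate(edges):
        adjacency[(a, rel_type)].append((b, edge_id))
        adjacency[(b, rel_type)].append((a, edge_id))
    return adjacency


def find_trails(edges, metapath, source, target):
    """Enumerate node sequences from source to target that follow the
    metapath without using any edge twice (trails)."""
    adjacency = build_adjacency(edges)
    trails = []

    def extend(nodes, used_edges):
        depth = len(nodes) - 1
        if depth == len(metapath):
            if nodes[-1] == target:
                trails.append(nodes)
            return
        for neighbor, edge_id in adjacency[(nodes[-1], metapath[depth])]:
            if edge_id not in used_edges:  # relationship uniqueness
                extend(nodes + [neighbor], used_edges | {edge_id})

    extend([source], frozenset())
    return trails


trails = find_trails(EDGES, METAPATH, "C1", "D1")
for trail in trails:
    print(" - ".join(trail))
print("PC =", len(trails))
```

On this toy graph the sketch finds three trails from C1 to D1.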

Paths

Add the following condition to the WHERE statement to prevent paths with duplicate nodes (discussion):

  AND n1 <> n3
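
Why only n1 and n3? Assuming each node has exactly one type, two positions on a metapath can only hold the same node if they share a node type, and here the two Gene positions are the only such pair. The Python sketch below derives the inequality conditions for any metapath from its list of node types. The helper name `duplicate_node_conditions` is hypothetical, and this is one possible way to generate such conditions, not necessarily how the project does it.

```python
from itertools import combinations


def duplicate_node_conditions(node_types):
    """Return Cypher inequality conditions for every pair of metapath
    positions that share a node type (and so could hold the same node)."""
    conditions = []
    for (i, type_i), (j, type_j) in combinations(enumerate(node_types), 2):
        if type_i == type_j:
            conditions.append(f"n{i} <> n{j}")
    return conditions


# Node types along Compound-binds-Gene-participates-Pathway-participates-
# Gene-associates-Disease, in the order n0 .. n4.
metapath_node_types = ["Compound", "Gene", "Pathway", "Gene", "Disease"]
print(duplicate_node_conditions(metapath_node_types))  # ['n1 <> n3']
```

The returned strings can be joined with `AND` and appended to the WHERE clause.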

Optimize the join index with a planner hint (discussion; see https://github.com/neo4j/neo4j/issues/6030 for a radical proposal):

USING JOIN ON n2

Degree-weighted paths

Extract degrees along each path to compute a path_weight (also known as a "path-degree product"):

WITH
[
  size((n0)-[:BINDS_CbG]-()),
  size(()-[:BINDS_CbG]-(n1)),
  size((n1)-[:PARTICIPATES_GpPW]-()),
  size(()-[:PARTICIPATES_GpPW]-(n2)),
  size((n2)-[:PARTICIPATES_GpPW]-()),
  size(()-[:PARTICIPATES_GpPW]-(n3)),
  size((n3)-[:ASSOCIATES_DaG]-()),
  size(()-[:ASSOCIATES_DaG]-(n4))
] AS degrees, path
RETURN
  path,
  reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.4) AS path_weight
ORDER BY path_weight DESC
LIMIT 10
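
Written as a formula, the reduce expression multiplies a damped factor for each of the eight degrees collected along a path. The symbols D_p (the list of degrees for path p) and w (the damping exponent, 0.4 in the query) are introduced here for illustration only:

```latex
\mathrm{PDP}(p) = \prod_{d \in D_p} d^{-w}, \qquad w = 0.4
```

For instance, a hypothetical path whose eight degrees all equal 2 would get a weight of 2^(-0.4 * 8) = 2^(-3.2) ≈ 0.109, so paths through high-degree nodes contribute less than paths through low-degree nodes.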

Sum weights for all paths to compute the DWPC:

RETURN
  count(path) AS PC,
  sum(reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.4)) AS DWPC
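
The same computation can be sketched in plain Python on a toy hetnet. Everything below is an assumption made for illustration: the node names and edges are invented (not Hetionet data), the paths are listed by hand, and `degree_table`, `path_degree_product`, and `dwpc` are hypothetical helpers. As in the query, each relationship on a path contributes two degrees, one per endpoint, counted for that relationship type only.

```python
from collections import Counter

DAMPING = 0.4  # exponent from the query (d ^ -0.4)

# Toy hetnet (hypothetical data): (relationship_type, node_a, node_b).
EDGES = [
    ("BINDS_CbG", "C1", "G1"),
    ("BINDS_CbG", "C1", "G2"),
    ("BINDS_CbG", "C2", "G2"),
    ("PARTICIPATES_GpPW", "G1", "P1"),
    ("PARTICIPATES_GpPW", "G2", "P1"),
    ("PARTICIPATES_GpPW", "G3", "P1"),
    ("ASSOCIATES_DaG", "D1", "G1"),
    ("ASSOCIATES_DaG", "D1", "G3"),
]

METAPATH = ["BINDS_CbG", "PARTICIPATES_GpPW", "PARTICIPATES_GpPW", "ASSOCIATES_DaG"]

# Paths from C1 to D1 along the metapath, listed by hand for this toy graph.
PATHS = [
    ["C1", "G1", "P1", "G3", "D1"],
    ["C1", "G2", "P1", "G1", "D1"],
    ["C1", "G2", "P1", "G3", "D1"],
]


def degree_table(edges):
    """Count, per (node, relationship_type), the relationships touching the node."""
    degrees = Counter()
    for rel_type, a, b in edges:
        degrees[(a, rel_type)] += 1
        degrees[(b, rel_type)] += 1
    return degrees


def path_degree_product(path, metapath, degrees, damping=DAMPING):
    """Multiply degree ** -damping over both endpoints of every relationship."""
    weight = 1.0
    for i, rel_type in enumerate(metapath):
        for node in (path[i], path[i + 1]):
            weight *= degrees[(node, rel_type)] ** -damping
    return weight


def dwpc(paths, metapath, degrees, damping=DAMPING):
    """Sum the path-degree products of all paths."""
    return sum(path_degree_product(p, metapath, degrees, damping) for p in paths)


degrees = degree_table(EDGES)
for path in PATHS:
    print(" - ".join(path), path_degree_product(path, METAPATH, degrees))
print("PC =", len(PATHS), "DWPC =", dwpc(PATHS, METAPATH, degrees))
```

In this toy graph the two paths through G2 receive a lower weight than the path through G1, because G2 binds two compounds while G1 binds only one.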

Putting it all together:

MATCH path = (n0:Compound)-[:BINDS_CbG]-(n1)-[:PARTICIPATES_GpPW]-
  (n2)-[:PARTICIPATES_GpPW]-(n3)-[:ASSOCIATES_DaG]-(n4:Disease)
USING JOIN ON n2
WHERE n0.name = 'Bupropion'
  AND n4.name = 'nicotine dependence'
  AND n1 <> n3
WITH
[
  size((n0)-[:BINDS_CbG]-()),
  size(()-[:BINDS_CbG]-(n1)),
  size((n1)-[:PARTICIPATES_GpPW]-()),
  size(()-[:PARTICIPATES_GpPW]-(n2)),
  size((n2)-[:PARTICIPATES_GpPW]-()),
  size(()-[:PARTICIPATES_GpPW]-(n3)),
  size((n3)-[:ASSOCIATES_DaG]-()),
  size(()-[:ASSOCIATES_DaG]-(n4))
] AS degrees, path
RETURN
  count(path) AS PC,
  sum(reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.4)) AS DWPC

hooligian commented 6 years ago

Trying to access https://neo4j.het.io/browser/, but I'm getting a "WebSocket connection failure. Due to security constraints in your web browser, the reason for the failure is not available to this Neo4j Driver." error. A little digging on the net indicated that the neo4j.conf file would need to be updated to allow remote browser connections.

dhimmel commented 6 years ago

@hooligian odd! I'm just as remote as you I believe. Can you try again? Or perhaps in a different browser? https://neo4j.het.io