cannin / graph_neural_network_drug_response


Weight visualization using networkx and cytoscape #17

Open inoue0426 opened 2 years ago

inoue0426 commented 2 years ago

Description

inoue0426 commented 2 years ago

Using this notebook, you can get the weights and graph structure for GO terms and genes.

These are graphs for both cell lines.

[Image: Screen Shot 2022-07-24 at 12 58 28]

These weights come from the original data, so I'll visualize the weights for our datasets next.
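As a rough illustration of the networkx-to-Cytoscape step named in the issue title, here is a minimal sketch. The node names, the edge list, and the weights are hypothetical placeholders, not values from the notebook; only `nx.DiGraph`, `add_edge`, and `nx.cytoscape_data` are real networkx calls.

```python
import json

import networkx as nx

# Hypothetical toy hierarchy: (parent, child, learned weight).
# Real edges and weights would come from the notebook, not from this list.
edges = [
    ("GO_root", "GO_term_1", 0.82),
    ("GO_root", "GO_term_2", 0.35),
    ("GO_term_1", "GENE_A", 0.61),
    ("GO_term_2", "GENE_B", 0.12),
]


def build_weighted_graph(edge_list):
    """Build a directed graph whose edges carry a `weight` attribute."""
    graph = nx.DiGraph()
    for parent, child, weight in edge_list:
        graph.add_edge(parent, child, weight=weight)
    return graph


graph = build_weighted_graph(edges)

# Convert to the Cytoscape.js JSON layout (a plain dict of nodes and edges).
cyjs = nx.cytoscape_data(graph)
cyjs_text = json.dumps(cyjs, indent=2)
```

Writing `cyjs_text` to a `.cyjs` file is one way to get the graph into Cytoscape, where the `weight` attribute can then be mapped to edge width or color.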

inoue0426 commented 2 years ago

What I was mainly talking about is Figure 2 of https://www.cell.com/cancer-cell/pdf/S1535-6108(20)30488-8.pdf. For further understanding, we also need to show Figure 3.

inoue0426 commented 2 years ago

This is the same as Fig. 3B and 3D. [image] Just concatenate all the data and then apply PCA.
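The "concatenate, then PCA" step could look roughly like the sketch below. The two input blocks are random stand-ins (hypothetical names and shapes), and the projection uses a plain NumPy SVD rather than whatever PCA implementation the notebook actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for two blocks of per-sample embeddings;
# the real arrays would come from the model.
block_a = rng.normal(size=(30, 6))
block_b = rng.normal(loc=1.0, size=(40, 6))


def pca_2d(blocks):
    """Concatenate sample blocks row-wise and project onto the top two PCs."""
    data = np.concatenate(blocks, axis=0)
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


coords = pca_2d([block_a, block_b])  # one 2D point per sample
```

The rows of `coords` keep the concatenation order, so the first 30 points belong to `block_a` and the rest to `block_b`, which is enough to color a scatter plot by group.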

inoue0426 commented 2 years ago

To claim that this model is explainable, we probably also need to show which GO terms are important for each cell line and drug combination. This is not difficult, since we can simply use the weights as importance scores. So we need to make figures like Fig. 3G.

inoue0426 commented 2 years ago

[WIP] Idea to create a quantitative graph to show the biological plausibility.

First, I focused on camptothecin (PubChem ID = 24360). In our dataset, 9 cell lines are associated with this drug.

| | 0 (cell line) | 1 (drug SMILES) | 2 (drug response score) |
|---|---|---|---|
| 8484 | NCIH522_LUNG | CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O | 0.410863 |
| 9395 | SKOV3_OVARY | CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O | 0.374849 |
| 13428 | HOP62_LUNG | CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O | 0.900372 |
| 42816 | KM12_LARGE_INTESTINE | CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O | -1.19563 |
| 48534 | 786O_KIDNEY | CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O | 0.593702 |
| 51071 | BT549_BREAST | CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O | -0.605414 |
| 69634 | IGROV1_OVARY | CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O | -1.37477 |
| 72575 | NCIH226_LUNG | CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O | 0.458703 |
| 86337 | A498_KIDNEY | CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O | 0.0871837 |

Then get the hidden weights for these samples. This matrix should be 9 samples × 2068 GO terms.
Then sum each column over the samples to get a 1D vector, as below. (The sum can be replaced with the average.)

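| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9.15048 | 8.16494 | 7.89646 | 7.88956 | 7.62677 | 7.32231 | 7.23469 | 6.97309 | 6.81788 | 6.62716 | 6.46398 | 6.35404 | 6.3292 | 6.29011 | 6.24329 | 6.21284 | 6.1719 | 6.07732 | 6.0678 | 5.93337 |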

These columns correspond to GO terms, so we can create a chart as below. This is nearly equivalent to Fig. 3G. [image]
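The aggregation described above (sum the 9 × 2068 hidden-weight matrix over samples, then keep the highest-scoring GO terms) can be sketched as follows. The matrix is filled with random placeholder values and the `GO_<i>` labels are hypothetical; only the shape comes from this thread.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_go = 9, 2068  # shape quoted in this comment

# Random placeholder for the hidden weights; the real matrix comes from the model.
hidden = rng.random((n_samples, n_go))
go_ids = [f"GO_{i}" for i in range(n_go)]  # hypothetical labels


def top_go_terms(weights, labels, k=20, reduce="sum"):
    """Aggregate each GO column over samples and return the k largest."""
    if reduce == "sum":
        scores = weights.sum(axis=0)
    else:
        scores = weights.mean(axis=0)
    order = np.argsort(scores)[::-1][:k]  # indices sorted by descending score
    return [(labels[i], float(scores[i])) for i in order]


top20 = top_go_terms(hidden, go_ids)
top20_mean = top_go_terms(hidden, go_ids, reduce="mean")
```

Because every column is divided by the same sample count, switching from the sum to the average rescales the scores but should leave the ranking of GO terms unchanged.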

From this table, I hope to be able to say something about the underlying biology.

In addition, we can split this figure by drug response score into negative and positive groups. With that split, I expect the distribution of top-scoring GO terms to differ between the two groups.
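One way to check the split by response sign is sketched below. The nine response scores are copied from the camptothecin table in this comment; the hidden weights are random placeholders, so the resulting overlap says nothing about the real model.

```python
import numpy as np

# Response scores copied from the camptothecin table in this comment.
scores = np.array([0.410863, 0.374849, 0.900372, -1.19563, 0.593702,
                   -0.605414, -1.37477, 0.458703, 0.0871837])

rng = np.random.default_rng(0)
# Random placeholder for the 9 x 2068 hidden weights (not real model output).
hidden = rng.random((scores.size, 2068))


def top_indices(weights, k=20):
    """Return the column indices of the k largest column sums as a set."""
    order = np.argsort(weights.sum(axis=0))[::-1][:k]
    return set(order.tolist())


positive = top_indices(hidden[scores > 0])  # samples with positive response
negative = top_indices(hidden[scores < 0])  # samples with negative response
overlap = len(positive & negative)          # GO columns shared by both top-20 sets
```

A small `overlap` on the real weights would support the idea that the two groups rely on different GO terms. With only 3 negative and 6 positive samples here, any such comparison is tentative.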

inoue0426 commented 2 years ago

Based on Fig. 3G and the RLIPP explanation, this just uses the GO terms' weights for the explanation.
For example, "Response to cAMP", which is highlighted in the figure, corresponds exactly to GO:0051591.

inoue0426 commented 2 years ago

I added an explanation of the visualization to the README. It now covers both the PCA and the graph structure visualization.