Weight visualization using networkx and cytoscape

inoue0426 commented 2 years ago

Description

Visualize the final hidden layers' weights using networkx and Cytoscape.
- In particular, I visualized HCC44_LUNG and OACPA4C_OESOPHAGUS, which were treated with the same drug, camptothecin.
There are six final layers, but we focus on the first one to get the weights for all GO terms first.
Next, we enumerated all paths from each gene to Biological Process (GO:0008150) and summed the weights of all GO terms that appear on those paths to obtain the gene weight. (Duplicate GO terms are not allowed, so a GO term that appears on multiple paths is counted only once.)
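
The path enumeration above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the toy hierarchy, the `parents` mapping, the `go_weight` values, and both function names are hypothetical, and it assumes the hierarchy is acyclic and that the root term's weight is included in the sum.

```python
# Hypothetical toy hierarchy: child -> list of parents (gene -> GO -> ... -> root).
ROOT = "GO:0008150"  # Biological Process

parents = {
    "GENE_A": ["GO_a", "GO_b"],
    "GO_a": ["GO_c"],
    "GO_b": ["GO_c"],
    "GO_c": [ROOT],
}

# Hypothetical per-GO weights taken from the first final hidden layer.
go_weight = {"GO_a": 0.5, "GO_b": 0.25, "GO_c": 1.0, ROOT: 2.0}


def all_paths(node, root, parents):
    """Enumerate every path from `node` up to `root` (assumes no cycles)."""
    if node == root:
        return [[root]]
    paths = []
    for parent in parents.get(node, []):
        for tail in all_paths(parent, root, parents):
            paths.append([node] + tail)
    return paths


def gene_weight(gene, root, parents, go_weight):
    """Sum the weights of the distinct GO terms on any gene-to-root path."""
    seen = set()
    for path in all_paths(gene, root, parents):
        seen.update(path[1:])  # skip the gene itself; the set drops duplicates
    return sum(go_weight[go] for go in seen)


print(gene_weight("GENE_A", ROOT, parents, go_weight))
```

In this toy case GO_c and the root lie on both paths but are counted once each. With a networkx `DiGraph`, `networkx.all_simple_paths` could replace the hand-written recursion.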

inoue0426 commented 2 years ago

Using this notebook, you can get the weights and graph structure for GO terms and genes.

These are graphs for both cell lines.

[Image: Screen Shot 2022-07-24 at 12 58 28]

These weights are from the original data, so I'll visualize the weights from our datasets next.

inoue0426 commented 2 years ago

What I was mainly talking about is Figure 2 of https://www.cell.com/cancer-cell/pdf/S1535-6108(20)30488-8.pdf. For further understanding, we also need to show Figure 3.

inoue0426 commented 2 years ago

This is the same as Fig 3 B, D. Just concatenate all data and then use PCA.
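
A rough sketch of "concatenate, then PCA", assuming each dataset is a (samples × features) array with the same number of features. The function name `pca_2d` and the random toy blocks are hypothetical, and the projection is computed with a plain SVD rather than any particular PCA library.

```python
import numpy as np


def pca_2d(blocks):
    """Stack all blocks row-wise and project onto the first two principal components."""
    X = np.concatenate(blocks, axis=0)
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T


# Hypothetical toy data standing in for two sets of embeddings.
rng = np.random.default_rng(0)
block_a = rng.normal(size=(5, 8))
block_b = rng.normal(size=(4, 8))

coords = pca_2d([block_a, block_b])
print(coords.shape)
```

The first `len(block_a)` rows of `coords` belong to the first block, so each group can be colored separately in a scatter plot.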

inoue0426 commented 2 years ago

To claim that this model is explainable, we probably also need to show which GO terms are important for each cell line and drug combination. This is not difficult, since we can simply use the weights as importance. So we need to make figures like Fig. 3 G.

inoue0426 commented 2 years ago

[WIP] Idea to create a quantitative graph to show the biological plausibility.

First, I focused on camptothecin, PubChemID = 24360. In our dataset, 9 cell lines are associated with this drug.

	0	1	2
8484	NCIH522_LUNG	CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O	0.410863
9395	SKOV3_OVARY	CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O	0.374849
13428	HOP62_LUNG	CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O	0.900372
42816	KM12_LARGE_INTESTINE	CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O	-1.19563
48534	786O_KIDNEY	CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O	0.593702
51071	BT549_BREAST	CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O	-0.605414
69634	IGROV1_OVARY	CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O	-1.37477
72575	NCIH226_LUNG	CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O	0.458703
86337	A498_KIDNEY	CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O	0.0871837

Then get the hidden weights for these samples. This matrix should be 9 samples × 2068 GO terms.
Next, sum each column over the samples to get a 1D vector, as below. (The sum can be replaced with the mean.)

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19
0	9.15048	8.16494	7.89646	7.88956	7.62677	7.32231	7.23469	6.97309	6.81788	6.62716	6.46398	6.35404	6.3292	6.29011	6.24329	6.21284	6.1719	6.07732	6.0678	5.93337
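
The column-wise reduction can be sketched like this. The function name `top_go_terms`, the toy matrix, and the GO labels are hypothetical. In the real case `hidden` would be the 9 × 2068 matrix described above.

```python
import numpy as np


def top_go_terms(hidden, go_ids, k=20, reduce="sum"):
    """Collapse a (samples x GO) matrix to one score per GO term and return the top k."""
    hidden = np.asarray(hidden, dtype=float)
    scores = hidden.sum(axis=0) if reduce == "sum" else hidden.mean(axis=0)
    order = np.argsort(scores)[::-1][:k]  # indices sorted by descending score
    return [(go_ids[i], float(scores[i])) for i in order]


# Hypothetical toy example: 2 samples x 3 GO terms.
hidden = np.array([[1.0, 0.5, 3.0],
                   [2.0, 0.5, 1.0]])
go_ids = ["GO_a", "GO_b", "GO_c"]

top = top_go_terms(hidden, go_ids, k=2)
print(top)
```

Passing `reduce="mean"` gives the same ranking here, since both only rescale the column totals by the number of samples.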

Each column corresponds to a GO term, so we can create a chart as below. This is nearly equivalent to Fig 3 G.

From this table, I hope to say something about the underlying biological phenomena.

In addition, we can split this figure by drug response score into negative and positive groups. With that separation, I expect the distribution of top-scoring GO terms to differ between the two groups.
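
A minimal sketch of that split, assuming a threshold of 0 on the response score and treating exactly-zero scores as negative. The function name `split_by_response` and the toy values are hypothetical.

```python
import numpy as np


def split_by_response(hidden, responses, threshold=0.0):
    """Sum GO weights separately for samples above and at-or-below the threshold."""
    hidden = np.asarray(hidden, dtype=float)
    responses = np.asarray(responses, dtype=float)
    positive = hidden[responses > threshold].sum(axis=0)
    negative = hidden[responses <= threshold].sum(axis=0)
    return positive, negative


# Hypothetical toy example: 3 samples x 2 GO terms.
hidden = np.array([[1.0, 0.0],
                   [0.0, 2.0],
                   [3.0, 1.0]])
responses = [0.4, -1.2, 0.9]

pos, neg = split_by_response(hidden, responses)
print(pos, neg)
```

Ranking `pos` and `neg` separately would then show whether the top GO terms differ between the two groups.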

inoue0426 commented 2 years ago

Based on Fig 3 G and the RLIPP explanation, the explanation there simply uses each GO term's weight.
For example, "Response to cAMP", which the figure focuses on, corresponds exactly to GO:0051591.

inoue0426 commented 2 years ago

I added an explanation of the visualization to the README. It now covers both the PCA and the graph structure visualization.

cannin / graph_neural_network_drug_response

Weight visualization using networkx and cytoscape #17
