Open inoue0426 opened 2 years ago
Using this notebook you can get the weight and graph structure for GO and Genes.
These are graphs for both cell lines.
These weights are from the original data; I'll visualize the weights for our dataset next.
What I was mainly talking about is Figure 2 of https://www.cell.com/cancer-cell/pdf/S1535-6108(20)30488-8.pdf. For further understanding, we also need to reproduce Figure 3.
This is the same as Fig 3 B, D. Just concatenate all data and then use PCA.
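The concatenate-then-PCA step can be sketched as follows. This is a minimal sketch, not the notebook's actual code: the two input arrays are random stand-ins for whatever per-sample matrices are being compared, and `pca_2d` is a hypothetical helper name. It uses a plain NumPy SVD, though any PCA implementation would do.

```python
import numpy as np


def pca_2d(data):
    """Project the rows of `data` onto their first two principal components."""
    x = np.asarray(data, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:2].T
    # Fraction of total variance captured by each of the two components.
    explained = (s[:2] ** 2) / np.sum(s ** 2)
    return coords, explained


rng = np.random.default_rng(0)
# Hypothetical stand-ins for the two sets of samples (e.g. one per dataset).
group_a = rng.normal(loc=0.0, size=(30, 16))
group_b = rng.normal(loc=3.0, size=(30, 16))

# "Concatenate all data and then use PCA": stack rows, then project once.
all_data = np.concatenate([group_a, group_b], axis=0)
coords, explained = pca_2d(all_data)
print(coords.shape)
```

Fitting PCA once on the concatenated matrix keeps both groups in the same coordinate system, so the scatter plot of `coords` can be colored by group.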
To claim that this model is explainable, we probably also need to show which GO terms are important for each cell line and drug combination. This is not difficult if we simply use the weights as importance scores. So we need to make figures like Fig. 3 G.
First, I focused on camptothecin (PubChem ID = 24360). In our dataset, 9 cell lines are associated with this drug.
Row | Cell line | SMILES | Response score |
---|---|---|---|
8484 | NCIH522_LUNG | CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O | 0.410863 |
9395 | SKOV3_OVARY | CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O | 0.374849 |
13428 | HOP62_LUNG | CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O | 0.900372 |
42816 | KM12_LARGE_INTESTINE | CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O | -1.19563 |
48534 | 786O_KIDNEY | CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O | 0.593702 |
51071 | BT549_BREAST | CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O | -0.605414 |
69634 | IGROV1_OVARY | CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O | -1.37477 |
72575 | NCIH226_LUNG | CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O | 0.458703 |
86337 | A498_KIDNEY | CC[C@@]1(C2=C(COC1=O)C(=O)N3CC4=CC5=CC=CC=C5N=C4C3=C2)O | 0.0871837 |
Then get the hidden weights for them.
This matrix should be 9 samples × 2068 GO terms.
Then sum each column (over the 9 samples) to get a 1D vector, as below. (The sum can be replaced with the mean.)
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 9.15048 | 8.16494 | 7.89646 | 7.88956 | 7.62677 | 7.32231 | 7.23469 | 6.97309 | 6.81788 | 6.62716 | 6.46398 | 6.35404 | 6.3292 | 6.29011 | 6.24329 | 6.21284 | 6.1719 | 6.07732 | 6.0678 | 5.93337 |
Each of these columns corresponds to a GO term, so we can plot them as a chart, as below. This is nearly equivalent to Fig 3 G.
From this table, I hope we can say something about the underlying biology.
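The aggregation and ranking described above can be sketched like this. It is only an illustration under assumptions: `top_go_terms` is a hypothetical helper, the small matrix is made up, and the GO labels are placeholders. In practice `hidden` would be the 9 × 2068 matrix and `go_ids` the matching list of GO identifiers.

```python
import numpy as np


def top_go_terms(hidden, go_ids, k=20, reduce="sum"):
    """Rank GO terms by their column-wise sum (or mean) over samples."""
    hidden = np.asarray(hidden, dtype=float)
    if hidden.shape[1] != len(go_ids):
        raise ValueError("need exactly one GO id per column")
    if reduce == "sum":
        scores = hidden.sum(axis=0)
    elif reduce == "mean":
        scores = hidden.mean(axis=0)
    else:
        raise ValueError("reduce must be 'sum' or 'mean'")
    # Indices of the k largest scores, in descending order.
    order = np.argsort(scores)[::-1][:k]
    return [(go_ids[i], float(scores[i])) for i in order]


# Made-up 3 samples x 3 GO terms; labels are placeholders, not real GO ids.
hidden = np.array([
    [0.1, 0.9, 0.3],
    [0.2, 0.8, 0.1],
    [0.0, 0.7, 0.5],
])
go_ids = ["GO:A", "GO:B", "GO:C"]

top = top_go_terms(hidden, go_ids, k=2)
for go_id, score in top:
    print(go_id, round(score, 3))
```

Switching `reduce` to `"mean"` changes the scale of the scores but not their order, since every column is aggregated over the same number of samples.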
In addition, we can split this figure by drug response score into negative and positive groups. With that split, the distribution of the top-scoring GO terms definitely differs between the two groups.
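A minimal sketch of that split, assuming a threshold of zero and using the mean so that groups of different sizes stay comparable. The response scores are the nine camptothecin values from the table above. The hidden matrix is random filler, and the GO labels and the helper name `top_terms_by_response_sign` are hypothetical.

```python
import numpy as np


def top_terms_by_response_sign(hidden, responses, go_ids, k=20):
    """Return the top-k GO ids separately for positive and non-positive responses."""
    hidden = np.asarray(hidden, dtype=float)
    responses = np.asarray(responses, dtype=float)
    groups = {"positive": responses > 0, "negative": responses <= 0}
    result = {}
    for name, mask in groups.items():
        if not mask.any():
            result[name] = []
            continue
        # Mean rather than sum, because the two groups may differ in size.
        scores = hidden[mask].mean(axis=0)
        order = np.argsort(scores)[::-1][:k]
        result[name] = [go_ids[i] for i in order]
    return result


# Response scores of the nine camptothecin samples listed in the table.
responses = [0.410863, 0.374849, 0.900372, -1.19563, 0.593702,
             -0.605414, -1.37477, 0.458703, 0.0871837]

rng = np.random.default_rng(0)
# Random filler standing in for the real 9 x 2068 hidden matrix.
hidden = rng.random((9, 5))
go_ids = [f"GO:term{i}" for i in range(5)]  # placeholder labels

groups = top_terms_by_response_sign(hidden, responses, go_ids, k=3)
print(groups["positive"])
print(groups["negative"])
```

Comparing the two lists (for example by their overlap) gives a quick check of whether the top GO terms really differ between responders and non-responders.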
Judging from Fig 3 G and the RLIPP explanation, the paper likewise just uses each GO term's weight as the explanation.
For example, "Response to cAMP", which the figure highlights, corresponds exactly to GO:0051591.
I added an explanation of the visualization to the README. It now covers both the PCA and the graph structure visualization.