ArnovanHilten / GenNet

Framework for Interpretable Neural Networks
Apache License 2.0

how to interpret results #93

Closed lesyngenta closed 1 year ago

lesyngenta commented 1 year ago

Hi Arno,

After I got the results file 'connection_weights.csv', I don't know how to interpret it. Should I look at the 'raw importance' column to find the genes most correlated with a phenotype? I like the example figure "example_manhattan.png", but the GenNet.py plot command only draws a Manhattan plot for SNPs, not for genes. One more question: how do I use an ".npz" file in training? Is there an example of the npz file format?

Thanks, Le

ArnovanHilten commented 1 year ago

Hi @lesyngenta,

It is quite hard to make plots that work/look good for all the networks. If you post your connection_weights.csv I can provide you some code to plot the results. Did you check the other plotting options? You can use sunburst or layer_weight with the corresponding layer that you want to plot. These should also provide you some insight.

About the NPZ:

The GenNet.py topology create_gene_network command should also provide the equivalent .npz file for the topology.csv.

The npz file is just a handy alternative to the topology.csv.

In short:

Dense layer with custom connections. The custom connections are defined by the mask input, a sparse (COO) connectivity matrix.

# The matrix has the shape of (N_nodes_layer_1, N_nodes_layer_2).
# It is a sparse matrix with zeros where there is no connection and ones where there is a connection. For example:

#             output
#           1 2 3 4 5
# input 1 | 1 0 0 0 0 |
# input 2 | 1 1 0 0 0 |
# input 3 | 0 1 0 0 0 |
# input 4 | 0 1 0 0 0 |
# input 5 | 0 0 1 0 0 |
# input 6 | 0 0 0 1 0 |
# input 7 | 0 0 0 1 0 |

# This connects the first two inputs (1, 2) to the first neuron in the second layer.
# Connects inputs 2, 3 and 4 to output neuron 2.
# Connects input 5 to output neuron 3.
# Connects inputs 6 and 7 to the 4th neuron in the subsequent layer.
# Connects nothing to the 5th neuron.
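
As a rough illustration of what such a mask does, the sketch below applies the 7x5 example matrix to a dense weight matrix with plain NumPy. It is not GenNet's own layer code: the names `mask`, `weights` and `masked_dense` are made up for this example, and the random weights only stand in for trained ones.

```python
import numpy as np

# Connectivity mask from the example above: rows are inputs, columns are outputs.
mask = np.array([
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
])

def masked_dense(x, weights, mask):
    """Dense layer in which only the connections marked 1 in the mask are used."""
    return x @ (weights * mask)

rng = np.random.default_rng(0)
weights = rng.normal(size=mask.shape)   # stand-in for trained weights
x = np.ones((1, mask.shape[0]))         # one sample with 7 input values

y = masked_dense(x, weights, mask)

# Inputs (1-based) feeding each output neuron, read from the mask columns.
inputs_per_output = {j + 1: (np.flatnonzero(mask[:, j]) + 1).tolist()
                     for j in range(mask.shape[1])}
print(inputs_per_output)
print(y.shape)
```

The fifth output has no incoming connections, so its value stays exactly zero whatever the weights are.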

If you have such a matrix in a numpy array you can simply use:

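import numpy as np
import scipy.sparse

savepath = '.'  # directory to save the mask in (placeholder)
a = np.eye(3)  # your matrix
coo_matrix = scipy.sparse.coo_matrix(a)  # convert to sparse COO format
scipy.sparse.save_npz(savepath + '/coo_matrix', coo_matrix)  # writes coo_matrix.npz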

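Or, if the matrix is very large, look at the examples here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.coo_matrix.html

Here you can find some more information about the npz file: https://github.com/ArnovanHilten/GenNet/blob/master/jupyter_notebooks/2_Define_connection_masks_simple_memory_efficient.ipynb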
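
For a large topology a dense array may not fit in memory, so the mask can instead be built directly from (input, output) index pairs. A minimal sketch with SciPy, reusing the connections of the small 7x5 example; the index lists and the temporary output directory are placeholders, not files that GenNet produces:

```python
import os
import tempfile

import numpy as np
import scipy.sparse

# One (input, output) pair per connection, 0-based. Same connections as the 7x5 example.
input_idx = np.array([0, 1, 1, 2, 3, 4, 5, 6])
output_idx = np.array([0, 0, 1, 1, 1, 2, 3, 3])
n_inputs, n_outputs = 7, 5

# Only the non-zero entries are stored, so memory scales with the number of connections.
mask = scipy.sparse.coo_matrix(
    (np.ones(len(input_idx), dtype=bool), (input_idx, output_idx)),
    shape=(n_inputs, n_outputs),
)

savepath = tempfile.mkdtemp()  # placeholder output directory
mask_file = os.path.join(savepath, "coo_matrix.npz")
scipy.sparse.save_npz(mask_file, mask)

# Load it back to confirm the shape and the number of stored connections.
loaded = scipy.sparse.load_npz(mask_file)
print(loaded.shape, loaded.nnz)
```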

lesyngenta commented 1 year ago

Arno,

My training run didn't produce connection_weights.csv. All the other outputs are OK, but creating the connection file fails with the error "MemoryError: Unable to allocate 811. GiB for an array with shape (108894146608,) and data type int64". I don't know why, since there are only four layers. The largest layer (layer 0) contains about 40000 nodes and the other layers contain far fewer.

Thanks, Le
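
The size in the error message follows directly from the array shape, since each int64 element takes 8 bytes. A quick check of the arithmetic, using only the two numbers from the error message:

```python
import numpy as np

n_elements = 108_894_146_608                      # shape reported in the MemoryError
bytes_per_element = np.dtype(np.int64).itemsize   # 8 bytes per int64

total_gib = n_elements * bytes_per_element / 2**30
print(f"{total_gib:.0f} GiB")  # → 811 GiB
```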


ArnovanHilten commented 1 year ago

Hi Le,

For some interpretation steps we calculate for every possible path (between input and output) the importance. This can be very large, especially if you have some densely connected layers. There are some ways to prune this, for example during each step between the layers we could remove the paths with (near) zero importance.

Can you post your error message and maybe a summary of your network? This would help to find a solution for you.

Best,

Arno
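
To give a feel for why the number of paths can grow quickly, the sketch below counts input-to-output paths by multiplying the connectivity masks of consecutive layers. The masks are small made-up examples, and this only illustrates the counting; it is not the code GenNet uses for its importance calculation.

```python
import numpy as np
import scipy.sparse

# Hypothetical masks for a tiny network: 6 inputs -> 3 nodes -> 2 nodes -> 1 output.
mask_0 = scipy.sparse.csr_matrix(np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]))
mask_1 = scipy.sparse.csr_matrix(np.array([
    [1, 0],
    [1, 1],
    [0, 1],
]))
mask_2 = scipy.sparse.csr_matrix(np.ones((2, 1), dtype=int))  # fully connected

# Entry (i, j) of the product is the number of distinct paths from input i to output j.
paths = (mask_0 @ mask_1 @ mask_2).toarray()
n_paths = int(paths.sum())
print(paths.ravel(), n_paths)
```

In this toy network the 6 inputs already give 10 paths. Every node that feeds several nodes in the next layer multiplies the paths running through it, which is why densely connected layers make the total grow so fast.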

lesyngenta commented 1 year ago

Hi Arno,

Here is the network structure attached and the error message.

Thanks, Le


Model: "model"

Layer (type)                     Output Shape       Param #   Connected to
input_layer (InputLayer)         [(None, 37070)]    0
reshape (Reshape)                (None, 37070, 1)   0         input_layer[0][0]
LocallyDirected_0 (LocallyDirec  (None, 1949, 1)    99929     reshape[0][0]
activation (Activation)          (None, 1949, 1)    0         LocallyDirected_0[0][0]
batch_normalization (BatchNorma  (None, 1949, 1)    2         activation[0][0]
LocallyDirected_1 (LocallyDirec  (None, 91, 1)      98071     batch_normalization[0][0]
activation_1 (Activation)        (None, 91, 1)      0         LocallyDirected_1[0][0]
batch_normalization_1 (BatchNor  (None, 91, 1)      2         activation_1[0][0]
LocallyDirected_2 (LocallyDirec  (None, 2, 1)       97982     batch_normalization_1[0][0]
activation_2 (Activation)        (None, 2, 1)       0         LocallyDirected_2[0][0]
batch_normalization_2 (BatchNor  (None, 2, 1)       2         activation_2[0][0]
flatten (Flatten)                (None, 2)          0         batch_normalization_2[0][0]
output_layer (Dense)             (None, 1)          3         flatten[0][0]
dropout (Dropout)                (None, 1)          0         output_layer[0][0]
inputs_cov (InputLayer)          [(None, 0)]        0
activation_3 (Activation)        (None, 1)          0         dropout[0][0]

Total params: 295,991
Trainable params: 295,985
Non-trainable params: 6


ArnovanHilten commented 1 year ago

Ah, the network is indeed relatively small. It should be no problem to get the connection_weights.csv. I will investigate whether there is a bug that causes the memory use to be so high!

lesyngenta commented 1 year ago

Hi Arno,

Any idea? I found that no matter how large the network is, if it contains only three layers (layer0-layer2), it's OK. But if I add one more layer (layer3), the connection_weights.csv cannot be created.

Thanks, Le


ArnovanHilten commented 1 year ago

Hi Le,

I am working on a solution to your problem on this branch: https://github.com/ArnovanHilten/GenNet/tree/epistasis-interpretation. If you switch to this branch you can test it with GenNet.py interpret get_weight_scores. I need some more time to iron out all the bugs, so you can also wait (maybe a week) and then it will work for all networks.

Best,

Arno

lesyngenta commented 1 year ago

Thank you so much!


lesyngenta commented 1 year ago

Hi Arno,

I downloaded the new code, but the connection_weights.csv still couldn't be created. I guess the cause is the function 'def create_importance_csv(datapath, model, masks)'. I didn't create mask files; all 4 layers are listed in topology.csv. It seems that create_importance_csv creates a huge matrix that exceeds the computer's memory. Is there any way to modify this function so that it works for cases without masks?

Thanks, Le


lesyngenta commented 1 year ago

Hi Arno,

The new function doesn't work because it can't find train_args.json, but train_args.json was never generated.

python GenNet.py interpret -type get_weight_scores -resultpath /scratch-large/4-quarterly/s1198162/GenNet/results/GenNet_experiment28/
Traceback (most recent call last):
  File "GenNet.py", line 331, in <module>
    main()
  File "GenNet.py", line 36, in main
    interpret(args)
  File "/scratch-large/4-quarterly/s1198162/GenNet-epistasis-interpretation/GenNet_utils/Interpret.py", line 16, in interpret
    get_weight_scores(args)
  File "/scratch-large/4-quarterly/s1198162/GenNet-epistasis-interpretation/GenNet_utils/Interpret.py", line 29, in get_weight_scores
    model, masks = load_trained_network(args)
  File "/scratch-large/4-quarterly/s1198162/GenNet-epistasis-interpretation/GenNet_utils/Train_network.py", line 315, in load_trained_network
    args = load_train_arguments(args)
  File "/scratch-large/4-quarterly/s1198162/GenNet-epistasis-interpretation/GenNet_utils/Utility_functions.py", line 244, in load_train_arguments
    with open(args.resultpath + filename, 'r') as file:
FileNotFoundError: [Errno 2] No such file or directory: '/scratch-large/4-quarterly/s1198162/GenNet/results/GenNet_experiment28/train_args.json'


lesyngenta commented 1 year ago

Hi Arno,

With the new function I could create 'weight_importance.npy', but the file is empty. I don't know how to fix that.

Thanks, LE
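
One way to check whether such a file is really empty is to load it with NumPy and look at its shape and size. A small sketch: the file name comes from the message above, but the array written here is a dummy so that the example runs on its own, and the `describe_npy` helper is made up for illustration.

```python
import os
import tempfile

import numpy as np

def describe_npy(path):
    """Return shape, dtype and element count of an array stored in a .npy file."""
    arr = np.load(path)
    return arr.shape, arr.dtype, arr.size

# Dummy stand-in for the real result file.
path = os.path.join(tempfile.mkdtemp(), "weight_importance.npy")
np.save(path, np.zeros((0, 3)))

shape, dtype, size = describe_npy(path)
print(shape, dtype, size)  # a size of 0 means the array has no elements
```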


lesyngenta commented 1 year ago

Closing the issue, as it was eventually solved. Many thanks to Arno for the continuous guidance! I am lucky to have met such a responsible author!