ArnovanHilten / GenNet

Framework for Interpretable Neural Networks
Apache License 2.0

@ArnovanHilten new error message #92

Closed lesyngenta closed 1 year ago

lesyngenta commented 1 year ago

Dear Arno,

I eventually created genotype.h5, but when I trained the model, a new error message was shown:

    Traceback (most recent call last):
      File "GenNet.py", line 296, in <module>
        main()
      File "GenNet.py", line 22, in main
        train_regression(args)
      File "/scratch-large/4-quarterly/s1198162/GenNet/GenNet_utils/Train_network.py", line 313, in train_regression
        num_covariates=num_covariates)
      File "/scratch-large/4-quarterly/s1198162/GenNet/GenNet_utils/Create_network.py", line 288, in create_network_from_csv
        mask = scipy.sparse.coo_matrix(((matrix_ones), matrix_coord), shape = matrixshape)
      File "/SD5/people/s1198162/miniforge3/envs/GenNet/lib/python3.7/site-packages/scipy/sparse/coo.py", line 198, in __init__
        self._check()
      File "/SD5/people/s1198162/miniforge3/envs/GenNet/lib/python3.7/site-packages/scipy/sparse/coo.py", line 285, in _check
        raise ValueError('row index exceeds matrix dimensions')
    ValueError: row index exceeds matrix dimensions

I have 540 genotypes and the IDs don't exceed this. Do you have any idea what the reason is?

Thanks, Le

lesyngenta commented 1 year ago

Command line: python GenNet.py train -path /scratch-large/4-quarterly/s1198162/GenNet/train/ -ID 1 -epochs 50 -problem_type regression

ArnovanHilten commented 1 year ago

Hi Le,

Can you show or upload the topology.csv? The indices in it should stay below 540. Remember that zero is included in the count, so the maximum should be 539.

Best,

Arno
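
The zero-based limit described above can be checked before training. The following is a minimal sketch, not GenNet's own code: the column names `layer0_node` and `layer1_node`, the toy data, and both helper functions are assumptions made for illustration, and a real topology.csv may be laid out differently.

```python
import csv
import io


def max_index_per_column(csv_text, columns):
    """Return {column: largest integer index found in that column}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    largest = {name: -1 for name in columns}
    for row in reader:
        for name in columns:
            largest[name] = max(largest[name], int(row[name]))
    return largest


def out_of_range(largest, sizes):
    """List the columns whose largest index does not fit the expected size."""
    return [name for name, size in sizes.items() if largest[name] >= size]


# Toy topology with hypothetical column names; real files may differ.
toy_topology = """layer0_node,layer1_node
0,0
1,0
540,1
"""

largest = max_index_per_column(toy_topology, ["layer0_node", "layer1_node"])
# 540 inputs are numbered 0..539, so an index of 540 is one too many.
print(out_of_range(largest, {"layer0_node": 540, "layer1_node": 2}))
# prints ['layer0_node']
```

With 540 inputs, any index of 540 or higher in the input column would be expected to trigger the `row index exceeds matrix dimensions` error when the sparse mask is built.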

lesyngenta commented 1 year ago

Hi Arno,

I modified the topology.csv file, and now the model structure can be built. However, when training starts from scratch, it shows:

    /var/spool/slurmd/job165654/slurm_script: line 4: 34171 Segmentation fault (core dumped) python GenNet.py train -path /scratch-large/4-quarterly/s1198162/GenNet/train/ -ID 1 -epochs 1000 -problem_type regression

The GPU on our server has 32 GB of memory, and I have already reduced topology.csv as much as possible, to a matrix shape of (460, 137). Why does it still not run?

Thanks, Le


lesyngenta commented 1 year ago

Then I switched to the CPU, and the error message was more complicated:

    Start training from scratch
    WARNING:tensorflow:From /scratch-large/4-quarterly/s1198162/GenNet/GenNet_utils/Train_network.py:381: Model.fit_generator (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use Model.fit, which supports generators.
    Epoch 1/50
    Traceback (most recent call last):
      File "GenNet.py", line 296, in <module>
        main()
      File "GenNet.py", line 22, in main
        train_regression(args)
      File "/scratch-large/4-quarterly/s1198162/GenNet/GenNet_utils/Train_network.py", line 381, in train_regression
        setsize=val_size_train, inputsize=inputsize, evalset="validation")
      File "/SD5/people/s1198162/miniforge3/envs/GenNet/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
        return func(*args, **kwargs)
      File "/SD5/people/s1198162/miniforge3/envs/GenNet/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1479, in fit_generator
        initial_epoch=initial_epoch)
      File "/SD5/people/s1198162/miniforge3/envs/GenNet/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 66, in _method_wrapper
        return method(self, *args, **kwargs)
      File "/SD5/people/s1198162/miniforge3/envs/GenNet/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 848, in fit
        tmp_logs = train_function(iterator)
      File "/SD5/people/s1198162/miniforge3/envs/GenNet/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
        result = self._call(*args, **kwds)
      File "/SD5/people/s1198162/miniforge3/envs/GenNet/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 644, in _call
        return self._stateless_fn(*args, **kwds)
      File "/SD5/people/s1198162/miniforge3/envs/GenNet/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2420, in __call__
        return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
      File "/SD5/people/s1198162/miniforge3/envs/GenNet/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1665, in _filtered_call
        self.captured_inputs)
      File "/SD5/people/s1198162/miniforge3/envs/GenNet/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1746, in _call_flat
        ctx, args, cancellation_manager=cancellation_manager))
      File "/SD5/people/s1198162/miniforge3/envs/GenNet/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 598, in call
        ctx=ctx)
      File "/SD5/people/s1198162/miniforge3/envs/GenNet/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
        inputs, attrs, num_outputs)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot multiply A and B because inner dimension does not match: 460 vs. 12059. Did you forget a transpose? Dimensions of A: [137, 460). Dimensions of B: [32,12059]
      [[node model/LocallyDirected_0/SparseTensorDenseMatMul/SparseTensorDenseMatMul (defined at /scratch-large/4-quarterly/s1198162/GenNet/GenNet_utils/LocallyDirected1D.py:227) ]] [Op:__inference_train_function_1817]

    Errors may have originated from an input operation.
    Input Source operations connected to node model/LocallyDirected_0/SparseTensorDenseMatMul/SparseTensorDenseMatMul:
      model/LocallyDirected_0/Reshape (defined at /scratch-large/4-quarterly/s1198162/GenNet/GenNet_utils/LocallyDirected1D.py:223)
      model/LocallyDirected_0/strided_slice_1 (defined at /scratch-large/4-quarterly/s1198162/GenNet/GenNet_utils/LocallyDirected1D.py:181)

    Function call stack:
    train_function

    Closing remaining open files:/scratch-large/4-quarterly/s1198162/GenNet/train//genotype.h5...done
    Segmentation fault (core dumped)
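
The shapes in this error can be read as follows: A is [137, 460], which matches the (460, 137) mask transposed, and B is [32, 12059], presumably a batch of 32 samples with 12059 input features each. One possible reading, which is an assumption and not confirmed in the thread, is that the reduced topology describes 460 inputs while the genotype data still holds 12059. The sketch below only illustrates that shape comparison; `matmul_shape_problem` is a hypothetical helper, not part of GenNet or TensorFlow.

```python
def matmul_shape_problem(mask_shape, batch_shape):
    """Describe a mismatch between a (n_inputs, n_outputs) mask and a
    (batch_size, n_features) input batch, or return None if they agree."""
    n_inputs, _ = mask_shape
    _, n_features = batch_shape
    if n_inputs == n_features:
        return None
    return (f"mask expects {n_inputs} inputs, "
            f"but each sample has {n_features} features")


# Shapes taken from the error message in this thread.
print(matmul_shape_problem((460, 137), (32, 12059)))
# prints: mask expects 460 inputs, but each sample has 12059 features
```

If that reading is right, the first dimension of the mask built from topology.csv would need to equal the number of inputs per sample in genotype.h5.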


lesyngenta commented 1 year ago

Dear Arno,

Sorry for all the trouble. As you suggested, the problems were caused by topology.csv. I corrected the error and regenerated the file, and now everything works fine. Thank you so much! I will close the issue.

Best, Le
