AG-Peter / encodermap

python library for dimensionality reduction
https://ag-peter.github.io/encodermap/
GNU Lesser General Public License v3.0

tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: encoder/dense_1_weights/histogram #14

Closed: wagleswapnil closed this issue 2 years ago

wagleswapnil commented 2 years ago

Hi! I am trying to train an EncoderMap using a 2D numpy array input of size (4931, 8670), and I am getting the following error. The error does not go away when I change the hyperparameters. My matrix is rather sparse, i.e., it has a lot of 0 values and only a few non-zero values. Could you please have a look and provide some insight into what I am doing wrong?

```
Output files are saved to /users/swagle/antonios_project/exp_binding_sites_project/three_hidden_layers_128_each/runs/run17 as defined in 'main_path' in the parameters.
  0%|          | 0/100 [00:00<?, ?it/s]
2022-04-29 15:15:37.685617: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 2272788480 exceeds 10% of system memory.
2022-04-29 15:15:39.216105: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 2272788480 exceeds 10% of system memory.
2022-04-29 15:15:41.805498: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 2272788480 exceeds 10% of system memory.
  1%|          | 1/100 [00:07<12:38, 7.66s/it]
Traceback (most recent call last):
  File "/users/swagle/anaconda3/envs/encodermap/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/users/swagle/anaconda3/envs/encodermap/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/users/swagle/anaconda3/envs/encodermap/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: encoder/dense_3_weights/histogram
  [[{{node encoder/dense_3_weights/histogram}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "work.py", line 139, in <module>
    emap.train()
  File "/users/swagle/anaconda3/envs/encodermap/lib/python3.6/site-packages/encodermap/autoencoder.py", line 249, in train
    _, summary_values = self.sess.run((self.optimize, self.merged_summaries))
  File "/users/swagle/anaconda3/envs/encodermap/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/users/swagle/anaconda3/envs/encodermap/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/users/swagle/anaconda3/envs/encodermap/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/users/swagle/anaconda3/envs/encodermap/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: encoder/dense_3_weights/histogram
  [[node encoder/dense_3_weights/histogram (defined at /users/swagle/anaconda3/envs/encodermap/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
```

The parameter file that is being used is as follows:

{ "activation_functions": [ "", "softmax", "softmax", "" ], "analysis_path": "", "auto_cost_scale": 1, "auto_cost_variant": "mean_abs", "batch_size": 256, "center_cost_scale": 0.0001, "checkpoint_step": 5000, "dist_sig_parameters": [ 4.5, 12, 6, 1, 2, 6 ], "distance_cost_scale": 500, "gpu_memory_fraction": 0, "id": "", "l2_reg_constant": 0.001, "learning_rate": 1e-09, "main_path": "/users/swagle/antonios_project/exp_binding_sites_project/three_hidden_layers_128_each/runs/run19", "n_neurons": [ 128, 128, 2 ], "n_steps": 100, "periodicity": Infinity, "summary_step": 1 }

Part of the script I am using to train the EncoderMap:

```python
train = True
if train:
    params = em.Parameters()
    params.main_path = em.misc.run_path(run_path)
    params.activation_functions = ["", "softmax", "softmax", ""]
    params.periodicity = float("inf")
    params.n_steps = 100
    params.summary_step = max(1, params.n_steps / 100)
    params.learning_rate = 0.000000001
    e_map = em.EncoderMap(params, data_arr)
    e_map.train()
```

TobiasLe commented 2 years ago

Hi, my best guess is that it is a problem with the input data. I think I had the same error once when I used very large input values. How large are the non-zero values? If they are not roughly in a -5 to +5 range, I would try to subtract the mean and divide by the standard deviation. That might help because the initial parameters of the network work poorly for values far outside of this range.
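For illustration, a minimal sketch of what I mean, assuming your data sits in a NumPy array called `data_arr` as in your script (the small example array here is just a stand-in for your (4931, 8670) matrix):

```python
import numpy as np

# Tiny stand-in for the real matrix: mostly zeros, a few counts.
data_arr = np.array([[0., 0., 30.],
                     [0., 100., 0.],
                     [12., 0., 0.]])

# One global mean and standard deviation for the whole array (not per row),
# so every entry is shifted and scaled in the same way.
mean = data_arr.mean()
std = data_arr.std()
data_standardized = (data_arr - mean) / std

# Keep mean and std around: any future data must be transformed
# with exactly the same numbers.
print(data_standardized.mean(), data_standardized.std())  # ~0.0, ~1.0
```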

wagleswapnil commented 2 years ago

Hi Tobias, The non-zero values are in the range 1 to 100 (all positive values). Can you please tell me which parameters I need to change to make it work for these values? I will look deeper into your idea of normalizing my dataset (subtracting the mean and dividing by the SD), but I think it may not be the best approach for my dataset. Best, Swapnil

TobiasLe commented 2 years ago

It's not so easy to change these. Why do you think it is not good for your data set? I can't think of any reason not to normalize the data.

wagleswapnil commented 2 years ago

Because the elements of the numpy matrix are the numbers of atoms within certain cutoffs from a reference atom. A row corresponds to a protein, and a column corresponds to a cutoff distance from a reference atom. With the method you described above, the zeros in the present matrix will be replaced by negative values! Additionally, the maxima in each row will come down to roughly the same value (~5), so the proteins (stored in the matrix rows) will have very similar data ranges despite having large differences in the data (number of atoms within a certain distance from a reference). I am not sure, but I will probably have to take the mean of the entire matrix and not just of each row; but then there might be a problem if I test the model on a new protein which has even higher numbers in its row.

TobiasLe commented 2 years ago

Yes, you definitely have to apply the same treatment to every data point, not a separate normalization per row.

I agree that replacing all the zeros with some negative number feels kind of odd, but I think it would not actually be a problem. The algorithm calculates the distances between pairs of points, and for those distances it doesn't matter if you move all the points away from zero, as long as you move all points in the same way.
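To illustrate that point, a small self-contained sketch with made-up points (plain NumPy, not EncoderMap code): shifting every point by the same constant leaves all pairwise distances unchanged.

```python
import numpy as np

def pairwise_dists(x):
    # Euclidean distance between every pair of rows in x
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

points = np.array([[0.0, 1.0],
                   [2.0, 5.0],
                   [7.0, 3.0]])
shifted = points - 4.2  # move every point by the same amount

# The distance matrices are identical, so a distance-based cost
# is unaffected by the shift.
print(np.allclose(pairwise_dists(points), pairwise_dists(shifted)))  # True
```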

If you want to keep the zeros at zero, you could also just divide your complete array by its maximum. Then keep track of that number and divide all future data points by the same number.
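Roughly like this (again just a sketch; `data_arr` and the other names are placeholders):

```python
import numpy as np

data_arr = np.array([[0., 0., 30.],
                     [0., 100., 0.]])  # stand-in for the real matrix

scale = data_arr.max()          # remember this number
data_scaled = data_arr / scale  # zeros stay zero, values land in [0, 1]

# Later, divide new data by the stored factor; a new protein with larger
# counts simply ends up with values > 1.
new_protein = np.array([[0., 150., 20.]])
new_scaled = new_protein / scale
```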

The exact normalization procedure is not so important; just avoid large input values like 100.