BNN-UPC / GNNetworkingChallenge

RouteNet baseline for the Graph Neural Networking Challenge (https://bnn.upc.edu/challenge/)
Apache License 2.0

implementation in the given code #6

Closed craju06 closed 3 years ago

craju06 commented 3 years ago

Hello, I am trying to improve the TensorFlow code using our approach. I have some doubts about the code which you have given as a reference.

  1. Since I am using this line, " }, list(nx.get_node_attributes(D_G, 'delay').values())", which reads the delay, do I need to modify the prediction output?
  2. Whenever I train on all the data files at once, I get errors. Please help me overcome this.

With regards, Raju IIT Madras

MiquelFerriol commented 3 years ago

Hi @craju06 , Let's see if we can fix this:

Since I am using this code " }, list(nx.get_node_attributes(D_G, 'delay').values())" using delay, do I need to modify in the prediction output?

No, not really. If you use the 'delay' values as labels, the model already outputs the desired predictions, so you do not need to do anything to the output. If you apply some type of normalization to the labels, you may need to denormalize the predictions before submitting the results but, in principle, you do not need to modify the prediction output.
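As a minimal sketch of the denormalization step mentioned above, assume (hypothetically) that the delay labels were z-score normalized with statistics computed on the training set. The function names and the sample values are illustrative and are not part of the baseline:

```python
from statistics import mean, pstdev


def fit_stats(train_delays):
    """Return (mean, std) of the training labels. Hypothetical helper."""
    return mean(train_delays), pstdev(train_delays)


def normalize(values, mu, sigma):
    """Z-score normalization applied to the labels before training."""
    return [(v - mu) / sigma for v in values]


def denormalize(values, mu, sigma):
    """Inverse of normalize(); apply it to predictions before submitting."""
    return [v * sigma + mu for v in values]


# Made-up delay labels, only to show the round trip.
train_delays = [0.12, 0.30, 0.45, 0.80]
mu, sigma = fit_stats(train_delays)

normalized = normalize(train_delays, mu, sigma)
restored = denormalize(normalized, mu, sigma)
```

If no normalization is applied to the labels, nothing like this is needed and the raw model output can be used directly.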

Whenever I am training all the data files at a time, I am getting errors. Therefore, please help me to overcome this barrier.

Which errors are you getting? Are they related to the model training? The data reading? If you want, you can put here the output logs so we can try to find why you are getting those errors.

Regards, Miquel

craju06 commented 3 years ago

I am getting this kind of error:

2021-09-03 23:13:51.220214: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-09-03 23:13:51.223042: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-09-03 23:13:54.851455: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-09-03 23:13:54.853691: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2021-09-03 23:13:54.853740: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (raju-HP): /proc/driver/nvidia/version does not exist
2021-09-03 23:13:54.855509: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Starting training from scratch...
2021-09-03 23:13:56.013717: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/5
/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/framework/indexed_slices.py:447: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_14_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_14_grad/Reshape:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradients/GatherV2_14_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/framework/indexed_slices.py:447: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_13_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_13_grad/Reshape:0", shape=(None, 32), dtype=float32), dense_shape=Tensor("gradients/GatherV2_13_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/framework/indexed_slices.py:447: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_12_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_12_grad/Reshape:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradients/GatherV2_12_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/framework/indexed_slices.py:447: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_11_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_11_grad/Reshape:0", shape=(None, 32), dtype=float32), dense_shape=Tensor("gradients/GatherV2_11_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/framework/indexed_slices.py:447: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_10_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_10_grad/Reshape:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradients/GatherV2_10_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/framework/indexed_slices.py:447: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_9_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_9_grad/Reshape:0", shape=(None, 32), dtype=float32), dense_shape=Tensor("gradients/GatherV2_9_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/framework/indexed_slices.py:447: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_8_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_8_grad/Reshape:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradients/GatherV2_8_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/framework/indexed_slices.py:447: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_7_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_7_grad/Reshape:0", shape=(None, 32), dtype=float32), dense_shape=Tensor("gradients/GatherV2_7_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/framework/indexed_slices.py:447: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_6_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_6_grad/Reshape:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradients/GatherV2_6_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/framework/indexed_slices.py:447: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_5_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_5_grad/Reshape:0", shape=(None, 32), dtype=float32), dense_shape=Tensor("gradients/GatherV2_5_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/framework/indexed_slices.py:447: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_4_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_4_grad/Reshape:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradients/GatherV2_4_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/framework/indexed_slices.py:447: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_3_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_3_grad/Reshape:0", shape=(None, 32), dtype=float32), dense_shape=Tensor("gradients/GatherV2_3_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/framework/indexed_slices.py:447: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_2_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_2_grad/Reshape:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradients/GatherV2_2_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/framework/indexed_slices.py:447: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_1_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_1_grad/Reshape:0", shape=(None, 32), dtype=float32), dense_shape=Tensor("gradients/GatherV2_1_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/framework/indexed_slices.py:447: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_grad/Reshape:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradients/GatherV2_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
50/50 [==============================] - ETA: 0s - loss: 88.6356 - MAPE: 88.6356
Traceback (most recent call last):
  File "/media/raju/All Documents/GNNetworkingChallenge-2021_Routenet_TF/code/main.py", line 109, in <module>
    model.fit(ds_train,
  File "/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/keras/engine/training.py", line 1215, in fit
    val_logs = self.evaluate(
  File "/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/keras/engine/training.py", line 1501, in evaluate
    tmp_logs = self.test_function(iterator)
  File "/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 956, in _call
    return self._concrete_stateful_fn._call_flat(
  File "/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1963, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 591, in call
    outputs = execute.execute(
  File "/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [275,32] vs. [0,32]
  [[{{node route_net_model/StatefulPartitionedCall/rnn/gru_cell_1/add_1}}]] [Op:__inference_test_function_13464]

Function call stack: test_function

MiquelFerriol commented 3 years ago

Some comments on this:

Note that the W in a log line marks a warning, not an error. For example, this one is saying that TF cannot find a valid CUDA installation:

2021-09-03 23:13:51.220214: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory

This one is also not an error:

/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/framework/indexed_slices.py:447: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_14_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_14_grad/Reshape:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradients/GatherV2_14_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.

It is a warning caused by the gradient of the gather operation, which has to convert sparse IndexedSlices of unknown shape into a dense Tensor; this may consume a large amount of memory.
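To separate warnings from real errors in a long log like this one, the severity letter of each TensorFlow log line can be read programmatically. The following is a small sketch. The helper name and the regular expression are illustrative assumptions based on the line format visible in the log above:

```python
import re

# TensorFlow's C++ log lines in the output above have the shape:
#   <date> <time>: <LETTER> <file>:<line>] <message>
# where LETTER is I (info), W (warning), E (error) or F (fatal).
_TF_LOG = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: ([IWEF]) ")
_LEVELS = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}


def tf_log_severity(line):
    """Return the severity name of a TF log line, or None if it is not one."""
    match = _TF_LOG.match(line)
    return _LEVELS[match.group(1)] if match else None


line = (
    "2021-09-03 23:13:51.220214: W tensorflow/stream_executor/platform/"
    "default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'"
)
print(tf_log_severity(line))
```

Lines that do not match, such as the Python `UserWarning` messages or the traceback, return `None` and have to be read separately. To hide the lower-severity C++ lines altogether, the `TF_CPP_MIN_LOG_LEVEL` environment variable can be set before importing TensorFlow.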

However, the error is raised here:

  File "/media/raju/All Documents/GNNetworkingChallenge-2021_Routenet_TF/code/main.py", line 109, in <module>
    model.fit(ds_train,
  File "/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/keras/engine/training.py", line 1215, in fit
    val_logs = self.evaluate(
  File "/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/keras/engine/training.py", line 1501, in evaluate
    tmp_logs = self.test_function(iterator)
  File "/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 956, in _call
    return self._concrete_stateful_fn._call_flat(
  File "/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1963, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 591, in call
    outputs = execute.execute(
  File "/home/raju/Desktop/python_env_tf/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [275,32] vs. [0,32]
  [[{{node route_net_model/StatefulPartitionedCall/rnn/gru_cell_1/add_1}}]] [Op:__inference_test_function_13464]

It looks like, during evaluation, some graph is empty or some data is missing. As can be seen, the error says that the shapes fed into the GRU cell are incompatible ([275,32] vs. [0,32]), so the GRU cell seems to be receiving an empty array. Have you modified something in our baseline? Maybe the data input? It is strange that training works while evaluation does not.
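One way to check for such empty inputs is to scan the evaluation samples before calling fit. This is a minimal sketch that assumes each sample is a (features, labels) pair in which features is a dict of array-like values. The feature names in the toy data are hypothetical:

```python
def _size(value):
    """Length of an array-like value; scalars count as size 1."""
    try:
        return len(value)
    except TypeError:
        return 1


def find_empty_samples(samples):
    """Return (index, sizes) for every sample with an empty feature or label."""
    empty = []
    for idx, (features, labels) in enumerate(samples):
        sizes = {name: _size(value) for name, value in features.items()}
        sizes["labels"] = _size(labels)
        if any(size == 0 for size in sizes.values()):
            empty.append((idx, sizes))
    return empty


# Toy data: the second sample has no traffic entries and no labels.
samples = [
    ({"traffic": [0.5, 0.7], "capacity": [10.0, 10.0]}, [0.12, 0.30]),
    ({"traffic": [], "capacity": [10.0]}, []),
]
print(find_empty_samples(samples))
```

With a `tf.data.Dataset`, the same function could be fed with the dataset's `as_numpy_iterator()`, assuming its elements have the (features, labels) structure described above.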

Bests, Miquel

craju06 commented 3 years ago

I have changed some code in the read_dataset.py file. When I ran this code with the sample dataset, it did not show any error.

craju06 commented 3 years ago

Do I need to change anything if I am working with Queue occupancy?

jsuarezv commented 3 years ago

Dear Raju, Thank you for your interest. It depends on how you plan to encode queue occupancy in the model. As a starting point, you can follow the "How to" guide we provide in the README file: https://github.com/BNN-UPC/GNNetworkingChallenge/blob/2021_Routenet_TF/README.md

Regards, José

craju06 commented 3 years ago

I have already followed the information provided in the README file. My question is whether, for model prediction, I need to change the prediction function.

jsuarezv commented 3 years ago

Hello,

As indicated in the README, to work with queue occupancy you should first do the following. Modify the read_dataset.py and change this line:

    }, list(nx.get_node_attributes(D_G, 'delay').values())

To this one:

    }, list(nx.get_node_attributes(D_G, 'occupancy').values())

Then, you would need to modify the model architecture in the routenet_model.py script to produce queue occupancy.

How you make this modification is actually part of the challenge, as you would need to design an architecture that better suits the proposed problem.

craju06 commented 3 years ago

I have already changed the line you mention, and I have modified routenet_model.py for queue occupancy. I am just worried about the prediction code at the bottom of the main.py file.

jsuarezv commented 3 years ago

The prediction step is handled automatically by TensorFlow: it produces the predictions according to the output of the model definition (i.e., routenet_model.py).

Then, if you want to infer path delays, you will need to do some post-processing to derive them from the queue occupancy values predicted by the model.
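One possible post-processing is sketched here under strong assumptions. It is not necessarily the method intended by the organizers. It approximates each link's queuing delay as the bits waiting in its queue divided by the link capacity, and sums these delays along each path. It ignores transmission and propagation delay. All names, units and numbers are hypothetical:

```python
def link_delay(occupancy_pkts, avg_pkt_size_bits, capacity_bps):
    """Approximate queuing delay (seconds) of one link: queued bits / capacity."""
    return occupancy_pkts * avg_pkt_size_bits / capacity_bps


def path_delays(paths, occupancy_pkts, avg_pkt_size_bits, capacity_bps):
    """Sum the per-link delays along each path (a path is a list of link ids)."""
    per_link = [
        link_delay(occ, avg_pkt_size_bits, cap)
        for occ, cap in zip(occupancy_pkts, capacity_bps)
    ]
    return [sum(per_link[link] for link in path) for path in paths]


# Made-up values: predicted average packets queued on each of three links,
# link capacities in bits per second, and three paths over those links.
occupancy = [2.0, 0.5, 4.0]
capacity = [10_000.0, 10_000.0, 40_000.0]
paths = [[0, 1], [2], [0, 2]]

delays = path_delays(paths, occupancy, 1000.0, capacity)
```

If the model predicts occupancy as a fraction of the queue size rather than in packets, the occupancy would first have to be multiplied by the queue size; which of the two applies depends on how the labels are defined in the dataset.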

I hope this helps.