BNN-UPC / RouteNet-Fermi

Apache License 2.0

Index issues when using `tf.gather_nd` to aggregate `queue_state` #3

Closed by ideafang 10 months ago

ideafang commented 10 months ago

Hi Miquel,

At the end of the link-and-queue-to-path block of RouteNet-Fermi, path_state_sequence is updated and previous_path_state is prepended to each item. At this point, the features in path_state_sequence[i] should be [p[i], (queue[a] link[x]), ...], where p[i] is the i-th path state feature and (queue[a] link[x]) is the updated feature from the a-th queue and x-th link.

Then, at the beginning of the path-to-queue block, path_gather uses the indices in path_to_queue to aggregate queue features from path_state_sequence. However, the index of each queue in path_to_queue has effectively changed due to the prepended previous_path_state above (the indices in path_to_queue[:,:,i] would need to be incremented by 1). Is the original queue feature still being aggregated in this case?

Also, what is the purpose of path_state_sequence = tf.concat([tf.expand_dims(previous_path_state, 1), path_state_sequence], axis=1)?

The code here is a bit complicated for me, so please correct me if there is something wrong with my understanding.

Best Regards, Henry Fang

MiquelFerriol commented 10 months ago

Dear Henry, I'll try to explain it in more detail to see if I can clarify it for you. I will explain this with a single path, but the same applies to all of them at once.

Imagine we have a path of length 3 that goes through 3 links and 3 queues, so p_1 = [q_1,l_1,q_2,l_2,q_3,l_3]. Let's start the message passing. Here we will have 7 hidden states (one for the path plus one per queue and link). For simplicity, I will write them as p_1, q_1, l_1, q_2, ...

The queue_to_path in this case will be [[1,2,3]], and the link_to_path will also be [[1,2,3]]. In this case they happen to be ordered, but we could have a path that follows something like p_2 = [q_1,l_1,q_5,l_7,q_2,l_8]; these are simply ids that we define when we build the graph.

Let's start the message passing from links and queues to paths. First, we gather the queue states using those indices with tf.gather(queue_state, queue_to_path), so we get queue_gather = [[q_1,q_2,q_3]] and link_gather = [[l_1,l_2,l_3]]. However, we want our sequence to be [q_1,l_1,q_2,l_2,q_3,l_3], so we need a previous operation, tf.concat([queue_gather, link_gather], axis=2), to store it like that.
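As a rough sketch of that gather step (using NumPy fancy indexing in place of tf.gather; the state values and dimensions below are invented for illustration):

```python
import numpy as np

# Toy hidden states: one 2-dim vector per queue. Ids 1..3 occupy rows 1..3;
# row 0 is a dummy so queue ids match row indices directly.
queue_state = np.array([[0.0, 0.0],   # unused id 0
                        [1.0, 1.1],   # q_1
                        [2.0, 2.1],   # q_2
                        [3.0, 3.1]])  # q_3

queue_to_path = np.array([[1, 2, 3]])  # one path visiting q_1, q_2, q_3

# NumPy equivalent of tf.gather(queue_state, queue_to_path):
queue_gather = queue_state[queue_to_path]
print(queue_gather.shape)  # (1, 3, 2): [paths, path length, state dim]
```

The same indexing, applied to link_state with link_to_path, yields link_gather.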

This concatenation is then fed into the path_update_rnn, which returns two outputs: path_state_sequence and path_state. path_state_sequence contains the state of the path after going through each queue/link. For instance, path_state_sequence[0] contains the path state after going through q_1, path_state_sequence[1] contains the path state after going through l_1, and so on. path_state contains the state of the path after going through the last link (in this case l_3), which is the same as path_state_sequence[-1].
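The two RNN outputs can be mimicked with a toy NumPy recurrence (this is only a stand-in for the actual GRU cell in path_update_rnn; the update rule and values are made up):

```python
import numpy as np

def toy_path_rnn(inputs, initial_state):
    """Toy recurrence standing in for path_update_rnn: returns the state
    after every step (path_state_sequence) and the final state (path_state)."""
    state = initial_state
    sequence = []
    for x in inputs:                             # x plays the role of [q_i : l_i]
        state = np.tanh(0.5 * state + 0.5 * x)   # made-up update rule
        sequence.append(state)
    return np.stack(sequence), state

steps = np.array([[0.2], [0.4], [0.6]])          # three (queue, link) steps
seq, final = toy_path_rnn(steps, np.array([0.0]))
# The final state coincides with the last element of the sequence:
assert np.allclose(final, seq[-1])
```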

Now we go to the path-to-queue message passing. This is a little bit more complicated, since we want to take not the final state of the path, but the state of the path before going through this queue. For this, the path_to_queue would be [[[1,0]],[[1,2]],[[1,4]]]: [1,0] -> p_1 position 0 (q_1), [1,2] -> p_1 position 2 (q_2), [1,4] -> p_1 position 4 (q_3). However, if you remember, inside path_state_sequence we have the path state AFTER the path went through the queue, and we need the state BEFORE the path went through it. So we can simply prepend the previous path_state to the path_state_sequence.
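The gather_nd lookup itself can be sketched in NumPy (the array contents and index values below are arbitrary; each index pair is (path id, position within that path's sequence)):

```python
import numpy as np

# path_state_sequence: [paths, steps, state dim]; values are invented
path_state_sequence = np.arange(2 * 4 * 1).reshape(2, 4, 1).astype(float)

# Each innermost pair is (path id, step within that path's sequence)
path_to_queue = np.array([[[1, 0]], [[1, 2]]])

# NumPy equivalent of tf.gather_nd(path_state_sequence, path_to_queue):
gathered = path_state_sequence[path_to_queue[..., 0], path_to_queue[..., 1]]
print(gathered.shape)  # (2, 1, 1): one state per (queue, contributing path)
```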

Hope this clarifies your questions even if I haven't responded to them directly.

If you have any more questions or you need something to be clarified, do not hesitate to ask again!

Best, Miquel

ideafang commented 10 months ago

Dear Miquel,

Thank you for your timely reply, but I still have doubts about the path-to-queue message passing.

We can return to your example to illustrate.

First of all, the concatenation obtained by tf.concat([queue_gather, link_gather], axis=2) should be [[q_1:l_1], [q_2:l_2], [q_3:l_3]], where [q_1:l_1] means q_1 and l_1 spliced along the state dimension. That is, the concat here acts on the feature/state_dim dimension of the queue and link states. After path_update_rnn, path_state_sequence should likewise be [[q_1:l_1]', [q_2:l_2]', [q_3:l_3]'], where [q_1:l_1]' contains the path state after going through q_1 and l_1. In addition, the indices in path_to_queue only contain the sequence number of the queues within each flow, that is, path_to_queue = [[[1,0]],[[1,1]],[[1,2]]] (not [[[1,0]],[[1,2]],[[1,4]]]).
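A quick NumPy sketch of that axis=2 behaviour (shapes are invented for illustration; np.concatenate mirrors tf.concat here):

```python
import numpy as np

# queue_gather / link_gather: [paths, path length, state dim]
queue_gather = np.ones((1, 3, 4))    # states of q_1..q_3
link_gather = np.zeros((1, 3, 4))    # states of l_1..l_3

# axis=2 concatenates along the state dimension, yielding [q_i : l_i] pairs,
# NOT an interleaved sequence q_1, l_1, q_2, l_2, ...
rnn_input = np.concatenate([queue_gather, link_gather], axis=2)
print(rnn_input.shape)  # (1, 3, 8): still 3 steps, doubled state dim
```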

But some of the code below makes me a little confused.

In line 134 of delay_model.py/jitter_model.py/loss_model.py, path_state_sequence = tf.concat([tf.expand_dims(previous_path_state, 1), path_state_sequence], axis=1) concatenates previous_path_state at the starting position of each path. At this point, path_state_sequence becomes [p_1, [q_1:l_1]', [q_2:l_2]', [q_3:l_3]'], where p_1 represents the state of previous_path_state[1] before path_update_rnn. I noticed that the index of each queue within path_state_sequence is shifted back by one (I think this makes path_to_queue a wrong index!). Therefore, in the path-to-queue message passing, the queue state obtained via path_to_queue may not be the state we expect: when path_to_queue is [[[1,0]],[[1,1]],[[1,2]]], [1,0] -> p_1 position 0 -> p_1, so what we obtain is the state of previous_path_state[1], not q_1.

Looking forward to your reply.

Best Regards, Henry

MiquelFerriol commented 10 months ago

Dear Henry,

First of all, the concatenation obtained by tf.concat([queue_gather, link_gather], axis=2) should be [[q_1:l_1], [q_2:l_2], [q_3:l_3]], where [q_1:l_1] means q_1 and l_1 spliced along the state dimension. That is, the concat here acts on the feature/state_dim dimension of the queue and link states. After path_update_rnn, path_state_sequence should likewise be [[q_1:l_1]', [q_2:l_2]', [q_3:l_3]'], where [q_1:l_1]' contains the path state after going through q_1 and l_1. In addition, the indices in path_to_queue only contain the sequence number of the queues within each flow, that is, path_to_queue = [[[1,0]],[[1,1]],[[1,2]]] (not [[[1,0]],[[1,2]],[[1,4]]]).

You are completely right. I got confused with a new version we are developing that separates queues and links in the path_sequence.

In line 134 of delay_model.py/jitter_model.py/loss_model.py, path_state_sequence = tf.concat([tf.expand_dims(previous_path_state, 1), path_state_sequence], axis=1) concatenates previous_path_state at the starting position of each path. At this point, path_state_sequence becomes [p_1, [q_1:l_1]', [q_2:l_2]', [q_3:l_3]'], where p_1 represents the state of previous_path_state[1] before path_update_rnn. I noticed that the index of each queue within path_state_sequence is shifted back by one (I think this makes path_to_queue a wrong index!). Therefore, in the path-to-queue message passing, the queue state obtained via path_to_queue may not be the state we expect: when path_to_queue is [[[1,0]],[[1,1]],[[1,2]]], [1,0] -> p_1 position 0 -> p_1, so what we obtain is the state of previous_path_state[1], not q_1.

The fact that p_1 is concatenated at the starting position is done on purpose. The state we want to use is indeed p_1, since this is the state of the path BEFORE going through q_1. Similarly, for q_2, we are interested in the state BEFORE q_2 and AFTER [q_1:l_1], i.e. [q_1:l_1]'. If we did not concatenate the previous path state (p_1) to the sequence, q_1 would receive the information of the path after going through q_1, and that is not 100% correct. We want to get the state of the path before going through the queue we are updating.
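A minimal NumPy sketch of why the prepend works (state values are invented; with a single path at index 0, the position indices alone show the shift):

```python
import numpy as np

previous_path_state = np.array([[10.0]])       # p_1, shape [paths, state dim]
path_state_sequence = np.array([[[1.0],        # state after [q_1:l_1]
                                 [2.0],        # state after [q_2:l_2]
                                 [3.0]]])      # state after [q_3:l_3]

# Line-134 equivalent: prepend p_1 to the sequence of each path
full_sequence = np.concatenate(
    [previous_path_state[:, None, :], path_state_sequence], axis=1)

# Now position i holds the path state BEFORE the i-th queue of the path:
# position 0 is p_1 (before q_1), position 1 is the state before q_2
# (i.e. after [q_1:l_1]), and so on.
path_to_queue = np.array([[[0, 0]], [[0, 1]], [[0, 2]]])  # (path, position)
gathered = full_sequence[path_to_queue[..., 0], path_to_queue[..., 1]]
# gathered[0] is p_1, gathered[1] is [q_1:l_1]', gathered[2] is [q_2:l_2]'
```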

Best regards, Miquel

ideafang commented 10 months ago

Dear Miquel,

Thanks for your clear answer; I think I understand the logic now : )

Thanks to you and BNN-UPC for open-sourcing these interesting network datasets and code, which caught my attention and introduced me to the interesting field of network modeling. I am trying to reproduce RouteNet-Fermi in PyTorch so that I can better understand the concepts. If it goes well, I will open-source it and conduct further research.

Best regards, Henry