acsicuib / YAFS

Yet Another Fog Simulator (YAFS)
MIT License

Multiple application topology is crashing if a message doesn't have a valid path due to a node removal #20

Closed. HenriqueMSilva closed this issue 4 years ago.

HenriqueMSilva commented 5 years ago

Hi, I need to simulate multiple users in my topology. For that, I have created a network with multiple applications, each with associated Population(object) instances.

Here is an example: git1

Also, I am using a deterministic distribution (0,100) as the activation_dist input parameter of my Population instances.

To simulate failures I am removing nodes, but if I remove node 1 in the example, the simulation crashes:

raise nx.NetworkXNoPath("No path between %s and %s." % (source, target))
NetworkXNoPath: No path between 13 and 2
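The crash can be reproduced with NetworkX alone. The sketch below assumes a hypothetical topology (the real one is only shown in the git1 image) in which node 1 is the only link between nodes 13 and 2. Both endpoints survive the removal of node 1, so the unweighted shortest-path query raises NetworkXNoPath rather than a KeyError.

```python
import networkx as nx

# Hypothetical topology: node 1 is the only link between nodes 13 and 2.
G = nx.Graph([(13, 1), (16, 1), (1, 2)])
print(nx.shortest_path(G, 13, 2))  # [13, 1, 2]

# Simulate the failure: removing node 1 also removes all of its links.
G.remove_node(1)
print(13 in G, 2 in G)  # True True: both endpoints still exist

try:
    nx.shortest_path(G, 13, 2)
except nx.NetworkXNoPath as exc:
    # Same exception type as in the traceback above.
    print(type(exc).__name__, exc)
```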

Meanwhile, with the same topology but only one application (git2), where node 1 is removed at env.now == 200 and simulation_time = 300, the simulation runs to completion without problems; the message is simply lost:

2019-09-18 18:15:54,659 - yafs.core - WARNING - The initial path assigned is unreachabled. Link: (13,1). Routing a new one. 200
2019-09-18 18:15:54,659 - yafs.core - DEBUG - No path given. Message is lost
2019-09-18 18:15:54,660 - yafs.core - WARNING - The initial path assigned is unreachabled. Link: (16,1). Routing a new one. 200
2019-09-18 18:15:54,660 - yafs.core - DEBUG - No path given. Message is lost

In this case, however, if the time between the node removal and the end of the simulation were any longer, it would crash anyway.

I was able to run the first example to completion by altering the yafs/core.py source:

line 232
WAS: except KeyError:
NOW: except:

I couldn't figure out the exact problem, but I hope this explanation helps.
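A likely reason the change on line 232 helps: NetworkXNoPath derives from NetworkXException, not from KeyError, so `except KeyError:` lets it propagate, while a bare `except:` swallows it (together with every other error). A narrower alternative is sketched below with a hypothetical helper `safe_path`; it is not the actual YAFS code. It catches only the NetworkX path errors and returns an empty path, assuming the caller treats an empty path as a lost message, as in the "No path given. Message is lost" log.

```python
import networkx as nx

# NetworkXNoPath is not a KeyError, so `except KeyError:` does not catch it.
print(issubclass(nx.NetworkXNoPath, KeyError))  # False


def safe_path(G, src, dst):
    """Hypothetical helper: shortest path from src to dst, or [] if none.

    Only the NetworkX errors for unreachable or missing nodes are caught,
    so unrelated bugs still surface instead of being hidden by `except:`.
    """
    try:
        return nx.shortest_path(G, src, dst)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return []


G = nx.Graph([(13, 1), (1, 2)])
G.remove_node(1)
print(safe_path(G, 13, 2))   # []
print(safe_path(G, 13, 99))  # [] (node 99 is not in the graph)
```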

wisaaco commented 5 years ago

Hello Henrique,

It's a nice infrastructure! Let's try to fix it.

First error

raise nx.NetworkXNoPath("No path between %s and %s." % (source, target)) NetworkXNoPath: No path between 13 and 2

This error is raised by the Nx (NetworkX) library when it cannot find a path between the two nodes. Without seeing the code, I see two possible causes.

A) The node ids are defined with different types in the topology and in other policies. Check whether the node ids in Nx (.G.nodes) are strings or integers, and do the same in the _getpath function (in the "selection" script). Both references should use the same type.

B) t.G is not a bidirectional graph, but I think this cause is less likely.
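Both causes can be checked directly on the NetworkX graph. The sketch below uses two hypothetical helpers, `id_type_mismatch` for option A and `is_bidirectional` for option B; neither is part of YAFS, and the example graphs are assumptions. For option A, `13` and `"13"` are different keys, so NetworkX treats them as different nodes. For option B, a directed graph with a one-way link can leave some nodes unreachable.

```python
import networkx as nx


def id_type_mismatch(G, *ids):
    """Option A (hypothetical check): ids whose type differs from G's node ids."""
    node_types = {type(n) for n in G.nodes}
    return [i for i in ids if type(i) not in node_types]


def is_bidirectional(G):
    """Option B (hypothetical check): every link can be traversed both ways."""
    if not G.is_directed():
        return True  # undirected graphs are symmetric by construction
    return all(G.has_edge(v, u) for u, v in G.edges())


# Option A: the topology uses int ids, but a policy passes strings.
G = nx.Graph([(13, 1), (1, 2)])
print(id_type_mismatch(G, "13", "2"))  # ['13', '2']
print(nx.shortest_path(G, int("13"), int("2")))  # [13, 1, 2]

# Option B: link (1, 2) exists in one direction only.
D = nx.DiGraph([(13, 1), (1, 13), (1, 2)])
print(is_bidirectional(D), nx.has_path(D, 2, 13))  # False False

# One possible fix: route on an undirected copy of the topology.
U = D.to_undirected()
print(is_bidirectional(U), nx.has_path(U, 2, 13))  # True True
```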

Second error

2019-09-18 18:15:54,659 - yafs.core - WARNING - The initial path assigned is unreachabled. Link: (13,1). Routing a new one. 200
2019-09-18 18:15:54,659 - yafs.core - DEBUG - No path given. Message is lost
2019-09-18 18:15:54,660 - yafs.core - WARNING - The initial path assigned is unreachabled. Link: (16,1). Routing a new one. 200
2019-09-18 18:15:54,660 - yafs.core - DEBUG - No path given. Message is lost

When there is a failure in the topology, some messages need to change their previously computed path. In this case, the function _get_path_fromfailure is called, and it internally calls the _getpath function. So the first error is triggered again, but here it is caught and logged with a different message. Both functions are defined in your project ("selection_..py", e.g. _YAFS/src/examples/DynamicFailuresOnNodes/selectionmultipleDeploys.py).
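The reroute-on-failure flow described above can be sketched outside YAFS. The names `path_is_valid` and `reroute_after_failure` are hypothetical stand-ins and do not reproduce the signatures of `_get_path_fromfailure` or `_getpath`; the sketch also assumes the message is rerouted from the first node of its old path. The point is that the recomputation goes through the same path lookup, so it has to handle the same NetworkX exceptions.

```python
import networkx as nx


def path_is_valid(G, path):
    """True if every hop of a previously computed path still exists."""
    return all(G.has_edge(u, v) for u, v in zip(path, path[1:]))


def reroute_after_failure(G, path):
    """Hypothetical analogue of rerouting a message after a failure.

    Keeps the old path if it is intact; otherwise recomputes a path from
    its first node to its last node, returning [] (message lost) when no
    route is left instead of raising.
    """
    if path_is_valid(G, path):
        return path
    src, dst = path[0], path[-1]
    try:
        return nx.shortest_path(G, src, dst)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return []


# Hypothetical topology: 16 has a backup link to 2, 13 does not.
G = nx.Graph([(13, 1), (16, 1), (1, 2), (16, 2)])
G.remove_node(1)
print(reroute_after_failure(G, [13, 1, 2]))  # []
print(reroute_after_failure(G, [16, 1, 2]))  # [16, 2]
```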

Best, Isaac