MarshallAsch / rhpman

An attempt to reimplement the performance evaluation simulation as described by Shi and Chen in their 2014 paper for the RHPMAN data replication scheme.
ISC License

[WiP] Comments & Questions #13

Closed - compscidr closed this issue 3 years ago

compscidr commented 3 years ago

Didn't really know where else to do this, so filing an issue for now.

One question regarding the original paper:

One comment on the flowchart:

Code review comments:

The simulation time is set to 2400 s. No data was collected for the first 600 s of the simulations while the initial replicating nodes election process finished

Comments re: tests: To ensure everything is working as expected, can you construct some tests where you have a small number of nodes and verify that the behaviour you are expecting holds? For instance, if you have say three nodes, one with high, one with medium, and one with low energy levels set at the start, can you run it for long enough that the election process has occurred, and then assert that the node you expect to be elected replicator is in fact the replicator?

MarshallAsch commented 3 years ago

Just to make sure I'm understanding correctly - are they artificially restricting the degree of connectivity for nodes in the first half of the paper?

For section 3 of the paper (Partitioning Pattern Analysis) it seems like they are restricting the degree of connectivity of the nodes by varying the number of nodes (based on equation 3 in the paper)


Although they don't mention it, it's the only thing that would make sense to me as they state a fixed area and a fixed communication radius r.

It is unclear in some places which nodes "this node" and "that node" refer to - consider using "sending node", "receiving node", etc.

Ah, I now see what you mean - that's not something that I noticed. I'll be sure to clean that up before I extract snippets to use as figures for the thesis. In almost all cases "the node" refers to the peer node, and any lists that are mentioned relate to the current node's local information.

There are also issues where I have dead-end else branches on conditionals that don't go anywhere. I left those out to conserve space and make it less of a mess, but @keeferrourke mentioned that it made the flowchart confusing and incomplete-looking.

MarshallAsch commented 3 years ago

I mentioned you in #16 with some potential thoughts on the apparent high level of transmission collisions that seem to be happening.

#15 is that mysterious crash I mentioned yesterday; it does not seem to happen frequently (I think it may have been only once or twice in the 3000 simulation combinations that I have run).

compscidr commented 3 years ago


Yes, I noticed that formula too - I'm just saying it's weird because it doesn't capture the dynamics of the mobility model or the topology of the actual connectivity. For instance, suppose you get some weird random seed and all of the nodes are bunched together at the top left of the simulation space. You could have it such that a single node might be in communication range of 90% of the nodes in the simulation - say if you have 100 nodes, you could connect to 90 of them. However, with that formula it might restrict the node degree to, say, 5.

Without having looked at your code much yet, I was wondering if you are capturing this dynamic. Are you artificially restricting the nodes to only connect to a subset of the potential neighbours with this restricted connectivity? If so, how do you select which neighbours to use and which to ignore? Do you take the first d neighbours and ignore the rest?

MarshallAsch commented 3 years ago

I am not artificially restricting this, and I do not believe that they did either for the actual experiments. I thought the first part was more "background motivation", showing that partitioning behaviour is common as a reason for their algorithm. I don't think it was actually used for the rhpman evaluation.

compscidr commented 3 years ago

Yeah, I was re-reading it just now - I think it might have just been for the analysis of when the degree was greater or less than some critical value.

compscidr commented 3 years ago

Okay I think I've had a decent read through now - made a bunch of comments / questions. Feel free to take your time, we can have a call this week sometime to talk through any of it you like.

MarshallAsch commented 3 years ago

Under assumptions - if the identifier is an IPv4 address, how are the addresses given out? Are you sure there are never IP address conflicts? In a real-world network, ad hoc networks typically use zeroconf and self-organize their IP addresses.

I agree, using the IPv4 address is a bad idea for a unique node identifier. Using a UUIDv4 would probably make it more robust if this was going to be used for a real system, tested on an IPv6 network, or run under situations where the same IP address could be reused.

However, to make this implementation less complex, I felt that this limitation was good enough since I am not planning on evaluating the code under those conditions. I am planning on making a note about that assumption in the thesis when I talk about the assumptions and implementation.
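Just for illustration, a randomly generated 128-bit identifier would avoid the address-reuse problem without pulling in a full UUID library. This is only a sketch (the NodeId struct and GenerateNodeId are made-up names, and a true UUIDv4 would additionally set the version and variant bits):

#include <cstdint>
#include <random>

// 128-bit random identifier used instead of an IPv4 address
struct NodeId {
    uint64_t hi;
    uint64_t lo;
};

NodeId GenerateNodeId() {
    static std::mt19937_64 rng{std::random_device{}()};
    return NodeId{rng(), rng()};
}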

https://github.com/MarshallAsch/rhpman/blob/main/examples/simulation-params.cc#L17

Ya, I need to fix that, especially since I do a couple of floating point equality checks that make it not super clear what is happening.

Also cpp best practice is to use constexpr: https://leanpub.com/cppbestpractices (I can send you a pdf of this book if you're interested)

That would be much appreciated. C++ is not a language I have used at all before starting this, and as you may have noticed a lot of my programming style has been very C-like; I have either not taken advantage of many of the C++ features or used them incorrectly.

Another best practice - I noticed your enums are not scoped

Interesting, I did not know that was a thing - I thought I had used the scoped form everywhere, but looking at the diff again it seems I did not.
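For anyone reading along, the difference looks roughly like this (the type and enumerator names are only for illustration, not the repo's actual enums):

// unscoped enum: enumerators spill into the surrounding scope and convert
// implicitly to int
enum RunMode { RUN_MODE_DRIFT, RUN_MODE_WALK };

// scoped enum (C++11): enumerators must be qualified and do not convert
// implicitly, so mixing unrelated enums is a compile error
enum class WalkMode { Distance, Time };

WalkMode mode = WalkMode::Distance;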

https://github.com/MarshallAsch/rhpman/blob/main/examples/simulation-params.cc#L100

Ah, that particular one should have been removed; it is no longer usable after I changed the energy model to li-ion batteries.

https://github.com/MarshallAsch/rhpman/blob/main/examples/simulation-params.cc#L96

That one does not really have a unit, but I shall go through the list of parameters for the CLI flags and make sure that all of the relevant units and allowable ranges get listed.
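Concretely, that just means putting the unit and range into each AddValue help string. A sketch with a made-up flag (waitRadius is not one of the repo's real parameters):

#include "ns3/core-module.h"

using namespace ns3;

int main(int argc, char* argv[]) {
    double waitRadius = 10.0;  // placeholder parameter

    CommandLine cmd;
    // unit and allowed range listed directly in the help text
    cmd.AddValue("waitRadius", "Radius around a waypoint, in meters (must be > 0)", waitRadius);
    cmd.Parse(argc, argv);

    return 0;
}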

I'm not sure the sim time matches the original experiment:

That observation is correct; some of the parameters have been altered for testing purposes so that when I do short runs I don't have to wait a long time or pass all the flags. The time is specified in the simulation runner script, which it seems I have not actually shared with you yet (it is largely still a work in progress).

What happens if lower isn't distance or time https://github.com/MarshallAsch/rhpman/blob/main/examples/nsutil.cc#L62

Very good question. I have not looked through Keefer's setup code all that closely - it's Keefer, so I just kind of trusted it made sense. I should check over it though, thanks for pointing that out.

You may want to note that the weights must sum to 1: https://github.com/MarshallAsch/rhpman/blob/main/examples/simulation-params.cc#L94 - I also didn't see a check for this either. I don't think it makes sense for the sum of the weights to be > 1 does it?

I do check that it adds up to 1 here (or at least that's the intention of this block - yes, I see why unit tests would help here): https://github.com/MarshallAsch/rhpman/blob/f939b4c1a05861b832307b7289b5294b775d8529/examples/simulation-params.cc#L287
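The intent of that block is roughly the following sketch (not the actual repo code; the three weights stand in for whichever utility weights the parser takes):

#include <cmath>

// reject parameter sets whose weights do not sum to 1 within a small tolerance
bool WeightsSumToOne(double w1, double w2, double w3) {
    return std::fabs((w1 + w2 + w3) - 1.0) < 0.0001;
}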

I'm not sure the colocation is being calculated correctly...

While I agree that, based on the line "Instead, U_col,ij summarizes the history of colocation of i with receivers of data j (in the i's disseminating zone)", I am not calculating the value correctly (and based on the name of the field I would have thought it would make more sense to keep track of historical connections with the replication nodes), later in that section it says: "The computation of U_col,ij is much simpler: U_col,ij = 1 if a j's receiver is in the i's disseminating zone, and U_col,ij = 0, otherwise."

So that is what I used for the calculation, and I think what I did matches that description of the variable.

I also think the degree of connectivity computation is incorrect

Interesting take on the difference between what they said they did (the difference of the sets) and the intention (the number of connections and disconnections). Do you think I should be following their intention instead of the equation that they gave? This comes up at other points in my implementation as well, where the intention of what the authors are trying to do does not seem to quite match up with how they said it was done. I figured that I should be going based off of what they say they did, rather than what I think they intend to be doing, as much as possible.
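For concreteness, the "difference in the sets" reading would amount to something like this sketch (the helper name and the use of uint32_t node IDs are assumptions for illustration, not the repo's actual code):

#include <algorithm>
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

// count how many neighbours were lost and gained between two observations
// of the neighbour set
size_t NeighbourSetChange(const std::set<uint32_t>& previous,
                          const std::set<uint32_t>& current) {
    std::vector<uint32_t> lost;
    std::vector<uint32_t> gained;
    std::set_difference(previous.begin(), previous.end(),
                        current.begin(), current.end(),
                        std::back_inserter(lost));    // disconnections
    std::set_difference(current.begin(), current.end(),
                        previous.begin(), previous.end(),
                        std::back_inserter(gained));  // new connections
    return lost.size() + gained.size();
}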

In cpp I'd recommend to avoid raw pointers

I have seen recommendations to stay away from C-style pointers and malloc'ing things. I need to look into why, and into how the smart pointers work.
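As a rough illustration of the difference (Table here is just a stand-in type; ns-3 classes derived from Object would use ns3::Ptr<> and CreateObject<> instead of standard smart pointers):

#include <memory>

struct Table {};  // placeholder for whatever the raw pointer owns

void Example() {
    // raw-pointer style: ownership and deletion are manual
    // Table* t = new Table();  /* ... */  delete t;

    // smart-pointer style: the object is destroyed automatically when the
    // unique_ptr goes out of scope, so there is nothing to forget to free
    std::unique_ptr<Table> table = std::make_unique<Table>();
    (void)table;  // silence the unused-variable warning in this sketch
}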

Instead of using a vector and then iterating through it looking for an ID associated with the entry

Yup, I thought about that just after I implemented it - using a vector is a very inefficient way of doing that. Good thing I wrapped the vector's behaviour so the interface doesn't have to change to convert it to a map :)
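The keyed-lookup version behind that interface could look roughly like this (DataItem, the uint64_t key, and Lookup are placeholders for illustration):

#include <cstdint>
#include <string>
#include <unordered_map>

struct DataItem {
    std::string payload;  // placeholder fields
};

std::unordered_map<uint64_t, DataItem> store;

// O(1) average lookup by ID instead of an O(n) scan through a vector
bool Lookup(uint64_t id, DataItem& out) {
    auto it = store.find(id);
    if (it == store.end()) {
        return false;
    }
    out = it->second;
    return true;
}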

nit: CheckDuplicateMessage

I'm not quite sure what you're talking about here; the message IDs that I'm using are statically auto-incremented. I am aware that this is not a viable way to generate these IDs for real, and if I were implementing this for real I would probably use a UUIDv4 or a UUIDv5. The purpose of the check duplicate message function is to prevent reprocessing the same message if it has arrived at the same node through multiple routes.
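Roughly, the idea is just the following sketch of the intent (not the repo's actual implementation or signatures):

#include <cstdint>
#include <unordered_set>

static uint64_t nextMessageId = 0;
static std::unordered_set<uint64_t> seenMessageIds;

// statically auto-incremented IDs, as described above
uint64_t NewMessageId() { return nextMessageId++; }

// returns true if this ID has already been processed, so a copy that
// arrived over a different route can be dropped
bool CheckDuplicateMessage(uint64_t id) {
    // insert().second is true only when the ID was not already present
    return !seenMessageIds.insert(id).second;
}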

What's the threading situation re: table

I have not implemented anything to be thread-safe. My understanding of how ns3 works is that the entire simulation is run in a single thread, so it is not something I have thought about for this implementation.

When running repetitions, how is the initial seed set, how are subsequent seeds set

For the simulations I have hardcoded the seed to be 7 for all simulations, and I am making use of the built-in RngRun CLI flag to configure the substream to use for the independent replications, as per the ns-3 docs here (https://www.nsnam.org/docs/manual/html/random-variables.html#id1), which can support this with a single seed value.

The actual value for RngRun is configured by SEM based on the parameter combinations that are specified and the number of replications that it is told to run.
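In code terms, that setup amounts to something like this minimal sketch (the scenario itself is omitted; SEM simply passes --RngRun on the command line):

#include "ns3/core-module.h"

using namespace ns3;

int main(int argc, char* argv[]) {
    // fixed seed for every simulation, as described above
    RngSeedManager::SetSeed(7);

    CommandLine cmd;
    cmd.Parse(argc, argv);  // the built-in --RngRun=<n> flag selects the substream

    // ... build the scenario here ...

    Simulator::Run();
    Simulator::Destroy();
    return 0;
}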

Are there scripts for generating figures in this repo as well?

As of commit 6f50026, the SEM script to run the simulations and plot some of the results is included within the repo (note: it will probably end up being moved elsewhere).

Comments re: tests, setup small examples

Do you have any suggestions on how one might go about doing that?

I also previously mentioned that I wanted to write unit tests for my code (because, as we all know, TDD is a fantastic way to ensure that things do what they are supposed to do). But there are no unit tests. I didn't end up writing any, since the ns3 test harness seemed to say every single module was crashing; it turns out my saf module is broken and causes every single test harness run to fail.

Now that I know that, I can write tests that will actually pass (or fail) instead of just crashing no matter what. I believe that will also help show that the code does function as intended (at least in small sections)
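For reference, the ns-3 test harness boilerplate for that kind of test would look roughly like this (the suite, the case, and the values being compared are placeholders; a real test would build and run the small scenario first):

#include "ns3/test.h"

using namespace ns3;

class ElectionTestCase : public TestCase {
public:
    ElectionTestCase() : TestCase("replication node election picks the expected node") {}

private:
    void DoRun() override {
        // placeholder values; a real test would run the three-node scenario
        // long enough for the election to finish and then inspect the result
        uint32_t expectedReplicator = 0;
        uint32_t actualReplicator = 0;
        NS_TEST_ASSERT_MSG_EQ(actualReplicator, expectedReplicator,
                              "the high-energy node should have been elected");
    }
};

class RhpmanTestSuite : public TestSuite {
public:
    RhpmanTestSuite() : TestSuite("rhpman", UNIT) {
        AddTestCase(new ElectionTestCase, TestCase::QUICK);
    }
};

static RhpmanTestSuite g_rhpmanTestSuite;  // registers the suite with the test runner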

compscidr commented 3 years ago

Just taking a quick browse of the responses prior to the call - it looks like most of the questions / suggestions you've either already thought about, or I have an outdated understanding of ns3 (i.e. the seed values and such), or I didn't notice some additional explanation in the original paper.

I'd say if you still get weird results, try the other way of computing the connectivity as one more thing to check. Most of the rest of my comments are suggestions. I think moving to smart pointers and getting away from the malloc / free / delete stuff will definitely help with that random segfault you were having.

I'll send you an email with that book.

Also, for the tests - check out gtest. On the call I can show you some examples of tests we use with it. You can mock bits out, which may make it easier. The test I mentioned as an example is probably more of an integration test because it will need many parts of the system working together, but it could be good to try to write something like that which currently fails, as an exercise to see how you might need to change the code, with the eventual goal of making it pass. In the meantime you can also build up unit tests on small pieces, and that may help get towards that goal. I think the biggest difficulty you may still have is being confident the simulation is doing what you think it is. The more you can convince yourself of that, the stronger the results and your defense will be.
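As a taste of the gtest structure (SelectReplicatorIndex is a made-up pure helper used only to show the shape of a test; it is not in the repo, and the real election test would need the simulation behind it):

#include <gtest/gtest.h>

#include <algorithm>
#include <iterator>
#include <vector>

// hypothetical pure helper: pick the node with the highest fitness value
int SelectReplicatorIndex(const std::vector<double>& fitness) {
    return static_cast<int>(std::distance(
        fitness.begin(), std::max_element(fitness.begin(), fitness.end())));
}

TEST(ReplicatorElection, PicksTheHighestFitnessNode) {
    // three nodes with high, medium, and low values, mirroring the small
    // scenario suggested earlier in this thread
    EXPECT_EQ(SelectReplicatorIndex({0.9, 0.5, 0.1}), 0);
    EXPECT_EQ(SelectReplicatorIndex({0.1, 0.5, 0.9}), 2);
}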

MarshallAsch commented 3 years ago

Further follow-up now that I'm actually fixing some of the things:

What happens if lower isn't distance or time

This will return result.second = false;, which will cause the parameter parsing to fail and the simulation to abort before it starts. In std::tie(travellerWalkMode, ok) = getWalkMode(optTravellerWalkMode); the value ok is set to false if any parameter would result in an invalid configuration.

https://github.com/MarshallAsch/rhpman/blob/main/examples/simulation-params.cc#L17 - needs documentation on what it is. Also cpp best practice is to use constexpr

It seems that using fabs() is not allowed in a constexpr; as an alternative, the following will be turned into a function to make it more readable.

#include <cmath>

// floating point equality check using a small absolute tolerance
bool isEqual(double a, double b) {
    return std::fabs(b - a) < 0.0001;
}
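(As a side note, a constexpr version is still possible by avoiding fabs() entirely; a sketch, assuming a pre-C++23 toolchain where std::fabs is not constexpr:)

// constexpr-friendly variant: compute the absolute difference by hand
constexpr bool isEqualConstexpr(double a, double b) {
    return (a > b ? a - b : b - a) < 0.0001;
}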

I noticed your enums are not scoped

I converted the ones I could; there is one left that I did not convert because it seems that the scoped enum version does not play nicely with the ns3 EnumValue class.