NetCal / DNC

The NetworkCalculus.org Deterministic Network Calculator
http://dnc.networkcalculus.org
GNU Lesser General Public License v2.1
25 stars 23 forks source link

Issue with thread-safety #76

Open fabgeyer opened 5 years ago

fabgeyer commented 5 years ago

I have some code using the DNC library where I perform analyses of multiple networks in different threads. This leads to errors in the analyses which do not appear when only one thread is used.

Some debugging lead me to ArrivalBoundDispatch.java and the arrivalbound classes. Singletons are used for doing the computations, and use setServerGraph(server_graph) and setConfiguration(configuration) to configure it, before calling computeArrivalBound. The issue is that if there are multiple threads, you might have the case that setServerGraph is executed by one thread, while another thread is still executing computeArrivalBound. I validated my theory by creating a new arrivalbound instance for each computation, resulting in no errors. ArrivalBoundCache, in its current implementation, is also not thread-safe.

Hence, I suggest to modify DNC library in order to use thread-safe data structures, and thus enable parallelization of (some) computations.

matyesz commented 5 years ago

First of all, thanks for the feedback, we are happy to see the first users. When we were creating v2.5 we recognized that singleton usage. Fortunatelly in none of the docs we mention that the code is thread-safe. Be aware that the config itself is singleton, so you can compute only with the same config on different threads. I do not know whether we plan any parallelism at the moment, would leave that to Steffen.

sbondorf commented 5 years ago

To me, the wrap-up of the problem is: Having an internal state in a singleton is very error prone.

There are already thoughts on my side, inspired by the way we moved Calculator.java "to the top-most level" in v2.5. It probably requires a considerable rework:

matyesz commented 5 years ago

Agreed. We should remove those singletons and take a task to check classes for thread-safety. Hopefully no other issues.

matyesz commented 5 years ago

The problem as I see after checking in ArrivalBound Dispatch.computeArrivalBounds: not thread safe -> case AGGR_PBOO_PER_SERVER: AggregatePboo_PerServer aggr_pboo_per_server = AggregatePboo_PerServer.getInstance(); aggr_pboo_per_server.setServerGraph(server_graph); aggr_pboo_per_server.setConfiguration(configuration); arrival_bounds_tmp = aggr_pboo_per_server.computeArrivalBound(turn, flows_to_bound, flow_of_interest); break;

Same switch but thread-safe: case SEGR_PBOO: for (Flow flow : flows_to_bound) { SeparateFlowAnalysis sfa = new SeparateFlowAnalysis(server_graph); sfa.performAnalysis(flow, flow.getSubPath(flow.getSource(), turn.getSource())); arrival_bounds_tmp = getPermutations(arrival_bounds_tmp, singleFlowABs(configuration, flow.getArrivalCurve(), sfa.getLeftOverServiceCurves())); }

Why do we have 2 approaches? What is the difference between the 2? I do not see how AnalysisConfig would solve the problem of getInstance approach, here I would just switch to new instances for each call. I do not understand as point of design why getinstance is good here.

sbondorf commented 5 years ago

These are two different ways to compute an arrival curve bounding the arrivals of flows in flows_to_bound coming via turn. The former tries to aggregate flows as much as possible but to do so, it cannot analyze these flows in an end-to-end (end-to-point of interference) fashion. The latter does things vice versa.

Employing only the PBOO principle, the SEGR_PBOO alternative is always worse. See

@inproceedings{BS15-2,
  author    = {Steffen Bondorf and Jens B. Schmitt},
  title     = {Calculating Accurate End-to-End Delay Bounds – You Better Know Your Cross-Traffic},
  booktitle = {Proceedings of the 9th International Conference on Performance Evaluation Methodologies and Tools (VALUETOOLS 2015)},
  year      = {2015}
}

When we can make use of PMOO,, however, the SEGR_PMOO alternative can be better in some corner cases. See

@inproceedings{BNS18,
    Author = {Steffen Bondorf and Paul Nikolaus and Jens B. Schmitt},
    Booktitle = {Proceedings of 19th International GI/ITG Conference on Measurement, Modelling and Evaluation of Computing Systems (MMB)},
    Title = {{Catching Corner Cases in Network Calculus -- Flow Segregation Can Improve Accuracy}},
    Year = {2018}
}

As to why the design is inconsistent w.r.t. thread-safety, the answer is simply because the code grew organically with the understanding of this matter and thread-safety had never been in the focus. Also, the tool was never (re-)built based on a full insight on the topic (maay it be the best computation or doing them in a thread-safe way). In particular, for the AGGR_* arrival bounds, at the beginning hopes were high that we just need to call the static computation of a tandem left-over service curve. But then the backtracking of flow aggregates had to be add in between the dispatcher and the left-over service curve computation. That intermediate layer, in turn, needed an internal state (in contrast to the SEGR_* code) and all this led to today's code.

For the proposed new Analysis.java class I intended to have it store a single config and one server graph only. Additionally, it can store instances of arrival bounding methods. When the Analysis.java object's config or server graph changes, then the stored arrival bounding instances get updated (above: AggregatePboo_PerServer.getInstance(); and aggr_pboo_per_server.setServerGraph(server_graph);). The switch-case statement in the dispatcher will not query the list of enum items in AnalysisConfig anymore. Instead it will iterate over the instances arrival bounding in the Analysis.java object and that them with the current argument list turn, flows_to_bound, flow_of_interest. These adaptations make different Analysis.java object independent from each other; they cannot interfere with each other by changing a singleton's state anymore, right? Last, in order to really offer a thread-safe version, we need to lock the Analysis.java object for the time the computations are executed in order to prevent a modifying anything half-way through an analysis.

matyesz commented 5 years ago

In my opinion the analysisconfig will not solve the issue as the problem is you update the references/values in a a class that has only one instance in the memory while another thread is using it. The solution is to remove singleton and create new instances with each calculation as it is done for the second example I gave. That is why I asked whether it was a must to use singleton in the first one. Singleton classes with instance variables are not thread-safe.

sbondorf commented 5 years ago

Sure, the singleton will be removed (sorry if that wasn't clearly stated). Creating a new instance every time we reach this location in the code introduces unnecessary overhead, too. I think one instance per Analysis object is sufficient because the concurrent modifications of the singleton come from different analysis configurations. So I propose to embed the instance the an analysis uses into it and shield it from the other potential analysis instances. The same holds for the arrival bounds cache that I intend to merge into the v2.6-dev branch soon.