NetCal / DNC

The NetworkCalculus.org Deterministic Network Calculator
http://dnc.networkcalculus.org
GNU Lesser General Public License v2.1

Storage format for Network.java objects #9

Open sbondorf opened 6 years ago

sbondorf commented 6 years ago

I.e., other than writing Java classes to the file system.

Discussion started in Issue #8, with @matyesz raising the following point

"An idea (please evaluate): we could use XSD to create the network model and generate Java code with JAXB. Pros:

- XSD has visual editors, so you can edit the network model as a UML; easy to change
- XML support out of the box: a network can be described both in Java or via XML (users have their data in a DB, so it is easier for them to have an XML extra)"

sbondorf commented 6 years ago

I do not have any preference regarding the underlying technology. I know that @scriptkitty was in favor of JSON. I briefly read up on JAXB on Wikipedia. To me, the big advantage over JSON-libraries seems to be that "JAXB allows storing and retrieving data in memory in any XML format, without the need to implement a specific set of XML loading and saving routines", i.e., more out-of-the-box functionality.

matyesz commented 6 years ago

Exactly. With JAXB, parsing and saving are one-liners, and the Java code is generated at compile time from the XSD model. You do not have to code the model anymore, just draw it :). We can even use annotations for existing classes.
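To make the "one line to store/load" idea concrete without pulling in the jaxb-api artifact (needed on JDK 11+), here is a rough JDK-only analogue using `java.beans.XMLEncoder`/`XMLDecoder`; the `Server` bean is hypothetical and only stands in for a class that JAXB would generate from the XSD:

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class XmlRoundTrip {
    // Hypothetical bean standing in for a JAXB-generated model class.
    public static class Server {
        private String name;
        private double rate;
        public Server() {}
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public double getRate() { return rate; }
        public void setRate(double rate) { this.rate = rate; }
    }

    public static byte[] store(Server s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder enc = new XMLEncoder(out)) {
            enc.writeObject(s); // one line to serialize, as with JAXB's Marshaller
        }
        return out.toByteArray();
    }

    public static Server load(byte[] xml) {
        try (XMLDecoder dec = new XMLDecoder(new ByteArrayInputStream(xml))) {
            return (Server) dec.readObject(); // one line to parse back
        }
    }
}
```

With real JAXB the model class would come from the schema, and the round trip would go through `JAXBContext`'s `Marshaller`/`Unmarshaller` instead.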

sbondorf commented 6 years ago

Currently, we can store and load curves independent of the DiscoDNC configuration. E.g., you can store a network you used with curve backend RTC and load it later with DNC Curves and the Rational BigInteger number backend. This is achieved by storing our String representation for Curves as well as the corresponding parsers for Curves and numbers. I strongly prefer not to compromise on this feature as it proved to be very valuable for comparative evaluations.

If I understand JAXB correctly, we would lose this feature, as we would then store an XML representation of the actual instances of all the involved classes. Is that correct?

In fact, what we currently store is rather a NetworkFactory (interface in de.uni_kl.cs.discodnc.network) than a network, i.e., quite some overhead when it is repeatedly stored with every network. A solution to both issues (assuming I am correct about JAXB) could be a dedicated network factory class that can load any stored instance and create a new one from it -- the new one, of course, being created according to a given configuration.
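Such a configuration-aware factory could look roughly like this (a sketch only; the class, the `RL(rate, latency)` string format, and the minimal `Curve` type are all made up here and are not the de.uni_kl.cs.discodnc API). The key point from above is preserved: curves stay in storage as backend-independent strings and are re-parsed with whichever curve backend the current configuration selects.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of a configuration-aware factory: curves live in storage as strings,
// so the same stored network can be re-instantiated under any curve backend.
public class StoredNetworkFactory {
    /** Hypothetical minimal curve: a single rate-latency pair. */
    public static class Curve {
        public final double rate, latency;
        public Curve(double rate, double latency) { this.rate = rate; this.latency = latency; }
    }

    private final Function<String, Curve> curveParser; // backend-specific parser

    public StoredNetworkFactory(Function<String, Curve> curveParser) {
        this.curveParser = curveParser;
    }

    /** Creates server -> curve instances from backend-independent string entries. */
    public Map<String, Curve> instantiate(Map<String, String> storedServiceCurves) {
        Map<String, Curve> servers = new LinkedHashMap<>();
        storedServiceCurves.forEach((server, spec) -> servers.put(server, curveParser.apply(spec)));
        return servers;
    }

    /** Example parser for an "RL(rate,latency)" string (format invented for this sketch). */
    public static Curve parseRateLatency(String spec) {
        String[] parts = spec.substring(3, spec.length() - 1).split(",");
        return new Curve(Double.parseDouble(parts[0].trim()), Double.parseDouble(parts[1].trim()));
    }
}
```

Swapping the parser function is all it takes to load the same stored network under a different number/curve backend.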

matyesz commented 6 years ago

No, we can decide what we store in the XML, and I want to store only network data, nothing calculation-specific. For that, we first have to come up with a clear model design. My idea is that we just store a pure network description (VLs, switches, paths...) and NetworkFactory will be able to parse and serialize XMLs containing networks. Besides, we can also provide an API for users to still create the network programmatically. We will have two inputs:

  1. Network description (XML)
  2. Calculator configuration (either command line options or a property file)

From these two, we can create the internal model representation. Is this also your idea?
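Reading the first of those two inputs needs nothing beyond the JDK's DOM API; a minimal sketch (the `<network>`/`<server>` element and attribute names are invented for this illustration, not a proposed schema):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: input 1 is a pure network description (XML); the calculator
// configuration (input 2) would be combined with it afterwards to build
// the internal model. Element names are made up for this illustration.
public class NetworkXmlLoader {
    public static List<String> serverNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("server");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("malformed network description", e);
        }
    }
}
```

With JAXB, this hand-written traversal would be replaced by generated classes, but the split into description plus configuration stays the same.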

sbondorf commented 6 years ago

This clear separation sounds good. More thoughts on it in #8's thread. Assuming this separation is implemented, we only need to decide upon the storage format here -- XML, JSON or something else. I do not have any preference as long as the programmatic creation you mention is still featured.

fabgeyer commented 6 years ago

I'd like to pitch in another input for this issue. While having static files is a good approach for static networks like networks existing in real life, having a way to define dynamic networks with specific parameters is quite useful when evaluating algorithms against many networks. This is a relevant use-case for researchers.

One idea is to use a Domain Specific Language (DSL) as a method for defining a network. Thanks to the JVM, various options are already available (e.g., Clojure) and could be extended for the use case of networks. Such an approach would enable not only dynamic parameters, such as the arrival or service curve parameters, but also, more generally, the network topology itself.
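Short of a full external DSL, even a small fluent builder in Java itself would already make topologies parameterized rather than static files (an illustrative sketch, not DNC API; the ring topology is just an example generator):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative internal DSL: the network is a small program, so topology
// size and shape can be swept programmatically instead of stored per instance.
public class NetworkBuilder {
    public final List<String> servers = new ArrayList<>();
    public final List<String[]> links = new ArrayList<>();

    /** Parameterized topology: a ring of n servers, each linked to its successor. */
    public NetworkBuilder ring(int n) {
        for (int i = 0; i < n; i++) servers.add("s" + i);
        for (int i = 0; i < n; i++) links.add(new String[] { "s" + i, "s" + ((i + 1) % n) });
        return this;
    }
}
```

Usage: `new NetworkBuilder().ring(4)` yields four servers and four links; evaluating an algorithm over many network sizes becomes a loop over `n`.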

sbondorf commented 6 years ago

I am also constantly dealing with the challenge of generating reasonable networks for evaluation myself. Not restricting ourselves to a 1:1 mapping between the stored network and the network object to evaluate sounds like a nice relief for this problem. For example, I mentioned in #17 that the networks used for one of my publications already come to 37MB in the DiscoDNC v2.4.0 storage format. However, these stored networks are a 1:1 mapping that encodes arrival curves, service curves and maximum service curves -- a suboptimal solution we want to improve on.

My current understanding of the ultimate goal is to store resource descriptions (curves) and the network topology in separate files. To instantiate a network, we then need to load the network topology file (only servers, links, flows) and a resource parameter file whose entries can be mapped to the network instances. This mapping creates the network instances and it should, of course, be flexible such that parameter alternatives/ranges given in the resource description lead to multiple network instances. For example, if we want to have a homogeneous network, we should not be required to explicitly define the same service/arrival curve for all servers/flows. Then, we could also use such a generic resource configuration across multiple network topologies.
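The homogeneous case could be handled by a default entry in the resource file that is expanded at load time. A sketch (the `"*"` wildcard convention and the string curve specs are invented here, not an existing format):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: expand a resource description with a "*" default entry so that a
// homogeneous network does not need one explicit curve entry per server.
public class ResourceMapping {
    public static Map<String, String> assignCurves(List<String> servers,
                                                   Map<String, String> resourceFile) {
        String defaultCurve = resourceFile.get("*"); // wildcard: applies to all servers
        Map<String, String> assignment = new LinkedHashMap<>();
        for (String s : servers) {
            // explicit per-server entries override the homogeneous default
            assignment.put(s, resourceFile.getOrDefault(s, defaultCurve));
        }
        return assignment;
    }
}
```

The same generic resource file can then be applied to any topology file whose server names it does not even need to know in advance.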

Now there is a scenario not covered by my current view on this: the creation of multiple network topologies from a parameterized description -- I did not think of this, thank you for pointing it out! I think this should, in general, work the same way. I.e., we still need to implement code that creates the actual network instances from the now parameterized description, right? Or does the choice of language help to reduce the required effort? (I cannot comment on the advantages of one storage format/language over the other. This is outside my area of expertise, unfortunately.)

matyesz commented 6 years ago

OK.

  1. Let's decide where/what to use the format for...
  2. Decide on the complexity we want to introduce

From the discussion I see that the model representation/format can be used for two different use cases:

  1. Researchers
  2. Industry - a lot of companies are searching for a good NC library

While researchers see networks as service and arrival curves, industry does not know much about these. They see networks based on the protocol (AFDX, TTEthernet...). If we decide to stay on the calculation level, arrival and service curve representations are totally fine, but for industry-type descriptions the protocol and some additional parameters (BAG, bandwidth, frame length) are to be expected. Anyway, these lower-level elements (arrival and service curves) can be calculated from the protocol parameters based on a lot of theoretical papers (each protocol has its own). Also, on the very low level we do not have elements like Virtual Links or Queues. We just have flows with their paths, servers (several servers on each device; on a switch, each queue is a server), and each flow has an arrival curve at the next server while each server has its service curve.

IMO, let's decide which way to go, as I feel that we somehow want to create a mixture of these. Anyway, we could also create an extension for the core where we create service and arrival curve implementations based on different protocols (TimeTriggered, AFDX, CAN) and queues (FIFO, priority, time-triggered).
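As one example of such a protocol-to-curve conversion: for an AFDX virtual link, the network calculus literature commonly bounds the traffic with a token-bucket arrival curve alpha(t) = b + r*t, taking the burst b as the maximum frame length and the rate r as maximum frame length over BAG. A minimal sketch of that derivation (simplified; real conversions also account for jitter and per-protocol details):

```java
// Sketch: derive token-bucket arrival-curve parameters (burst, rate) from
// AFDX virtual-link descriptors, following the common alpha(t) = b + r*t
// model with b = max frame length and r = max frame length / BAG.
public class AfdxCurves {
    /** Returns { burst, rate } for a VL with the given frame size and BAG. */
    public static double[] arrivalCurve(double maxFrameLengthBits, double bagSeconds) {
        double burst = maxFrameLengthBits;              // b: one maximum-size frame
        double rate = maxFrameLengthBits / bagSeconds;  // r: long-term rate bound
        return new double[] { burst, rate };
    }
}
```

An extension for the core could collect such conversions per protocol, so industry users feed in BAG/bandwidth/frame length and never see a curve directly.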

On representation: XML is a standard; you can do XML extracts from databases very quickly. DSLs are usually closed formats that are hard to create. I remember that Vector has the special CANdb format that is used by a lot of companies with extensions, but there is no standard parser for it, causing a lot of problems for all of these companies. I also saw network descriptions in Excel format at Airbus.

sbondorf commented 6 years ago

In general, I think @matyesz's wrap-up is quite correct -- just the difference between the expectations of researchers and industry is often not that strict. I have received emails from academics asking how to use the DiscoDNC. They often stopped considering the tool when they were confronted with deriving service and arrival curves manually. Long story short, a higher layer that allows users to put in their familiar network representation potentially benefits many interested parties. Of course, this representation then needs to be converted to the feed-forwardized server graph for analysis with the DiscoDNC. This is basically the brainstorming I intended to spark in #8 (but I am happy about comments here as well :-) ). In #8, there is also the reference [1] where three network abstraction layers are presented. Let me give you a summary:

  1. We start with the output of a topology generator -- vertices are interpreted as output-queueing devices, edges are bidirectional links; we called it the "device graph",
  2. then the device graph is converted to a server graph -- vertices are the devices' output queues (servers), edges are the turns over the subsequent device to one of its output ports (I think this is the seminal work [3]),
  3. and last we break cycles with the turn prohibition algorithm, see [4]. This gives us the feed-forwardized server graph that can then be analyzed with the DiscoDNC.
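Step 2, going from device graph to server graph, can be sketched as follows: every directed link becomes a server (the output queue of the device feeding that link), and every pair of consecutive links meeting at a device becomes a turn. (Class and naming scheme are hypothetical, not the DiscoDNC implementation; turn prohibition, step 3, would then operate on the turns list.)

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of layer 1 -> layer 2: with output-queueing devices, each directed
// link gets one server; a turn connects two servers whose links share a device.
public class ServerGraph {
    public final List<String> servers = new ArrayList<>(); // "u->v": output queue of u toward v
    public final List<String[]> turns = new ArrayList<>(); // pairs of consecutive servers

    public static ServerGraph fromDeviceGraph(List<String[]> directedLinks) {
        ServerGraph g = new ServerGraph();
        for (String[] l : directedLinks) {
            g.servers.add(l[0] + "->" + l[1]);
        }
        for (String[] in : directedLinks) {
            for (String[] out : directedLinks) {
                // links meet at a device; exclude immediate U-turns
                if (in[1].equals(out[0]) && !in[0].equals(out[1])) {
                    g.turns.add(new String[] { in[0] + "->" + in[1], out[0] + "->" + out[1] });
                }
            }
        }
        return g;
    }
}
```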

This is basically the same as presented in @fabgeyer's paper [2]. To get from 1 to 2 to 3 in the publication, we created glue code that uses instances of our single network object out of their intended server graph context -- that is why it did not make it into a public release yet. Given that the current network object is only intended to store a feed-forwardized server graph, I also intended this issue to only come up with a way to store this low-level representation. But it should be extensible to also store the network abstraction layers above once they are properly implemented (issue #8).

About the complexity and thus the size of the eventual storage format, this is how I see it evolving in the longer term: Layer 1 need not be the output of a topology generator but can be an AFDX specification, I guess. Conversion from layer 1 to 2 is quite deterministic, yet feed-forwardizing from layer 2 to 3 need not be done with Turn Prohibition (I think there is a TFA variant that never demultiplexes, i.e., feed-forwardizes by sink-tree conversion). In the most simple case, we still have: 1 device graph -> 1 server graph -> 1 feed-forwardized server graph, as in [1]. But with n parameters for the device graph topology (dynamic vertices and edges), m for one homogeneous resource's parameter range (all others fixed), as well as l feed-forwardization alternatives, we can also end up at: 1 device graph -> n^m server graphs -> n^m^l feed-forwardized server graphs. Having the resource parameter ranges in a separate file allows us to dynamically generate the networks at analysis runtime. This gets us down to 1 device graph -> n server graphs -> n^l, and l should be very small anyway.
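Expanding one stored description with parameter ranges into concrete instances at analysis runtime boils down to a Cartesian product over the ranges; a small generic sketch (not tied to any particular storage format):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: expand parameter ranges from a stored description into concrete
// parameter combinations at runtime, instead of storing every instance.
public class RangeExpansion {
    /** All combinations of one value per range (Cartesian product). */
    public static List<List<Double>> expand(List<List<Double>> ranges) {
        List<List<Double>> result = new ArrayList<>();
        result.add(new ArrayList<>()); // start with the empty combination
        for (List<Double> range : ranges) {
            List<List<Double>> next = new ArrayList<>();
            for (List<Double> partial : result) {
                for (double v : range) {
                    List<Double> extended = new ArrayList<>(partial);
                    extended.add(v);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }
}
```

Two ranges of sizes 2 and 3 yield 6 combinations, i.e., 6 network instances from a single stored description.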

Regarding the storage format: We need something that works across the entire stack, from layer 1 to 2 to 3. An open standard would be nice.

To summarize, I think development on the network backend will work its way up from our current layer 3 to layer 1, step by step. Thus, I suggest starting with storing our current single feed-forwardized server graph network and one separate resource description file with fixed parameters for curves. This can already be extended without progress on #8, e.g., to a resource file with parameter ranges. Having this foundation, the format should, of course, keep up with the development of the network backend in order to store upper-layer information as well as metadata about their relations (mapping from devices to servers etc. -- see the labels in [2] for an example).

[1] Bruno Cattelan, Steffen Bondorf: Iterative Design Space Exploration for Networks Requiring Performance Guarantees. IEEE/AIAA 36th Digital Avionics Systems Conference (DASC 2017), 2017.
[2] Fabien Geyer: Performance Evaluation of Network Topologies using Graph-Based Deep Learning. EAI ValueTools 2017.
[3] Christopher J. Glass, Lionel M. Ni: The turn model for adaptive routing. ISCA 1992.
[4] D. Starobinski, M. Karpovsky, L.A. Zakrevski: Application of network calculus to general topologies using turn-prohibition. IEEE/ACM Transactions on Networking, Volume 11, Issue 3, June 2003.

sbondorf commented 6 years ago

Testing ground for storage format options:

The integration tests have been reworked such that each test network creates a separate instance of the results class (see https://github.com/NetCal/DiscoDNC/issues/15). Can we now use this structure to illustrate the storage options we also envisioned for the network instances? If I understand correctly, JAXB should be able to store/load the results instances as/from XML files quite easily.

matyesz commented 6 years ago

Yes, with JAXB you have to create the XSD schema for the result and the XML that contains the data. Add some Maven dependencies and goals to create the Java classes from the schema, and everything else comes automatically. See: JAXB with Maven.
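Until that XSD/Maven pipeline is in place, even the JDK's own `java.util.Properties` XML format can round-trip simple result entries, which may be enough for a first experiment with the reworked integration tests (a stop-gap sketch, not a proposal for the final format; the key naming is invented here):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

// Stop-gap sketch: store per-flow result entries as XML using the JDK-only
// java.util.Properties format, pending a proper XSD-based schema.
public class ResultsXmlStore {
    public static byte[] store(Properties results) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            results.storeToXML(out, "integration test results");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static Properties load(byte[] xml) {
        Properties results = new Properties();
        try {
            results.loadFromXML(new ByteArrayInputStream(xml));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return results;
    }
}
```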