LNST-project / lnst

Linux Network Stack Test
GNU General Public License v2.0

New `RecipeRun` export format or other import/export alternatives. #320

Closed: olichtne closed this issue 1 year ago

olichtne commented 1 year ago

The lrc recipe run import/export functionality works perfectly for single-run situations where we want to do a deep inspection of everything that happened during the recipe execution. This includes use cases where we need to inspect raised exceptions, details about measured data, etc.

However, a lot of our internal use cases work with a large number of test runs: aggregating data, analyzing large sets of tests for failures and so on.

Importing many lrc files has a very large time and memory footprint, often leading to OOM errors in our resource-restricted automation VMs.

As such, it would be interesting to experiment with other import/export methods that are better suited to situations where large numbers of recipe runs need to be analyzed.

There are two main ideas we can work with, and they can coexist. Neither of them replaces the usefulness of the full lrc export, which we think should stay as is for the time being:

  1. An alternative export format, e.g. JSON, that doesn't replicate the entire process memory structure. My first proposal for this is:
    • Implement a BaseRunRecipeFormatter class that defines the common interface of run recipe formatter classes. At the moment that interface should most likely be just the format_run method of the summary formatter (https://github.com/LNST-project/lnst/blob/master/lnst/Controller/RunSummaryFormatter.py#L84).
    • Implement a new JsonFormatter class that handles the JSON serialization of the individual Result classes defined in https://github.com/LNST-project/lnst/blob/master/lnst/Controller/RecipeResults.py. It would serialize the basic result data plus all of the simple data types that are available from the data properties.
    • This should be a relatively simple way to export machine-readable JSON data that is much simpler to import: we can process only the relevant parts and then clean up the unnecessary data that could otherwise take up a lot of space. The main benefit is that there are no "back references" to other objects, which we get with the lrc files and which make it complicated to import just the "part" of the data that we're interested in.
  2. An alternative way to import lrc files. Instead of importing the entire lrc file, to which we provide an abstract interface via the lnst-project/lrc-file project, we create a different, EXPLICIT importer that reads the entire lrc file on import, returns a very specific subset of data that we're interested in for the given application, and then cleans up everything else. IMO this is a much more difficult undertaking, because this "return a very specific, explicit subset of data" idea is already roughly what the lrc files are TRYING to do; however, due to the many references that LNST objects have between each other, we often pull in other data by accident. As such, this "importer" would need to be written in a very deliberate way, probably by explicitly copying some data points into new data structures without preserving any of the references, or by creating new references only where wanted. At the same time, which data is interesting is application specific, so it could be quite difficult to implement something common enough to share upstream.
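A minimal Python sketch of idea 1, under stated assumptions: only the `format_run` method name and the BaseRunRecipeFormatter/JsonFormatter class names come from the proposal. The `Result` and `RecipeRun` dataclasses are simplified hypothetical stand-ins, not LNST's real classes from RecipeResults.py, and their field names (`description`, `success`, `data`, `recipe_name`, `results`) as well as the helpers `format_result` and `_simple_only` are made up for illustration. The sketch shows the key property: only plain values are serialized, so object back references never reach the output.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-ins for LNST result/run objects; the real classes in
# lnst/Controller/RecipeResults.py carry more attributes and back references.
@dataclass
class Result:
    description: str
    success: bool
    data: Any = None


@dataclass
class RecipeRun:
    recipe_name: str
    results: list = field(default_factory=list)


class BaseRunRecipeFormatter(ABC):
    """Common interface: render one recipe run into some output format."""

    @abstractmethod
    def format_run(self, run: RecipeRun) -> str:
        ...


SIMPLE_TYPES = (str, int, float, bool, type(None))
_SKIP = object()  # marker for values that are not plain data


def _simple_only(value: Any) -> Any:
    """Recursively keep JSON-friendly plain data; anything else becomes _SKIP."""
    if isinstance(value, SIMPLE_TYPES):
        return value
    if isinstance(value, dict):
        kept = {}
        for key, item in value.items():
            converted = _simple_only(item)
            if converted is not _SKIP:
                kept[str(key)] = converted
        return kept
    if isinstance(value, (list, tuple)):
        converted = (_simple_only(item) for item in value)
        return [item for item in converted if item is not _SKIP]
    return _SKIP


class JsonFormatter(BaseRunRecipeFormatter):
    def format_result(self, result: Result) -> dict:
        data = _simple_only(result.data)
        return {
            "type": type(result).__name__,
            "description": result.description,
            "success": result.success,
            "data": None if data is _SKIP else data,
        }

    def format_run(self, run: RecipeRun) -> str:
        return json.dumps({
            "recipe": run.recipe_name,
            "results": [self.format_result(r) for r in run.results],
        })


class Host:
    """Stands in for an object that would drag back references along."""


run = RecipeRun("ExampleRecipe", [
    Result("ping", True, {"rtt_ms": [0.12, 0.15], "host": Host()}),
    Result("iperf", False, Host()),
])
print(JsonFormatter().format_run(run))
```

Silently dropping non-simple values keeps the output small and free of back references, at the cost of losing anything that isn't a str/int/float/bool/None, dict or list. A real implementation might instead let each Result class decide what it exports.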
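Idea 2 can be sketched in the same hedged way. The loader is passed in as a callable because the lrc-file project's actual loading API isn't described in this issue; `load_full_run`, `fake_loader`, `RunSummary`, `extract_summary`, `import_explicit` and the attribute names they read are all hypothetical. The sketch shows the deliberate-copy approach: the summary holds only copied plain values (strings, booleans, tuples), so once the full run goes out of scope nothing keeps the original object graph alive.

```python
import gc
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Callable


@dataclass(frozen=True)
class RunSummary:
    """Application-specific subset of a run; holds only plain values."""
    recipe_name: str
    passed: bool
    failed_descriptions: tuple


def extract_summary(run: Any) -> RunSummary:
    # Hypothetical attribute names (recipe_name, results, description, success).
    # str()/tuple() and `not` make sure only plain values are stored, never
    # the original result objects.
    failed = tuple(str(r.description) for r in run.results if not r.success)
    return RunSummary(
        recipe_name=str(run.recipe_name),
        passed=not failed,
        failed_descriptions=failed,
    )


def import_explicit(path: str, load_full_run: Callable[[str], Any]) -> RunSummary:
    """Load the full run, copy out the subset, then drop the full object graph."""
    run = load_full_run(path)
    try:
        return extract_summary(run)
    finally:
        # Drop this function's reference; gc.collect() then reclaims reference
        # cycles (e.g. back references) that refcounting alone can't free.
        del run
        gc.collect()


def fake_loader(path: str) -> Any:
    """Stand-in for a real lrc loader; builds a tiny graph with back references."""
    run = SimpleNamespace(recipe_name="ExampleRecipe", results=[])
    run.results = [
        SimpleNamespace(description="ping", success=True, run=run),
        SimpleNamespace(description="iperf", success=False, run=run),
    ]
    return run


summary = import_explicit("example.lrc", fake_loader)
print(summary)
```

Which fields RunSummary carries is exactly the application-specific part, which is why an importer like this may be hard to share upstream.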

@Kuba314 was interested in this write-up; let me know if it contains all of the info that we've discussed or if there are still some questions or answers that I forgot about.

Kuba314 commented 1 year ago

Everything is in there. I'll start with option 1 and go from there.