This is some basic documentation outlining the rough plan for how performance testing is being approached as part of the project. It will help me keep track of where I am with this goal, and generally means the work I'm doing doesn't just exist in a vacuum of my own notes. Most of these notes follow meetings with Ty and Oz.
Please feel free to comment anything I might have missed or any ways this could be improved.
## Overview
Both Conjure and conjure-oxide produce a stats JSON file when a test is run, as shown below (this example is from `conjure-oxide/conjure-oxide/tests/integration/basic/div/01/input.essence`, but the file is produced on all successful test runs). It is produced by `save_stats_json` in `testing.rs`, which is imported into and called from `generated_tests.rs` and `rewrite_tests.rs`.
Though there are minor differences in the output (for example, Conjure's output includes a status), both stats files include rewrite time (in Conjure's case, `SavileRowTotalTime`), solver time, total time, and the number of nodes used for solving.

While the stats file contains other information as well, these four values are the most important for performance. Of them, nodes is arguably the most important: on a simple Essence problem, Conjure and conjure-oxide may have similar overall times, but if the node counts differ drastically, that inefficiency will show up far more prominently on complex problems.

Node counts should also be consistent no matter how many times a problem is run: solving should always use the same number of nodes, whereas solver and rewrite times may vary between runs. This makes nodes an important measure of efficiency when comparing the two implementations.
The aim of this project is primarily to run tests comparing the solutions of Conjure and conjure-oxide for correctness, and then to compare specific elements of the stats/info files for differences. A table- or graph-based representation may be produced at some point to aid comprehension, and there is a general aim (not currently part of the sub-project I am working on) to make these tests runnable through GitHub Actions.

One primary consideration is whether conjure-oxide reduces rewrite time relative to Conjure. At some future point, the aim will be to find where conjure-oxide begins to lag behind Conjure, and to see why. That is, however, likely a future issue.
## Details
The implementation is based on `get_solutions_from_conjure`, which Nik implemented a few weeks ago. Rather than grabbing the stats JSON file generated by the test run (as my implementation will need to), it instead collects the solutions, as shown in the (incomplete) snippet of the method below:
```rust
#[allow(clippy::unwrap_used)]
pub fn get_solutions_from_conjure(
    essence_file: &str,
) -> Result<Vec<HashMap<Name, Literal>>, EssenceParseError> {
    // this is run in parallel, and we have no guarantee by rust that invocations
    // of this function don't share the same tmp dir.
    let mut rng = rand::thread_rng();
    let rand: i8 = rng.gen();

    let mut tmp_dir = std::env::temp_dir();
    tmp_dir.push(Path::new(&rand.to_string()));

    let mut cmd = std::process::Command::new("conjure");
    let output = cmd
        .arg("solve")
        .arg("--output-format=json")
        .arg("--solutions-in-one-file")
        .arg("--number-of-solutions=all")
        .arg("--copy-solutions=no")
        .arg("-o")
        .arg(&tmp_dir)
        .arg(essence_file)
        .output()
        .map_err(|e| EssenceParseError::ConjureSolveError(e.to_string()))?;

    if !output.status.success() {
        return Err(EssenceParseError::ConjureSolveError(format!(
            "conjure solve exited with failure: {}",
            String::from_utf8(output.stderr).unwrap()
        )));
    }

    let solutions_files: Vec<_> = glob(&format!("{}/*.solutions.json", tmp_dir.display()))
        .unwrap()
        .collect();
    // ...
```
Currently, this implementation takes an Essence file and runs `conjure solve --output-format=json --solutions-in-one-file --number-of-solutions=all --copy-solutions=no -o [temporary directory] [input essence_file]`. This runs Conjure on the given Essence file and writes all of its output into a temporary directory. The arguments ensure the output comes back in the same format as conjure-oxide's (for example, conjure-oxide returns all solutions, so the arguments must tell Conjure to return all solutions as well, to make the comparison meaningful). The current implementation can either be used as inspiration or be directly modified to also return the stats files for comparison.
## Basic Steps
1. Modify/use Nik's `get_solutions_from_conjure` to get stats files from Conjure.
   - A struct will likely be used when reading the JSON, so that only the specific intended values are gathered.
   - If the existing method is modified rather than a new one created, the return type will likely become a tuple, such as (demonstrative code, not actually checked for functionality) `Result<(Vec<HashMap<Name, Constant>>, Option<StatsFile>), EssenceParseError>`.
2. Modify `integrated_test` and `integrated_test_inner` to make use of this modified `get_solutions_from_conjure`.
3. Possibly implement a specific set of performance tests. This could be done by giving them a different extension, as seen with `.disabled`, or by creating a new directory of performance tests.
4. Use these modified implementations to:
   - gather Conjure stats for a specific Essence problem;
   - gather conjure-oxide stats for the exact same Essence problem;
   - compare the two on the given values (rewrite time, solver time, nodes, total time) and flag any major disparity (with emphasis on flagging when node counts differ, specifically when conjure-oxide's is higher).
Running problems multiple times to get more accurate solver and rewrite times becomes inefficient once the problems are sufficiently complex, hence the interest in nodes. I am not sure to what degree conjure-oxide is currently capable of solving these more complex problems, so this specific concern may not yet be relevant.
Any comments on this would be welcome! If I've missed any major parts of this explanation, please let me know :). A draft PR will be made soon to start the implementation of this.