Have better strategy of merging spec durations

Hey

I ran into a couple of issues with merging spec durations:

The durations from each machine are written out with all the other spec durations, including those that haven't changed in this run, even though the plugin prints out to the console just the durations for a specific run.
The "average" strategy is not always optimal
1. If you're merging 3 different files where only the middle one has a change in a certain spec duration then it won't actually be the average of the "two" different durations. It's actually a division by 4 not by 2. It can happen because of issue No.1, because it writes all the durations for all specs even for those that haven't changed, so when merging using the merge utility, it will see the same duration for a given spec for all the machines it didn't run on
2. Sometimes (I would probably say most times), you don't want an average but rather just take the most recent one. Using the average means that the duration approaches the runtime of the spec but it can take a while. Using the most recent one (with a threshold defined) is roughly the same but with one single "jump". That lack of "memory" might be a good thing, if for instance, you added/removed a bunch of tests from a spec file, this would have a significant impact on the runtime and using the average it would take a bunch of updates till it actually reaches a reasonably close runtime.

I'll elaborate on each:

1 - All spec durations are written out.

Here's an example of what the plugin prints to the console:

09:27:41  cypress-split: here are passing spec timings
09:27:41  {
09:27:41    "durations": [
09:27:41      {
09:27:41        "spec": "cypress/e2e/A.spec.ts",
09:27:41        "duration": 111045
09:27:41      },
09:27:41      {
09:27:41        "spec": "cypress/e2e/B.spec.ts",
09:27:41        "duration": 33589
09:27:41      },
09:27:41      {
09:27:41        "spec": "cypress/e2e/C.spec.ts",
09:27:41        "duration": 38835
09:27:41      },
09:27:41      {
09:27:41        "spec": "cypress/e2e/D.spec.ts",
09:27:41        "duration": 9878
09:27:41      }
09:27:41    ]
09:27:41  }

While infact the after it prints this:

cypress-split: writing out updated timings file /cypress-timings-machine-2.json

You'd see something like this:

{
  "durations": [
    {
      "spec": "cypress/e2e/A.spec.ts",
      "duration": 111045
    },
    {
      "spec": "cypress/e2e/B.spec.ts",
      "duration": 33589
    },
    {
      "spec": "cypress/e2e/C.spec.ts",
      "duration": 38835
    },
    {
      "spec": "cypress/e2e/D.spec.ts",
      "duration": 3409
    },
    {
      "spec": "cypress/e2e/E.spec.ts",
      "duration": 88930
    },
    {
      "spec": "cypress/e2e/F.spec.ts",
      "duration": 234441
    },
    {
      "spec": "cypress/e2e/G.spec.ts",
      "duration": 93823
    }
  ]
}

Even though specs E-G haven't run on machine No.2. This isn't a problem by itself, but this causes an issue when merging the timings from different machines using the average strategy.

2 - Average strategy is not always optimal

2.1 - "Average" is not exactly the average

Simplifying the above example, let's assume we're running our tests on 3 machines, these would be the 3 written cypress-timings-machine-*.json files:

cypress-timings-machine-0.json:

{
  "durations": [
    {
      "spec": "cypress/e2e/A.spec.ts",
      "duration": 108745
    },
    {
      "spec": "cypress/e2e/B.spec.ts",
      "duration": 33589
    },
    {
      "spec": "cypress/e2e/C.spec.ts",
      "duration": 38835
    }
  ]
}

cypress-timings-machine-1.json:

{
  "durations": [
    {
      "spec": "cypress/e2e/A.spec.ts",
      "duration": 111045
    },
    {
      "spec": "cypress/e2e/B.spec.ts",
      "duration": 35123
    },
    {
      "spec": "cypress/e2e/C.spec.ts",
      "duration": 38835
    }
  ]
}

cypress-timings-machine-2.json:

{
  "durations": [
    {
      "spec": "cypress/e2e/A.spec.ts",
      "duration": 111045
    },
    {
      "spec": "cypress/e2e/B.spec.ts",
      "duration": 33589
    },
    {
      "spec": "cypress/e2e/C.spec.ts",
      "duration": 40123
    }
  ]
}

If we'll look at spec B.spec.ts and merge the durations using the current average code https://github.com/bahmutov/cypress-split/blob/7a2cf41b9d6dba7389fb07e88f199e75ecbc1809/src/timings.js#L102-L121

We'll do (33589 + 35123) / 2 which results in 34356, then we'll merge the last one, which would be (34356 + 33589) / 2 and that gives 33972.5 while the real "average" is 34356. Although doesn't look like much, in tests that take 2+ minutes, this could mean a few seconds difference. This can tip the scale on how the tests are distributed across the machines, not by a lot but still.

If we'll calculate the same for C.spec.ts we'll see that it actually is the average of 40123 and 38835. However A.spec.ts, would not be like C but more like B. It's somewhat inconsistent.

2.2 - Taking the most recent duration of a spec

Again, with the above example, if we instead just take the duration of the most recent run, we avoid the average issue altogether. One could also say that the duration of the most recent is closer to the "real" runtime of the spec than the average duration of all runs.

bahmutov / cypress-split