Devographics / state-of-js-graphql-results-api

State of JS GraphQL API
64 stars 43 forks source link

Mongo Aggregations help for bracket questions #190

Closed SachaG closed 2 years ago

SachaG commented 2 years ago

I'm struggling a bit to figure out the correct mongo aggregations to achieve this:

Input

Data for the 8-player tournament bracket-style questions is stored in the following format, where each triplet is of the form [first player, second player, winner]:

const bracketResult = [
  [7, 6, 6], // first round
  [4, 0, 0], // first round
  [5, 2, 2], // first round
  [3, 1, 3], // first round
  [6, 0, 6], // semi-finals
  [2, 3, 3], // semi-finals
  [6, 3, 6], // finals
];

Desired Output

I would like to output two different data views.

1. Wins Aggregation

In this view, each player would have the following stats:

So we would end up with an 8-item array, where each item has these eight datapoints. For example:

{
 id: 'player_A',
 overall: {
   count: 123,
   percentage: 23
  },
 round1: {
   count: 123,
   percentage: 23
  },
 round2: {
   count: 123,
   percentage: 23
  },
 round3: {
   count: 123,
   percentage: 23
  }
}

(I've already managed to calculate the total number of wins by the way)

2. Match-ups Aggregation

In this view, a given Player A would have the following stats:

So we would end up with eight 7-item arrays. For example:

{
  id: 'player_A',
  matchups: [
    {
      id: 'player_B',
      wins: 123,
      percentage: 23
   },

    {
      id: 'player_C',
      wins: 123,
      percentage: 23
   }

   // etc.
  ]
}
SachaG commented 2 years ago

Here's what I have currently to calculate total wins btw:

// count how many matches each item won
    const winsPipeline: any[] = [
        // match all relevant results
        {
            $match: match
        },
        // flatten each bracket array because we're treating every match the same in
        // this aggregation
        {
            $unwind: {
                path: `$${key}`
            }
        },
        // projection
        {
            $project: {
                // we can drop the _id
                _id: 0,
                // use the field path as id
                id: `$${key}`,
                // keep the year field as year
                year: '$year',
                // create a new field called "winner" by taking the last item of each match array
                winner: { $arrayElemAt: [`$${key}`, -1] }
            }
        },
        {
            $group: {
                // group every document that has the same year and winner
                _id: {
                    year: '$year',
                    winner: '$winner'
                },
                // sum up occurences of each year/winner combo and store that on new "total" field
                total: { $sum: 1 }
            }
        },
        // discard any documents where the winner was null
        {
            $match: { '_id.winner': { $ne: null } }
        },
        // project on id/year/total fields to match default aggregation format
        {
            $project: {
                _id: 0,
                id: '$_id.winner',
                year: '$_id.year',
                total: 1
            }
        },
        { $sort: { [sort]: order } }
    ]
thomasheyenbrock commented 2 years ago

This has been a fun one! Both pipelines are not optimized and there may be more efficient solutions, but they get the job done.

I tried to annotate the stages with comments to make my intent more clear.

/**
 * Assume the following documents in the "results" collection:
 * {
 *   bracketResult: [
 *     [1, 2, 1],
 *     [3, 4, 3],
 *     ...
 *   ]
 * }
 */

/**
 * Wins Aggregation
 */
db.results.aggregate([
  /**
   * Reduce over the bracketResult array and determine the round. (Use $reduce
   * instead of $map in order to get a running index.)
   */
  {
    $project: {
      _id: 0,
      matches: {
        $reduce: {
          input: "$bracketResult",
          initialValue: { acc: [], index: 0 },
          in: {
            acc: {
              $concatArrays: [
                "$$value.acc",
                [
                  {
                    player: { $slice: ["$$this", 2] },
                    winner: { $arrayElemAt: ["$$this", 2] },
                    round: {
                      $switch: {
                        branches: [
                          { case: { $lt: ["$$value.index", 4] }, then: 1 },
                          { case: { $lt: ["$$value.index", 6] }, then: 2 },
                        ],
                        default: 3,
                      },
                    },
                  },
                ],
              ],
            },
            index: { $add: ["$$value.index", 1] },
          },
        },
      },
    },
  },
  { $project: { match: "$matches.acc" } },
  /**
   * Unwrap the individual matches, and also the players, effectively producing
   * two documents for a single match (one will be used for each player).
   */
  { $unwind: "$match" },
  { $unwind: "$match.player" },
  /**
   * Group by player and round, summing up the totals and wins.
   */
  {
    $project: {
      player: "$match.player",
      round: "$match.round",
      hasWon: {
        $cond: {
          if: { $eq: ["$match.player", "$match.winner"] },
          then: 1,
          else: 0,
        },
      },
    },
  },
  {
    $group: {
      _id: { player: "$player", round: "$round" },
      totalCount: { $sum: 1 },
      count: { $sum: "$hasWon" },
    },
  },
  /**
   * Create the three properties "round1", "round2", and "round3". Only one of
   * them will actually contain data at this stage.
   */
  {
    $project: {
      _id: 0,
      player: "$_id.player",
      round1: {
        $cond: {
          if: { $eq: ["$_id.round", 1] },
          then: {
            totalCount: "$totalCount",
            count: "$count",
            percentage: { $divide: ["$count", "$totalCount"] },
          },
          else: {},
        },
      },
      round2: {
        $cond: {
          if: { $eq: ["$_id.round", 2] },
          then: {
            totalCount: "$totalCount",
            count: "$count",
            percentage: { $divide: ["$count", "$totalCount"] },
          },
          else: {},
        },
      },
      round3: {
        $cond: {
          if: { $eq: ["$_id.round", 3] },
          then: {
            totalCount: "$totalCount",
            count: "$count",
            percentage: { $divide: ["$count", "$totalCount"] },
          },
          else: {},
        },
      },
    },
  },
  /**
   * Group by player and merge together the round-fields created in the
   * previous stage.
   */
  {
    $group: {
      _id: "$player",
      round1: { $mergeObjects: "$round1" },
      round2: { $mergeObjects: "$round2" },
      round3: { $mergeObjects: "$round3" },
    },
  },
  /**
   * Sum up the totals and wins of all three rounds.
   */
  {
    $project: {
      _id: 0,
      id: "$_id",
      combined: {
        totalCount: {
          $sum: [
            "$round1.totalCount",
            "$round2.totalCount",
            "$round3.totalCount",
          ],
        },
        count: {
          $sum: ["$round1.count", "$round2.count", "$round3.count"],
        },
      },
      round1: 1,
      round2: 1,
      round3: 1,
    },
  },
  /**
   * Final formatting.
   */
  {
    $project: {
      id: 1,
      combined: {
        count: "$combined.count",
        percentage: { $divide: ["$combined.count", "$combined.totalCount"] },
      },
      round1: {
        count: { $ifNull: ["$round1.count", 0] },
        percentage: { $ifNull: ["$round1.percentage", null] },
      },
      round2: {
        count: { $ifNull: ["$round2.count", 0] },
        percentage: { $ifNull: ["$round2.percentage", null] },
      },
      round3: {
        count: { $ifNull: ["$round3.count", 0] },
        percentage: { $ifNull: ["$round3.percentage", null] },
      },
    },
  },
]);

/**
 * Match-ups Aggregation
 */
db.results.aggregate([
  /**
   * Map over the individual matches and transform the shape.
   */
  {
    $project: {
      _id: 0,
      matches: {
        $map: {
          input: "$bracketResult",
          in: {
            /**
             * We store an array here that we'll unwrap later in order to split
             * a single match into two documents (one for each player).
             */
            players: [
              {
                player: { $arrayElemAt: ["$$this", 0] },
                opponent: { $arrayElemAt: ["$$this", 1] },
              },
              {
                player: { $arrayElemAt: ["$$this", 1] },
                opponent: { $arrayElemAt: ["$$this", 0] },
              },
            ],
            winner: { $arrayElemAt: ["$$this", 2] },
          },
        },
      },
    },
  },
  /**
   * Unwind the individual matches and players.
   */
  { $unwind: "$match" },
  { $unwind: "$match.players" },
  /**
   * Group by player-opponent-combination and sum up totals and wins.
   */
  {
    $project: {
      player: "$match.players.player",
      opponent: "$match.players.opponent",
      hasWon: {
        $cond: {
          if: { $eq: ["$match.players.player", "$match.winner"] },
          then: 1,
          else: 0,
        },
      },
    },
  },
  {
    $group: {
      _id: { player: "$player", opponent: "$opponent" },
      totalCount: { $sum: 1 },
      count: { $sum: "$hasWon" },
    },
  },
  /**
   * Calcualte the percentage.
   */
  {
    $project: {
      _id: 0,
      player: "$_id.player",
      opponent: "$_id.opponent",
      count: "$count",
      percentage: { $divide: ["$count", "$totalCount"] },
    },
  },
  /**
   * Group by player and push an object for each opponent.
   */
  {
    $group: {
      _id: "$player",
      matchups: {
        $push: { id: "$opponent", count: "$count", percentage: "$percentage" },
      },
    },
  },
  {
    $project: {
      _id: 0,
      id: "$_id",
      matchups: 1,
    },
  },
]);
SachaG commented 2 years ago

Wow, this is awesome! Thanks for the comments, that's very helpful!

Given the relative complexity of the end result do you think doing the computations in JavaScript instead might make sense? I avoided that because I assumed using aggregations would make for more compact and more performant code, but I have to confess I'm a bit worried about not being able to debug/maintain the aggregations myself…

Anyway I'll plug them in for now and see if it works :) Thanks again!

thomasheyenbrock commented 2 years ago

Your welcome! 🙌

In my pipelines I see the most complexity in the firsts $project stage, where the raw data is mapped to a more manageable format. IMHO the best solution to avoid this would be to store the data in a more structured way in the first place 😅

In general when it comes to "Database vs Code", I believe it highly depends on the performance requirements and the number of datapoints. Going with the database query probably scales better with more data. Also the complexity for transforming has to live somewhere, and if it's in code it might be just as hard to understand what's going on.

Personally I'm in camp database, but moving the logic to code is a valid decision if it better fits the skills of the developers maintaining the project.

SachaG commented 2 years ago

Well for example I got tripped up by a small typo in the code ("matches" vs "match"). I was thinking that if this was TypeScript, it would've been easier to catch. But yeah I take your point, it probably scales better than running code on an array with 10k items…

thomasheyenbrock commented 2 years ago

Fair point, haven't thought about things like static typing 💡With that in mind, DX is definitely better when having the logic in code.

SachaG commented 2 years ago

Anyway it's working now, there's a couple things that I forgot to ask about in the original question (for example I need to also handle a year field) but I found some workarounds. Thanks again for your help!