Closed SachaG closed 2 years ago
Here's what I have currently to calculate total wins btw:
// count how many matches each item won
const winsPipeline: any[] = [
// match all relevant results
{
$match: match
},
// flatten each bracket array because we're treating every match the same in
// this aggregation
{
$unwind: {
path: `$${key}`
}
},
// projection
{
$project: {
// we can drop the _id
_id: 0,
// use the field path as id
id: `$${key}`,
// keep the year field as year
year: '$year',
// create a new field called "winner" by taking the last item of each match array
winner: { $arrayElemAt: [`$${key}`, -1] }
}
},
{
$group: {
// group every document that has the same year and winner
_id: {
year: '$year',
winner: '$winner'
},
// sum up occurences of each year/winner combo and store that on new "total" field
total: { $sum: 1 }
}
},
// discard any documents where the winner was null
{
$match: { '_id.winner': { $ne: null } }
},
// project on id/year/total fields to match default aggregation format
{
$project: {
_id: 0,
id: '$_id.winner',
year: '$_id.year',
total: 1
}
},
{ $sort: { [sort]: order } }
]
This has been a fun one! Both pipelines are not optimized and there may be more efficient solutions, but they get the job done.
I tried to annotate the stages with comments to make my intent more clear.
/**
* Assume the following documents in the "results" collection:
* {
* bracketResult: [
* [1, 2, 1],
* [3, 4, 3],
* ...
* ]
* }
*/
/**
* Wins Aggregation
*/
db.results.aggregate([
/**
* Reduce over the bracketResult array and determine the round. (Use $reduce
* instead of $map in order to get a running index.)
*/
{
$project: {
_id: 0,
matches: {
$reduce: {
input: "$bracketResult",
initialValue: { acc: [], index: 0 },
in: {
acc: {
$concatArrays: [
"$$value.acc",
[
{
player: { $slice: ["$$this", 2] },
winner: { $arrayElemAt: ["$$this", 2] },
round: {
$switch: {
branches: [
{ case: { $lt: ["$$value.index", 4] }, then: 1 },
{ case: { $lt: ["$$value.index", 6] }, then: 2 },
],
default: 3,
},
},
},
],
],
},
index: { $add: ["$$value.index", 1] },
},
},
},
},
},
{ $project: { match: "$matches.acc" } },
/**
* Unwrap the individual matches, and also the players, effectively producing
* two documents for a single match (one will be used for each player).
*/
{ $unwind: "$match" },
{ $unwind: "$match.player" },
/**
* Group by player and round, summing up the totals and wins.
*/
{
$project: {
player: "$match.player",
round: "$match.round",
hasWon: {
$cond: {
if: { $eq: ["$match.player", "$match.winner"] },
then: 1,
else: 0,
},
},
},
},
{
$group: {
_id: { player: "$player", round: "$round" },
totalCount: { $sum: 1 },
count: { $sum: "$hasWon" },
},
},
/**
* Create the three properties "round1", "round2", and "round3". Only one of
* them will actually contain data at this stage.
*/
{
$project: {
_id: 0,
player: "$_id.player",
round1: {
$cond: {
if: { $eq: ["$_id.round", 1] },
then: {
totalCount: "$totalCount",
count: "$count",
percentage: { $divide: ["$count", "$totalCount"] },
},
else: {},
},
},
round2: {
$cond: {
if: { $eq: ["$_id.round", 2] },
then: {
totalCount: "$totalCount",
count: "$count",
percentage: { $divide: ["$count", "$totalCount"] },
},
else: {},
},
},
round3: {
$cond: {
if: { $eq: ["$_id.round", 3] },
then: {
totalCount: "$totalCount",
count: "$count",
percentage: { $divide: ["$count", "$totalCount"] },
},
else: {},
},
},
},
},
/**
* Group by player and merge together the round-fields created in the
* previous stage.
*/
{
$group: {
_id: "$player",
round1: { $mergeObjects: "$round1" },
round2: { $mergeObjects: "$round2" },
round3: { $mergeObjects: "$round3" },
},
},
/**
* Sum up the totals and wins of all three rounds.
*/
{
$project: {
_id: 0,
id: "$_id",
combined: {
totalCount: {
$sum: [
"$round1.totalCount",
"$round2.totalCount",
"$round3.totalCount",
],
},
count: {
$sum: ["$round1.count", "$round2.count", "$round3.count"],
},
},
round1: 1,
round2: 1,
round3: 1,
},
},
/**
* Final formatting.
*/
{
$project: {
id: 1,
combined: {
count: "$combined.count",
percentage: { $divide: ["$combined.count", "$combined.totalCount"] },
},
round1: {
count: { $ifNull: ["$round1.count", 0] },
percentage: { $ifNull: ["$round1.percentage", null] },
},
round2: {
count: { $ifNull: ["$round2.count", 0] },
percentage: { $ifNull: ["$round2.percentage", null] },
},
round3: {
count: { $ifNull: ["$round3.count", 0] },
percentage: { $ifNull: ["$round3.percentage", null] },
},
},
},
]);
/**
* Match-ups Aggregation
*/
db.results.aggregate([
/**
* Map over the individual matches and transform the shape.
*/
{
$project: {
_id: 0,
matches: {
$map: {
input: "$bracketResult",
in: {
/**
* We store an array here that we'll unwrap later in order to split
* a single match into two documents (one for each player).
*/
players: [
{
player: { $arrayElemAt: ["$$this", 0] },
opponent: { $arrayElemAt: ["$$this", 1] },
},
{
player: { $arrayElemAt: ["$$this", 1] },
opponent: { $arrayElemAt: ["$$this", 0] },
},
],
winner: { $arrayElemAt: ["$$this", 2] },
},
},
},
},
},
/**
* Unwind the individual matches and players.
*/
{ $unwind: "$match" },
{ $unwind: "$match.players" },
/**
* Group by player-opponent-combination and sum up totals and wins.
*/
{
$project: {
player: "$match.players.player",
opponent: "$match.players.opponent",
hasWon: {
$cond: {
if: { $eq: ["$match.players.player", "$match.winner"] },
then: 1,
else: 0,
},
},
},
},
{
$group: {
_id: { player: "$player", opponent: "$opponent" },
totalCount: { $sum: 1 },
count: { $sum: "$hasWon" },
},
},
/**
* Calcualte the percentage.
*/
{
$project: {
_id: 0,
player: "$_id.player",
opponent: "$_id.opponent",
count: "$count",
percentage: { $divide: ["$count", "$totalCount"] },
},
},
/**
* Group by player and push an object for each opponent.
*/
{
$group: {
_id: "$player",
matchups: {
$push: { id: "$opponent", count: "$count", percentage: "$percentage" },
},
},
},
{
$project: {
_id: 0,
id: "$_id",
matchups: 1,
},
},
]);
Wow, this is awesome! Thanks for the comments, that's very helpful!
Given the relative complexity of the end result do you think doing the computations in JavaScript instead might make sense? I avoided that because I assumed using aggregations would make for more compact and more performant code, but I have to confess I'm a bit worried about not being able to debug/maintain the aggregations myself…
Anyway I'll plug them in for now and see if it works :) Thanks again!
Your welcome! 🙌
In my pipelines I see the most complexity in the firsts $project
stage, where the raw data is mapped to a more manageable format. IMHO the best solution to avoid this would be to store the data in a more structured way in the first place 😅
In general when it comes to "Database vs Code", I believe it highly depends on the performance requirements and the number of datapoints. Going with the database query probably scales better with more data. Also the complexity for transforming has to live somewhere, and if it's in code it might be just as hard to understand what's going on.
Personally I'm in camp database, but moving the logic to code is a valid decision if it better fits the skills of the developers maintaining the project.
Well for example I got tripped up by a small typo in the code ("matches" vs "match"). I was thinking that if this was TypeScript, it would've been easier to catch. But yeah I take your point, it probably scales better than running code on an array with 10k items…
Fair point, haven't thought about things like static typing 💡With that in mind, DX is definitely better when having the logic in code.
Anyway it's working now, there's a couple things that I forgot to ask about in the original question (for example I need to also handle a year
field) but I found some workarounds. Thanks again for your help!
I'm struggling a bit to figure out the correct mongo aggregations to achieve this:
Input
Data for the 8-player tournament bracket-style questions is stored in the following format, where each triplet is of the form
[first player, second player, winner]
:Desired Output
I would like to output two different data views.
1. Wins Aggregation
In this view, each player would have the following stats:
So we would end up with an 8-item array, where each item has these eight datapoints. For example:
(I've already managed to calculate the total number of wins by the way)
2. Match-ups Aggregation
In this view, a given Player A would have the following stats:
So we would end up with eight 7-item arrays. For example: