bjschoenfeld closed 4 years ago
@bjschoenfeld has this been completed? :tada:
Here is the MongoDB aggregation pipeline that produces the results.
[
    {
        '$match': {
            'run.phase': 'PRODUCE',
            'status.state': 'SUCCESS'
        }
    }, {
        '$project': {
            'f1': '$run.results.scores',
            'pipeline_id': '$pipeline.id',
            'datasets': '$datasets.id'
        }
    }, {
        '$project': {
            'f1': {
                '$arrayElemAt': [
                    '$f1', 0
                ]
            },
            'pipeline_id': '$pipeline_id',
            'dataset': {
                '$arrayElemAt': [
                    '$datasets', 0
                ]
            }
        }
    }, {
        '$group': {
            '_id': {
                'pipeline_id': '$pipeline_id',
                'dataset_id': '$dataset'
            },
            'mean': {
                '$avg': '$f1.value'
            },
            'std': {
                '$stdDevSamp': '$f1.value'
            },
            'count': {
                '$sum': 1
            },
            'group': {
                '$push': '$f1.value'
            }
        }
    }, {
        '$project': {
            '_id': 0,
            'pipeline_id': '$_id.pipeline_id',
            'dataset_id': '$_id.dataset_id',
            'f1_macro_mean_over_runs': '$mean',
            'f1_macro_std_dev_over_runs': '$std',
            'n_runs': '$count',
            'group': '$group'
        }
    }
]
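For anyone who wants to sanity-check the aggregation without a database, here is a rough pure-Python sketch of what the stages above compute. The document shape it reads (`run.results.scores` as a list of objects with a `value` field, `datasets` as a list of objects with an `id`) is inferred from the field paths in the query and is an assumption, and `summarize_runs` is a hypothetical helper name, not part of any existing code.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_runs(docs):
    """Mirror the aggregation on in-memory dicts (assumed document shape)."""
    groups = defaultdict(list)
    for doc in docs:
        # $match: keep only successful PRODUCE runs.
        if doc["run"]["phase"] != "PRODUCE":
            continue
        if doc["status"]["state"] != "SUCCESS":
            continue
        # $project + $arrayElemAt: first score and first dataset id.
        score = doc["run"]["results"]["scores"][0]["value"]
        dataset_id = doc["datasets"][0]["id"]
        # $group: bucket scores by (pipeline, dataset).
        groups[(doc["pipeline"]["id"], dataset_id)].append(score)

    results = []
    for (pipeline_id, dataset_id), scores in groups.items():
        results.append({
            "pipeline_id": pipeline_id,
            "dataset_id": dataset_id,
            "f1_macro_mean_over_runs": mean(scores),
            # Sample standard deviation needs at least two runs;
            # $stdDevSamp yields null for a single value, so use None here.
            "f1_macro_std_dev_over_runs": stdev(scores) if len(scores) > 1 else None,
            "n_runs": len(scores),
            "group": scores,
        })
    return results
```

Like the query, this takes the first entry of `scores` as the F1 score, so it only makes sense if F1 is always stored first.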
Our pipeline F1 scores may vary depending on the random seed used to initialize the pipelines. If the scores vary dramatically across seeds, the differences between pipelines may not be statistically significant.
We could randomly select a few datasets, run every pipeline we have with 10 or 100 different random seeds, and analyze the resulting score distributions.
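One possible way to analyze those distributions, sketched here as an illustration rather than a decided method, is Welch's t statistic on the per-seed F1 scores of two pipelines on the same dataset. The function name `welch_t` is hypothetical, and the sketch assumes the per-seed scores are roughly normally distributed.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t-test."""
    n_a, n_b = len(scores_a), len(scores_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("need at least two scores per pipeline")
    # Per-pipeline sample variance divided by the number of seeds.
    se_a = variance(scores_a) / n_a
    se_b = variance(scores_b) / n_b
    if se_a + se_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(scores_a) - mean(scores_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df
```

The `group` arrays returned by the query can be passed in directly as `scores_a` and `scores_b`. A small `|t|` relative to the t distribution with `df` degrees of freedom would suggest the two pipelines are not distinguishable at that number of seeds.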