Open zhanyuanucb opened 2 years ago
Nice issue! But I feel like this issue is a bit too large. Could we implement Global DataPack and Reducer as separate issues (or split it some other way you prefer)?
Btw, for the issue title, maybe we can talk about the general global data pack instead of focusing on metrics. At the end of the day, this utility is useful for other cases too.
Could we implement Global DataPack and Reducer as separate issues (or split it some other way you prefer)?
Agreed. I'll narrow this issue to Global DataPack only, open a separate one for Reducer, and update the issue title accordingly.
Is your feature request related to a problem? Please describe.
Entries are currently stored per DataPack, and a DataPack should correspond to only one data point (for example, one document). This local information is sometimes useful, but global information covering the whole pipeline, or even multiple pipelines, is often more useful. For instance, we usually care about the BLEU score on the whole corpus rather than the scores of a few individual documents.
We need a way to collect and store data across the pipeline.
Describe the solution you'd like
Introduce Global DataPack and Reducer.
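To make the request concrete, here is a minimal sketch of what a pipeline-wide store could look like. Everything in it is an assumption for illustration: `DocPack`, `GlobalDataPack`, `add`, `get`, and `corpus_accuracy` are hypothetical names, not existing classes or methods of this project, and token accuracy stands in for a corpus-level metric such as BLEU.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DocPack:
    """Hypothetical stand-in for a per-document DataPack."""
    doc_id: str
    correct: int  # tokens predicted correctly in this document
    total: int    # tokens in this document


@dataclass
class GlobalDataPack:
    """Hypothetical pipeline-wide store (an assumption, not a real API)."""
    records: Dict[str, List[Any]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def add(self, key: str, value: Any) -> None:
        # Append one value under a named key, across all documents.
        self.records[key].append(value)

    def get(self, key: str) -> List[Any]:
        # Return a copy so callers cannot mutate the stored list.
        return list(self.records[key])


def corpus_accuracy(store: GlobalDataPack) -> float:
    """Reduce the collected counts to one corpus-level score."""
    correct = sum(store.get("correct"))
    total = sum(store.get("total"))
    return correct / total if total else 0.0


# Two documents of very different length.
packs = [DocPack("doc-1", 9, 10), DocPack("doc-2", 1, 90)]

store = GlobalDataPack()
for pack in packs:
    store.add("correct", pack.correct)
    store.add("total", pack.total)

# Averaging per-document scores weights both documents equally ...
per_doc_mean = sum(p.correct / p.total for p in packs) / len(packs)
# ... while the corpus-level score weights every token equally.
corpus_score = corpus_accuracy(store)
```

The two numbers differ because the short document dominates the per-document average, which is the reason a corpus-level metric needs data collected across the whole pipeline rather than inside a single DataPack.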