asyml / forte

Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/
Apache License 2.0

Global DataPack #625

Open zhanyuanucb opened 2 years ago

zhanyuanucb commented 2 years ago

Is your feature request related to a problem? Please describe.
Entries are currently stored per DataPack, and a DataPack corresponds to a single data point (for example, one document). This local information is sometimes useful, but information that spans the whole pipeline, or even multiple pipelines, is often more useful. For instance, we usually care about the BLEU score over the whole corpus rather than the scores of a few individual documents. We need a way to collect and store data across the pipeline.

Describe the solution you'd like
Introduce a Global DataPack and a Reducer.
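
To make the request concrete, here is a minimal sketch of how a pipeline-wide store and a reducer could fit together. It is plain Python, not Forte code: `LocalPack`, `GlobalPack`, and `reduce_pack` are hypothetical names made up for illustration, and a simple matched/total ratio stands in for a corpus-level metric such as BLEU.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LocalPack:
    """Hypothetical stand-in for a per-document DataPack."""
    doc_id: str
    matched: int  # e.g. n-gram matches counted in this document
    total: int    # e.g. n-grams proposed for this document


@dataclass
class GlobalPack:
    """Hypothetical pipeline-wide store: entry name -> values from all packs."""
    entries: Dict[str, List[int]] = field(default_factory=dict)

    def add(self, name: str, value: int) -> None:
        self.entries.setdefault(name, []).append(value)


def reduce_pack(pack: GlobalPack, name: str,
                reducer: Callable[[List[int]], int]) -> int:
    """Hypothetical reducer hook: fold all values stored under `name`."""
    return reducer(pack.entries.get(name, []))


packs = [LocalPack("doc1", 9, 10), LocalPack("doc2", 1, 90)]

# Each document contributes its local counts to the shared store.
global_pack = GlobalPack()
for p in packs:
    global_pack.add("matched", p.matched)
    global_pack.add("total", p.total)

# Corpus-level score: reduce the counts first, then take the ratio.
corpus_score = (reduce_pack(global_pack, "matched", sum)
                / reduce_pack(global_pack, "total", sum))

# Averaging per-document scores gives a different number.
mean_doc_score = sum(p.matched / p.total for p in packs) / len(packs)
```

In this toy data the corpus-level score is 10/100, while the mean of the per-document scores is much higher because the short document dominates the average. That gap is why the counts need to be collected across the pipeline before they are reduced.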


hunterhector commented 2 years ago

Nice issue! That said, it feels a bit too large. Could we handle Global DataPack and Reducer as separate issues (or split the work however you prefer)?

Btw, for the issue title, maybe we can talk about a general global data pack instead of focusing on metrics. At the end of the day, this utility is useful for other cases too.

zhanyuanucb commented 2 years ago

> We could consider implementing Global DataPack and Reducer as separate issues (or other separation in your own favor)?

Agreed. I'll narrow this issue to Global DataPack only, open a separate one for Reducer, and update the issue title accordingly.