OpenPecha / Requests

RFWs and RFCs for all OpenPecha repositories

RFW0114: Recreate the benchmark dataset, ensuring a more balanced distribution across all departments #367

Closed by tentamdin 9 months ago

tentamdin commented 10 months ago

RFW0114: Recreate the benchmark dataset, ensuring a more balanced distribution across all departments

Summary

Recreate the benchmark dataset so that samples are more evenly distributed within each department across gender, age, and education qualification.

Key Concepts

Benchmark dataset: the dataset used as a reference point for performance evaluation.

Context

The current benchmark dataset does not represent the demographic groups within each department in a balanced way. This can lead to biased evaluations and inaccurate performance assessments. Rebalancing the dataset by gender, age, and education qualification will make evaluations fairer and more reliable.

Outputs

A new benchmark dataset with a significantly more even distribution of data points across departments, considering gender, age, and education qualification.

We aim for ~10k samples in the benchmark, distributed equally across all 5 departments. Each department needs 2k examples, and within each department the examples should be evenly distributed among the categories (gender, age, and education qualification).
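A minimal sketch of how the per-department quota could be split evenly across categories, assuming each record is a dict with `department`, `gender`, `age_group`, and `education` fields. These field names, the `balanced_sample` helper, and the toy category labels are hypothetical and only illustrate the idea; they are not taken from the existing dataset.

```python
import random
from collections import defaultdict


def balanced_sample(records, per_department,
                    strata_keys=("gender", "age_group", "education"), seed=0):
    """Draw up to per_department records per department, spread evenly over strata."""
    rng = random.Random(seed)
    # Group records as department -> stratum (tuple of category values) -> records.
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        stratum = tuple(rec[key] for key in strata_keys)
        grouped[rec["department"]][stratum].append(rec)

    sample = []
    for department in sorted(grouped):
        strata = grouped[department]
        # Equal quota per observed stratum; the first `extra` strata get one more.
        quota, extra = divmod(per_department, len(strata))
        for i, stratum in enumerate(sorted(strata)):
            want = quota + (1 if i < extra else 0)
            pool = strata[stratum]
            # A stratum smaller than its quota contributes everything it has.
            sample.extend(rng.sample(pool, min(want, len(pool))))
    return sample


# Toy data (hypothetical labels): 5 departments x 8 strata x 300 records each.
records = [
    {"department": f"dept_{d}", "gender": g, "age_group": a, "education": e, "id": i}
    for d in range(1, 6)
    for g in ("g1", "g2")
    for a in ("a1", "a2")
    for e in ("e1", "e2")
    for i in range(300)
]
sample = balanced_sample(records, per_department=2000)
print(len(sample))  # 10000: 5 departments x 8 strata x 250 records
```

Note that only category combinations actually present in the data count as strata here, and a stratum with too few records leaves its department short of 2k. In that case the shortfall would need to be reported or filled with newly collected data.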

Inputs

The existing benchmark dataset, plus information on the desired distribution percentages and the demographic breakdown within each department.

Timeline

Specify the expected delivery date for the project.

References

Include any relevant links or resources for additional context or information.

kaldan007 commented 10 months ago

Is it for all departments?

kaldan007 commented 10 months ago

The timeline is missing.

gangagyatso4364 commented 10 months ago

What will the count of benchmark samples from each department be? In the case of MT, Sebastian suggested 1K from each genre.