GenBench / genbench_cbt_2023

The official GenBench Collaborative Benchmarking Task repository 2023 (Archived)

[Task Submission] Hate Speech Detection (`latent_feature_splits`) #37

Closed: MaikeZuefle closed this 10 months ago

MaikeZuefle commented 12 months ago

Latent Feature-based Data Splits

This project aims to go beyond the random train-test split by developing a more challenging data-splitting process that better evaluates generalisation performance. We rely on a model's internal representations to create the split: we cluster these representations and assign each cluster as a whole to either the train or the test set. Hate speech detection serves as the testing ground for developing the splitting method.
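To illustrate the cluster-then-assign idea, here is a minimal sketch. It is not the project's implementation. It assumes k-means (with deterministic farthest-first initialisation) as the clustering algorithm, and a hypothetical rule that moves whole clusters, smallest first, into the test set until a `test_fraction` of the data is reached. The names `kmeans`, `latent_cluster_split` and `test_fraction` are illustrative only, and the toy 2-D vectors stand in for real model representations.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 50) -> List[int]:
    """Plain Lloyd's k-means (assumed clustering method); returns one label per point."""
    # Farthest-first initialisation: deterministic, spreads the initial centroids out.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if the cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def latent_cluster_split(
    embeddings: List[Vector], k: int, test_fraction: float = 0.2
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices); no cluster is split across the two sets."""
    labels = kmeans(embeddings, k)
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)

    target = test_fraction * len(embeddings)
    ordered = sorted(clusters, key=lambda c: (len(clusters[c]), c))  # smallest first
    train: List[int] = []
    test: List[int] = []
    for c in ordered[:-1]:  # the largest cluster always stays in train
        (test if len(test) < target else train).extend(clusters[c])
    train.extend(clusters[ordered[-1]])
    return sorted(train), sorted(test)
```

For example, with two well-separated groups of toy vectors, `latent_cluster_split(emb, k=2, test_fraction=0.3)` places the smaller group entirely in the test set. Assigning whole clusters, rather than individual examples, means test items come from regions of representation space that the training set does not cover.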

Authors

Checklist: