GenBench / genbench_cbt_2023

The official Genbench Collaborative Benchmarking Task repository 2023 (Archived)

[Task Submission] Hate Speech Detection (`latent_feature_split`) #12

Closed MaikeZuefle closed 11 months ago

MaikeZuefle commented 1 year ago

Hate Speech Detection

This project aims to go beyond the random train-test split by developing a more challenging data-splitting process to better evaluate generalisation performance. We rely on a model's internal representations to create the data split: we cluster the representations and assign each cluster to either the train or the test set. Hate speech detection is used as a testing ground for developing the splitting method.
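To make the idea of a cluster-based split concrete, here is a minimal sketch. It is not the authors' implementation: the description above does not say which model, layer, or clustering algorithm is used, so this example assumes plain k-means and uses toy 2-D vectors as stand-ins for a model's internal representations. All function names (`kmeans`, `cluster_split`) and parameters (`k`, `test_fraction`) are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means (an assumed stand-in for the clustering step)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


def cluster_split(labels, test_fraction=0.2):
    """Assign whole clusters to the test set, never splitting a cluster."""
    sizes = {}
    for lab in labels:
        sizes[lab] = sizes.get(lab, 0) + 1
    target = test_fraction * len(labels)
    test_clusters, test_size = set(), 0
    # Greedily add the smallest clusters while staying within the target size.
    for cluster, size in sorted(sizes.items(), key=lambda item: item[1]):
        if test_size + size <= target:
            test_clusters.add(cluster)
            test_size += size
    # Fallback: if no cluster fits, use the smallest one so the test set is non-empty.
    if not test_clusters:
        test_clusters.add(min(sizes, key=sizes.get))
    train_idx = [i for i, lab in enumerate(labels) if lab not in test_clusters]
    test_idx = [i for i, lab in enumerate(labels) if lab in test_clusters]
    return train_idx, test_idx


# Toy stand-ins for hidden representations: five blobs of ten 2-D points each.
toy_rng = random.Random(1)
toy_reps = [
    [toy_rng.gauss(cx, 0.1), toy_rng.gauss(cy, 0.1)]
    for cx, cy in [(0, 0), (5, 5), (0, 5), (5, 0), (10, 10)]
    for _ in range(10)
]
labels = kmeans(toy_reps, k=5)
train_idx, test_idx = cluster_split(labels, test_fraction=0.2)
```

Because every cluster lands entirely on one side, test examples come from regions of the representation space that the training set does not cover, which is what makes such a split harder than a random one.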

Authors

Checklist:

vernadankers commented 1 year ago

Hello!

We are getting quite close to the deadline (September 1, 11:59PM anywhere on earth), which is why I wanted to remind you of the fact that your PR still needs some attention: see the automated check that failed.

Please don't forget to submit your accompanying paper to Openreview via https://openreview.net/group?id=GenBench.org/2023/Workshop by September 1.

Good luck finalising your PR and paper, feel free to tag us if you have questions.

Cheers,
Verna
On behalf of the GenBench team

MaikeZuefle commented 1 year ago

@vernadankers @dieuwkehupkes I tried to submit our data split. Unfortunately, the test_task check failed.

Error: Task with id 'latent_feature_split' does not exist. Please specify a valid task id.
Error: Process completed with exit code 2.

However, our task id is "latent_feature_based_data_split" and not "latent_feature_split".

Could you run the checks using "genbench-cli test-task --id latent_feature_based_data_split", which is the new task id? I do not get an error when I run this.

In comparison to the sample submission in August, I changed the title of our task and also added subtasks. I see that these exist under "Files changed". Is there a way for me to change the id?

kazemnejad commented 1 year ago

@MaikeZuefle We're in the process of merging the tasks into the repo. In order to merge your task, we need the following changes:

  1. Please host the dataset files somewhere else and submit a new PR without the files (even if you remove the files from the current PR, they will still remain in the git history).

  2. Could you please include a single file, usage_example.py, for each task, in which you use the task to fine-tune and evaluate a model the way you intend your tasks to be used? Preferably, this should be done with a commonly used pretrained Hugging Face model. Please also include requirements-usage-example.txt listing the Python dependencies needed to run the example.