MaikeZuefle commented 1 year ago

Hate Speech Detection

This project aims to go beyond the random train-test split by developing a more challenging data-splitting process that better evaluates generalisation performance. We build the split from a model's internal representations: we cluster the representations and assign each cluster to either the train or the test set. Hate speech detection is used as a testing ground for developing the splitting method.
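To make the idea concrete, here is a minimal sketch of a cluster-based split. It is not the code from this PR: the plain k-means, the helper names `kmeans` and `cluster_split`, the `test_fraction` parameter, and the "smallest clusters go to test first" rule are all assumptions made for illustration, and the toy 2-D vectors stand in for real hidden representations.

```python
import random


def kmeans(vectors, k, n_iter=20, seed=0):
    """Plain k-means (illustrative); returns one cluster id per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def cluster_split(vectors, k=4, test_fraction=0.25, seed=0):
    """Assign whole clusters to train or test, so no cluster is shared."""
    labels = kmeans(vectors, k, seed=seed)
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    target = test_fraction * len(vectors)
    train, test = [], []
    # Assumption: fill the test set with the smallest clusters first and
    # send every cluster that would overshoot the target to train.
    for members in sorted(clusters.values(), key=len):
        if not test or len(test) + len(members) <= target:
            test.extend(members)
        else:
            train.extend(members)
    return sorted(train), sorted(test), labels


if __name__ == "__main__":
    # Toy stand-in for hidden representations: four separated 2-D blobs.
    toy = [
        [x + dx, y + dy]
        for (x, y) in [(0, 0), (10, 0), (0, 10), (10, 10)]
        for dx in (0, 0.1, 0.2)
        for dy in (0, 0.1)
    ]
    train_idx, test_idx, cluster_ids = cluster_split(toy, k=4)
    print("train:", train_idx)
    print("test:", test_idx)
```

The property that matters is that train and test never share a cluster, so test examples come from regions of the representation space the model did not see during training. In practice the vectors would be extracted from the model and the clustering would more likely come from an existing library.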

Authors

Maike Züfle m.s.zufle@sms.ed.ac.uk
Verna Dankers v.dankers@sms.ed.ac.uk
Ivan Titov ititov@inf.ed.ac.uk

Checklist:

[x] I and my co-authors agree that, if this PR is merged, the code will be available under the same license as the genbench_cbt repository.
[x] Prior to submitting, I have run the GenBench CBT test suite using the genbench-cli test-task tool.
[x] I have read the description of what should be in the doc.md of my task, and have added the required arguments.
[x] I have submitted or will submit an accompanying paper to the GenBench workshop.

vernadankers commented 1 year ago

Hello!

We are getting quite close to the deadline (September 1, 11:59PM anywhere on earth), so I wanted to remind you that your PR still needs some attention: see the automated check that failed.

Please don't forget to submit your accompanying paper to Openreview via https://openreview.net/group?id=GenBench.org/2023/Workshop by September 1.

Good luck finalising your PR and paper, and feel free to tag us if you have questions.

Cheers,
Verna
On behalf of the GenBench team

MaikeZuefle commented 1 year ago

@vernadankers @dieuwkehupkes I tried to submit our data split. Unfortunately, the test_task check failed.

Error: Task with id 'latent_feature_split' does not exist. Please specify a valid task id.
Error: Process completed with exit code 2.

However, our task id is "latent_feature_based_data_split" and not "latent_feature_split".

Could you run the checks using "genbench-cli test-task --id latent_feature_based_data_split", which is the new task id? I do not get an error when I run this.

In comparison to the sample submission in August, I changed the title of our task and also added subtasks. I see that these exist under "Files changed". Is there a way for me to change the id?

kazemnejad commented 1 year ago

@MaikeZuefle We're in the process of merging the tasks into the repo. In order to merge your task, we need the following changes:

Please host the dataset files somewhere else and submit a new PR without them (even if you remove the files from the current PR, they will still remain in the git history).
Could you please include a single file usage_example.py for each task, in which you use the task for finetuning and evaluating a model the way you intend your tasks to be used? Preferably, it should use a commonly used pretrained Hugging Face model. Please also include requirements-usage-example.txt listing the Python dependencies needed to run the example.

GenBench / genbench_cbt_2023

[Task Submission] Hate Speech Detection (`latent_feature_split`) #12
