NorskRegnesentral / text-anonymization-benchmark

Annotated corpus + evaluation metrics for text anonymisation
MIT License

Create a HuggingFace dataset for TAB #4

Open omri374 opened 2 years ago

omri374 commented 2 years ago

Hi,

The TAB dataset and evaluation approach are amazing! It would be very useful to have it available as a HuggingFace dataset for anyone interested in training models on it.

Would this be something you'd consider?

plison commented 2 years ago

Definitely, good suggestion! I'll add it to our todo list :-)

mattmdjaga commented 5 months ago

I actually did this a few weeks ago, as I'm gonna use the dataset for some PhD work: https://huggingface.co/datasets/mattmdjaga/text-anonymization-benchmark-train https://huggingface.co/datasets/mattmdjaga/text-anonymization-benchmark-val-test
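
For anyone landing here, a minimal sketch of pulling these two repos with the `datasets` library might look like the following. The repo IDs come from the links above. The `REPOS` mapping and the `load_tab` helper are hypothetical names used only for illustration, and the splits and columns each repo exposes are not stated in this thread, so inspect the returned object before relying on any of them.

```python
# Repo IDs taken from the two links in the comment above.
REPOS = {
    "train": "mattmdjaga/text-anonymization-benchmark-train",
    "val-test": "mattmdjaga/text-anonymization-benchmark-val-test",
}


def load_tab(which="train"):
    """Hypothetical helper: fetch one of the two TAB repos from the Hub."""
    if which not in REPOS:
        raise ValueError(
            f"unknown part {which!r}; expected one of {sorted(REPOS)}"
        )
    # Imported lazily so this file can be imported without `datasets` installed.
    from datasets import load_dataset  # pip install datasets

    # With no `split` argument, load_dataset returns whatever splits the
    # repo defines; print the result to see what is actually there.
    return load_dataset(REPOS[which])
```

Calling `load_tab("train")` or `load_tab("val-test")` needs network access and the `datasets` package. Printing the result should show which splits and features are available.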