ben-aaron188 / textwash

GNU General Public License v3.0
24 stars, 4 forks

Test/Training Dataset Availability #18

Open SUL-bolle opened 4 months ago

SUL-bolle commented 4 months ago

Thanks for this open source project! I'm currently writing my thesis on text anonymization models and their performance. I would like to assess different models on different datasets, and I'd also like to use the datasets you used (especially the annotated Enron and Wikipedia datasets), because most other datasets contain only legal or medical texts. I read in the paper that all the materials you used are available in this repository, though I'm having trouble finding this data. Could you help me here?

Thank you!

ben-aaron188 commented 4 months ago

Hi, we are not sharing the annotated data at this point but may do so in the future (though this may be beyond your thesis timeline).
