ShuhengL / acl2023_conllpp

License? #2

Open AngledLuffa opened 1 year ago

AngledLuffa commented 1 year ago

Hi,

I was also wondering, do you have a license for the newly annotated data? If you like, I can add a link to this repo on https://github.com/juand-r/entity-recognition-datasets/, but so far, all English datasets on that list have a license of some sort, and I'd like to keep that trend going.

Thanks!

ShuhengL commented 1 year ago

Hi

Sorry for the late reply! Currently I do not have a license for this data, though I should probably look into it... I would love to have it added to the repo that you mentioned! I'll let you know once I've figured out the license issue!

Thank you!

AngledLuffa commented 1 year ago

First of all, I wanted to thank you for making this dataset available. We built a similar dataset and proposed that regional variations in English (and regional PER, LOC, etc. names) make models trained on CoNLL or OntoNotes less accurate for newswire from sources outside the US or Europe. One of the reviewers called attention to the idea that temporal drift between CoNLL and the time of our dataset creation might account for the score differences we observed. Using your dataset as additional training data, we argued that regional differences were more relevant than temporal drift. Happily, the reviewer who called this out liked our answer, and our paper was accepted! (EMNLP Findings) Naturally, we will be citing your work in the final version of our paper.

Anyway, I am still interested in listing this project, along with its license, on the juand-r repo, if you have one.

AngledLuffa commented 1 week ago

Ping about this? I think having it listed on juand-r would improve its discoverability, but generally speaking those datasets all have licenses of one form or another.