cuhksz-nlp / RE-TaMM

MIT License

data pre-processing #4

Open SUIBIANDA opened 2 years ago

SUIBIANDA commented 2 years ago

I read the README in the data folder. It contains the sentence "Download the dataset from official website and do the pre-processing in the format of sample_data.". Do we need to write our own code for the data pre-processing?

yuanheTian commented 2 years ago

Thanks for being interested in our work.

For ACE05, we follow previous studies (Christopoulou et al., 2018) to process the data. You can find the pre-processing code in their GitHub repository (https://github.com/tticoin/LSTM-ER).

For SemEval, you can download the official data and write a very short script to separate the special entity markers (e.g., <e1>, </e1>, <e2>, </e2>) from the entities. You may also want to separate punctuation from the word it is attached to.
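A minimal sketch of such a script is shown below. It is not the authors' code: the function name `separate` and the regular expressions are illustrative assumptions, and the exact layout expected by sample_data is not described in this thread, so the output may need adapting. The sketch only puts whitespace around the four entity markers and around each punctuation character.

```python
import re

# Matches the four entity markers: <e1>, </e1>, <e2>, </e2>.
MARKER = re.compile(r"(</?e[12]>)")
# Matches one punctuation character (anything that is neither a word
# character nor whitespace). This is a simplifying assumption.
PUNCT = re.compile(r"([^\w\s])")


def separate(sentence: str) -> str:
    """Return the sentence with markers and punctuation as separate tokens."""
    tokens = []
    # Splitting on a capturing group keeps the markers in the result.
    for piece in MARKER.split(sentence):
        if MARKER.fullmatch(piece):
            tokens.append(piece)
        else:
            # Pad punctuation with spaces, then split on whitespace.
            tokens.extend(PUNCT.sub(r" \1 ", piece).split())
    return " ".join(tokens)


if __name__ == "__main__":
    # Illustrative sentence, not taken from the dataset.
    print(separate("The <e1>author</e1> wrote the <e2>book</e2>, quickly."))
```

Note that this simple rule also splits apostrophes and hyphens inside words (for example, "don't" becomes "don ' t"), so a stricter punctuation pattern may be preferable depending on the tokenization you want.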

If you want, we can release the pre-processed SemEval data. (We cannot release ACE05 because of copyright issues.)

SUIBIANDA commented 2 years ago

Thanks for your reply.

It would be fantastic if you could release the pre-processed SemEval data.

yuanheTian commented 2 years ago

The data is posted. Please feel free to check it out.