allanj / ner_with_dependency

GNU General Public License v3.0

Convert the constituency trees into the Stanford dependency trees #2

Closed. steambread666 closed this issue 4 years ago

steambread666 commented 4 years ago

The paper says "We convert the constituency trees into the Stanford dependency trees using the rule-based tool by Stanford CoreNLP." Could you please share the code you use to preprocess the data into the input format expected by the Stanford CoreNLP tools? Thanks a lot in advance!

allanj commented 4 years ago

Thanks for your interest. The script is actually written in Java. It converts the preprocessed OntoNotes documents (produced by the official conll2012 script) into CoNLL-X format, optionally with Stanford dependencies and Universal Dependencies.

I would like to make some changes to that script first and send it to you after the EMNLP conference. Or, if you need it urgently, I can simply pass it to you now without any modifications.
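
For readers unfamiliar with the target format mentioned above, here is a minimal Java sketch of how one sentence with dependency heads could be serialized as CoNLL-X rows. It is not the author's script: the class `ConllxSketch`, the `Token` type, and the method `toConllx` are hypothetical names, and the extra trailing column for the entity tag is an assumption about how NER labels might be attached. The ten standard columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) follow the CoNLL-X shared task layout.

```java
import java.util.ArrayList;
import java.util.List;

public class ConllxSketch {

    /** One token with its dependency annotation (hypothetical helper type). */
    static final class Token {
        final String form;    // surface word
        final String pos;     // part-of-speech tag
        final int head;       // 1-based index of the head token; 0 marks the root
        final String deprel;  // dependency label, e.g. "nsubj"
        final String entity;  // NER tag; stored in an extra 11th column (assumption)

        Token(String form, String pos, int head, String deprel, String entity) {
            this.form = form;
            this.pos = pos;
            this.head = head;
            this.deprel = deprel;
            this.entity = entity;
        }
    }

    /** Serializes one sentence as tab-separated CoNLL-X rows followed by a blank line. */
    static String toConllx(List<Token> sentence) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < sentence.size(); i++) {
            Token t = sentence.get(i);
            // Columns: ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL ENTITY
            // Unknown fields are written as "_"; the POS tag fills both tag columns.
            sb.append(String.join("\t",
                    Integer.toString(i + 1), t.form, "_", t.pos, t.pos, "_",
                    Integer.toString(t.head), t.deprel, "_", "_", t.entity));
            sb.append('\n');
        }
        sb.append('\n'); // a blank line separates sentences
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Token> sentence = new ArrayList<>();
        sentence.add(new Token("John", "NNP", 2, "nsubj", "B-PERSON"));
        sentence.add(new Token("sleeps", "VBZ", 0, "root", "O"));
        System.out.print(toConllx(sentence));
    }
}
```

The heads and labels themselves would come from the constituency-to-dependency conversion in Stanford CoreNLP; this sketch only covers the output step and hard-codes them for illustration.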

steambread666 commented 4 years ago

Thanks a lot! I don't need it urgently and I am looking forward to your modified script!

allanj commented 4 years ago

Hi @steambread666, I have uploaded the script here: https://drive.google.com/file/d/1ljABpMo61CXaUHh7O1hWvh8BVlRtvLy9/view?usp=sharing

The script is in Java, so it may take you some effort to run OntoNotesProcess.java. Feel free to ping me if you run into any issues.

Some of my suggestions:

  1. If you are using the OntoNotes dataset, I can share the data in CoNLL format with you, provided you have the license.
  2. If you are using other datasets, you can simply look at the processNameFile function in the OntoNotesProcess.java file.

steambread666 commented 4 years ago

Thank you very much! I'm using the OntoNotes dataset and I have the license. It would be great if you could share the CoNLL-format data with me. Thanks in advance!

allanj commented 4 years ago

@steambread666, can you send me an email with a screenshot of your license? Then I will share the datasets with you.

allanj commented 4 years ago

@steambread666, I haven't received your email yet. My email address is allanmcgrady@gmail.com

steambread666 commented 4 years ago

Sorry for not responding in time! I have managed to process the data with the help of the script you shared. Thanks a lot!