steambread666 closed this issue 4 years ago
Thanks for your interest. The script is actually written in Java. It converts preprocessed OntoNotes documents (the output of the official conll2012 script) into conllx format, optionally with Stanford dependencies and universal dependencies.
I would like to make some changes to that script first and send it to you after the EMNLP conference. Or, if you need it urgently, I can simply pass it to you without any modifications.
Thanks a lot! I don't need it urgently and I am looking forward to your modified script!
Hi @steambread666, I have uploaded the script here: https://drive.google.com/file/d/1ljABpMo61CXaUHh7O1hWvh8BVlRtvLy9/view?usp=sharing
Since the script is in Java, it may take you some effort to run OntoNotesProcess.java. Feel free to ping me if you run into any issues.
Some of my suggestions:
processNameFile function in the OntoNotesProcess.java file.
Thank you very much! I'm using the OntoNotes dataset and I have the license. It would be great if you could share the CoNLL format data with me. Thanks in advance!!
Can you send me an email @steambread666 with a screenshot of your license? Then I will share the datasets with you.
@steambread666 I haven't received your email yet. My email address is allanmcgrady@gmail.com
Sorry for not responding in time! I have managed to process the data with the help of your shared script. Thanks a lot!
The paper says "We convert the constituency trees into the Stanford dependency trees using the rulebased tool by Stanford CoreNLP." Could you please share the code you used to preprocess the data into the input format expected by the Stanford CoreNLP tools? Thanks a lot in advance!
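For anyone landing here with the same question, below is a minimal sketch of one way to prepare that input. It is not the author's OntoNotesProcess.java. It assumes the usual CoNLL-2012 column layout (word in column 3, POS tag in column 4, parse bit in column 5, counting from 0) and rebuilds one bracketed constituency tree per sentence. The class name ParseBitsToTree and the method toBracketed are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: rebuild bracketed constituency trees from CoNLL-2012 rows. */
class ParseBitsToTree {
    // Assumed 0-based column indices of the CoNLL-2012 "*_conll" layout.
    static final int WORD = 3, POS = 4, PARSE_BIT = 5;

    /** Returns one bracketed tree per sentence; blank lines end a sentence. */
    static List<String> toBracketed(List<String> lines) {
        List<String> trees = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.startsWith("#")) continue;  // "#begin document" / "#end document"
            if (trimmed.isEmpty()) {                 // sentence boundary
                flush(current, trees);
                continue;
            }
            String[] cols = trimmed.split("\\s+");
            if (cols.length <= PARSE_BIT) {
                throw new IllegalArgumentException("Too few columns: " + line);
            }
            // Escape literal brackets in the token so the tree stays balanced.
            String word = cols[WORD].replace("(", "-LRB-").replace(")", "-RRB-");
            String leaf = "(" + cols[POS] + " " + word + ")";
            // The parse bit holds the brackets around this token, with "*"
            // standing in for the leaf, e.g. "(TOP(S(NP*" or "*))".
            String bit = cols[PARSE_BIT]
                    .replace("(", " (")
                    .replace("*", " " + leaf)
                    .trim();
            if (current.length() > 0) current.append(' ');
            current.append(bit);
        }
        flush(current, trees);  // last sentence if the input has no trailing blank line
        return trees;
    }

    private static void flush(StringBuilder current, List<String> trees) {
        if (current.length() > 0) {
            trees.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "doc 0 0 The DT (TOP(S(NP* - - - - * -",
                "doc 0 1 cat NN *) - - - - * -",
                "doc 0 2 sleeps VBZ (VP*) - - - - * -",
                "doc 0 3 . . *)) - - - - * -",
                "");
        for (String tree : toBracketed(rows)) {
            System.out.println(tree);
        }
    }
}
```

With the trees written one per line to a file, the CoreNLP converter can, as far as I know, be run on that file through the edu.stanford.nlp.trees.EnglishGrammaticalStructure class with the -treeFile, -basic and -conllx options. Please check the exact options against the CoreNLP version you have installed.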