sb-b closed this 6 years ago
Yes, use a different separator character and the problem is solved.
I was just asking you to change the '-' character because I thought you had the source code and could easily do this. I couldn't decompile the class files to Java files, replace the character, and then build the jar file again without errors. Thank you anyway.
I don't have the time to do that. I'm sorry. M.
Hi,
When using the command:
java -jar ParserOracleArcStdWithSwap.jar -t -1 -l 1 -c training.conll > trainingOracle.txt
ParserOracleArcStdWithSwap.jar puts a '-' character between each word and its POS tag in the trainingOracle.txt file. However, in the current version of the UD treebanks, some treebanks have xpos values that themselves contain multiple '-' characters. So, the oracle files look like this:
[][τὰ-DET_l-p---na-, γὰρ-ADV_d--------, πρὸ-ADP_r--------, αὐτῶν-PRON_p-p---ng-, καὶ-CCONJ_c--------, τὰ-DET_l-p---na-,..., ROOT-ROOT]
When these oracle files are parsed by the load_correct_actions and load_correct_actionsDev methods in the c2.h file, the words and their POS tags cannot be extracted correctly.
Would it be possible to put another character, such as '#', between words and POS tags when creating the oracle txt files? I tried to replace the '-' character with '#' by decompiling the class files inside ParserOracleArcStdWithSwap.jar, but did not succeed.
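To illustrate the ambiguity, here is a minimal Python sketch. It assumes the loader splits each `word-POS` token at the last '-'. That is an assumption about c2.h, not something verified against its source, and `split_token` is a hypothetical helper, not part of the parser.

```python
def split_token(token, sep="-"):
    """Split a 'word<sep>POS' token at the LAST occurrence of sep.

    Assumption: this mimics how the oracle loader separates a word from
    its POS tag; the real logic in c2.h may differ.
    """
    word, _, pos = token.rpartition(sep)
    return word, pos


# A tag without hyphens splits as intended.
ok = split_token("ROOT-ROOT")

# An xpos value ending in '-' leaves an empty tag and a polluted word.
broken = split_token("τὰ-DET_l-p---na-")

# With a separator that never occurs in the tags, the split is unambiguous.
fixed = split_token("τὰ#DET_l-p---na-", sep="#")

print(ok)
print(broken)
print(fixed)
```

Under that assumption, the second call returns the whole string minus the trailing hyphen as the "word" and an empty POS tag, which matches the symptom described above.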
Thank you,
Betul
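A possible workaround that does not need the jar's source code is to rewrite the POS columns of the CoNLL file before running the oracle, so that no tag contains '-'. The sketch below is only that, a sketch. It assumes the POS tags are in tab-separated columns 4 and 5 (as in CoNLL-X and CoNLL-U), that the replacement character '=' does not already occur in any tag, and that the same rewrite is applied to the dev and test files. The function names and the output file name are hypothetical.

```python
POS_COLUMNS = (3, 4)  # 0-based indices of columns 4 and 5 (assumed POS columns)


def sanitize_line(line, old="-", new="="):
    """Replace `old` with `new` in the POS columns of one CoNLL line.

    Blank lines and comment lines (starting with '#') are returned unchanged.
    """
    if not line.strip() or line.startswith("#"):
        return line
    body = line.rstrip("\n")
    cols = body.split("\t")
    for i in POS_COLUMNS:
        if i < len(cols):
            cols[i] = cols[i].replace(old, new)
    # Re-attach whatever newline the original line ended with.
    return "\t".join(cols) + line[len(body):]


def sanitize_file(src_path, dst_path, old="-", new="="):
    """Write a copy of src_path with hyphen-free POS columns to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(sanitize_line(line, old, new))


example = "1\tτὰ\tὁ\tDET\tl-p---na-\t_\t2\tdet\t_\t_\n"
print(sanitize_line(example), end="")
```

For example, `sanitize_file("training.conll", "training.sanitized.conll")` followed by the same `java -jar ParserOracleArcStdWithSwap.jar ...` command on the sanitized file should produce oracle tokens in which the only '-' after the word is the separator. Word forms that contain '-' are left untouched, so this helps only if the loader splits at the last '-'.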