acoli-repo / conll-merge

Tools for manipulating CoNLL TSV and related formats
Apache License 2.0

Sample usage commands needed #3

Status: Open. Nikoschenk opened this issue 4 years ago.

Nikoschenk commented 4 years ago

I'd like to "merge" OntoNotes (coreference) and PropBank (SRL) annotations. Could someone provide detailed instructions for doing this?

chiarcos commented 4 years ago

The merging itself is fairly trivial, but this case needs some preprocessing: neither OntoNotes coref nor PropBank is originally distributed in a CoNLL format, so both have to be converted first (see "preprocessing" below). We will put a single script into a separate repository.

merging:

To merge two (or more) TSV files (FILE1.conll, FILE2.conll) on their first column, use

$> cmd/merge.sh FILE1.conll FILE2.conll
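To illustrate what "merging on the first column" means, here is a minimal Python sketch of the simplest case. It is not how cmd/merge.sh is implemented. It assumes that both files list the same rows in the same order and that blank lines mark sentence boundaries in both. The helper name `merge_on_first_column` is hypothetical. The sketch keeps the shared key column once and appends the remaining columns of the second file.

```python
def merge_on_first_column(lines1, lines2):
    """Append the non-key columns of file 2 to each row of file 1 (hypothetical helper).

    Assumes both files list the same rows in the same order, with blank lines
    marking sentence boundaries in both. This is not the actual cmd/merge.sh logic.
    """
    if len(lines1) != len(lines2):
        raise ValueError("files differ in length; a line-by-line merge is impossible")
    merged = []
    for n, (line1, line2) in enumerate(zip(lines1, lines2), start=1):
        blank1, blank2 = not line1.strip(), not line2.strip()
        if blank1 or blank2:
            # Sentence boundaries must coincide in both files.
            if not (blank1 and blank2):
                raise ValueError(f"line {n}: sentence boundaries do not match")
            merged.append("")
            continue
        row1 = line1.rstrip("\n").split("\t")
        row2 = line2.rstrip("\n").split("\t")
        if row1[0] != row2[0]:
            raise ValueError(f"line {n}: key mismatch {row1[0]!r} != {row2[0]!r}")
        # Keep the shared key once, then all remaining columns of both files.
        merged.append("\t".join(row1 + row2[1:]))
    return merged
```

The inputs can be read with `open(path, encoding="utf-8").readlines()`. For three or more files, the helper can be folded pairwise, e.g. `merge_on_first_column(merge_on_first_column(a, b), c)`. A real merger also has to cope with rows that do not line up. This sketch simply rejects them.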

To merge exactly two TSV files on columns other than the first (here, column 1 of FILE1 and column 2 of FILE2, counting from 0), use

$> java -cp $CLASSPATH org.acoli.conll.merge.CoNLLAlign FILE1.conll FILE2.conll 1 2
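The thread does not describe how CoNLLAlign aligns the two files internally. The following Python sketch only illustrates the general idea of aligning on a chosen column of each file. It uses the standard library's `difflib.SequenceMatcher` as a stand-in, and it makes no claim that CoNLLAlign works this way. `read_rows` and `align_merge` are hypothetical names, and several behaviours are assumptions of this sketch:

- columns are counted from 0;
- blank (sentence-break) lines are ignored;
- FILE1 rows without a match get `?` fillers;
- FILE2 rows without a counterpart are dropped.

```python
import difflib


def read_rows(lines):
    """Split tab-separated lines into rows, skipping blank (sentence-break) lines."""
    return [line.rstrip("\n").split("\t") for line in lines if line.strip()]


def align_merge(rows1, rows2, col1, col2, filler="?"):
    """Align rows1 and rows2 on rows1[*][col1] vs. rows2[*][col2] (illustrative sketch)."""
    keys1 = [row[col1] for row in rows1]
    keys2 = [row[col2] for row in rows2]
    # Number of columns file 2 contributes: everything except its key column.
    width2 = max((len(row) for row in rows2), default=1) - 1
    matcher = difflib.SequenceMatcher(a=keys1, b=keys2, autojunk=False)
    merged = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Matching stretch: append file-2 columns minus the duplicated key.
            for i, j in zip(range(i1, i2), range(j1, j2)):
                extra = rows2[j][:col2] + rows2[j][col2 + 1:]
                merged.append(rows1[i] + extra)
        else:
            # 'replace' or 'delete': keep the file-1 rows, padded with fillers.
            # 'insert' (file-2-only rows) has i1 == i2, so nothing is emitted.
            for i in range(i1, i2):
                merged.append(rows1[i] + [filler] * width2)
    return merged
```

For example, if FILE1 has the word "said" where FILE2 has "says", `align_merge(rows1, rows2, 1, 2)` keeps the surrounding matches and pads only the mismatching row.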

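Don't forget to set $CLASSPATH to include bin/ and the jars in lib/ (on Linux or macOS, for example, CLASSPATH='bin:lib/*', using Java's classpath wildcard). For the other parameters, see the CoNLLAlign log output.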

preprocessing:

To convert an OntoNotes coref file to CoNLL, use

$> cmd/ontonotes.coref2conll.sh FILE1.COREF > FILE1.conll
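The exact output of cmd/ontonotes.coref2conll.sh is not described in this thread. The Python sketch below shows what such a conversion involves, under two assumptions. First, OntoNotes .coref files mark mentions inline as `<COREF ID="…" TYPE="…">…</COREF>`. Second, the target is the bracket notation of the CoNLL-2011/2012 coreference column: `(1` opens a mention of chain 1, `1)` closes it, `(1)` is a one-token mention, parts are joined with `|`, and `-` means no mention. `coref_to_conll` is a hypothetical name.

```python
import re

# One regex pass: COREF open tag, COREF close tag, any other tag (ignored), or a token.
TOKEN_RE = re.compile(r'<COREF\b[^>]*>|</COREF>|<[^>]*>|[^\s<]+')
ID_RE = re.compile(r'\bID="([^"]+)"')


def coref_to_conll(text):
    """Return (token, coref_column) pairs for inline COREF markup (illustrative sketch)."""
    tokens = []   # entries: [word, ids_opened_here, ids_closed_here]
    stack = []    # ids of mentions that are currently open
    pending = []  # ids opened since the last token was seen
    for match in TOKEN_RE.finditer(text):
        piece = match.group(0)
        if piece.startswith("<COREF"):
            id_match = ID_RE.search(piece)
            mention_id = id_match.group(1) if id_match else "?"
            pending.append(mention_id)
            stack.append(mention_id)
        elif piece == "</COREF>":
            # The innermost open mention ends at the most recent token.
            tokens[-1][2].append(stack.pop())
        elif piece.startswith("<"):
            continue  # other markup such as <DOC> or <TEXT>
        else:
            tokens.append([piece, pending, []])
            pending = []
    rows = []
    for word, opened, closed in tokens:
        opened = list(opened)
        singles, ends = [], []
        for mention_id in closed:
            # A mention opened and closed on the same token becomes "(id)".
            if mention_id in opened:
                opened.remove(mention_id)
                singles.append(mention_id)
            else:
                ends.append(mention_id)
        parts = ([f"({i}" for i in opened] + [f"({i})" for i in singles]
                 + [f"{i})" for i in ends])
        rows.append((word, "|".join(parts) if parts else "-"))
    return rows
```

Each resulting pair can be written as one tab-separated line, which gives a two-column file that can then be merged with the other annotations as described above.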

To create a CoNLL-style PropBank file from PropBank and Penn Treebank annotations, see cmd/propbank2conll/readme.txt.