LR-POR / ud-matrix

1 stars 0 forks source link

+title: Some experiments with UD vs ERG

Matrix

MRS sentences are almost identical to the MATRIX with 7 differences:

+BEGIN_EXAMPLE

% diff matrix-en.sent mrs-en.sent 20c20 < Cats bark.

Cats go. 22c22 < Some bark.

Some went. 53c53 < Chased dogs bark.

Chased dogs go. 55c55 < That the cat chases Browne is old.

That the cat chases Browne is obvious. 62,64c62,64 < Browne's barks. < Twenty three dogs bark. < Two hundred twenty dogs bark.

Browne's goes. Twenty three dogs go. Two hundred twenty dogs go. 79c79 < Abrams promised Browne to bark.

Abrams promised Browne to go. 97c97 < The cats found a way to bark.

The cats found a way to go.

+END_EXAMPLE

** CSLI

http://svn.delph-in.net/erg/trunk/tsdb/gold/csli/

The CSLI dataset https://w3.ual.es/~nperdu/hpsuite.htm (where the sentences are grouped by syntatic constructions). This page is replicated here in csli.txt. Note that some sentences are not grammatically correct (marked with *).

Translations was done using IBM Language Translator Service https://www.ibm.com/watson/services/language-translator/

all processed with [[http://lindat.mff.cuni.cz/services/udpipe/][udpipe]] using the portuguese-bosque-ud-2.5-191206.udpipe and english-ewt-ud-2.5-191206.udpipe models.

: udpipe --tokenize --tag --parse MODEL FILE.sent > FILE.conllu

Some of these sentences were copied to

https://tatoeba.org/eng/sentences_lists/show/166576