UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

Sentence-final CCONJ #349

Open nschneid opened 2 years ago

nschneid commented 2 years ago

The sentence segmentation sometimes leaves a dangling CCONJ at the end, e.g., if each "sentence" is an item in a bulleted list.

In the corpus:

Should these be a forward-pointing cc? A conj, as if the CCONJ stands in for the omitted material in the next sentence? Or parataxis?

Or should the sentences be resegmented to avoid this?
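For anyone who wants to list the affected sentences, here is a minimal sketch that scans CoNLL-U text for sentences whose last non-punctuation token is tagged CCONJ and reports the deprel it currently carries. It is not the query used for this issue; the helper names (`sentences`, `final_cconj`) are made up for illustration, and ignoring trailing PUNCT tokens is an assumption about what counts as "sentence-final".

```python
from typing import Iterator, List, Tuple


def sentences(conllu_text: str) -> Iterator[Tuple[str, List[List[str]]]]:
    """Yield (sent_id, token_rows) for each sentence in CoNLL-U text."""
    sent_id: str = ""
    rows: List[List[str]] = []
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if rows:
                yield sent_id, rows
            sent_id, rows = "", []
        elif line.startswith("#"):
            if line.startswith("# sent_id"):
                sent_id = line.partition("=")[2].strip()
        else:
            cols = line.split("\t")
            # Keep plain word lines only; skip multiword ranges (1-2)
            # and empty nodes (1.1), whose IDs are not pure digits.
            if len(cols) == 10 and cols[0].isdigit():
                rows.append(cols)
    if rows:
        yield sent_id, rows


def final_cconj(conllu_text: str) -> List[Tuple[str, str, str]]:
    """Return (sent_id, form, deprel) for each sentence-final CCONJ.

    Assumption: trailing PUNCT tokens are ignored, so "... and ." counts.
    """
    hits = []
    for sent_id, rows in sentences(conllu_text):
        content = [r for r in rows if r[3] != "PUNCT"]  # column 4 = UPOS
        if content and content[-1][3] == "CCONJ":
            last = content[-1]
            hits.append((sent_id, last[1], last[7]))  # FORM, DEPREL
    return hits
```

To try it, read one of the treebank's .conllu files into a string and pass it to `final_cconj`; grouping the results by the third element shows which relations are in use for these tokens today.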

nschneid commented 2 years ago

@amir-zeldes opinion?

amir-zeldes commented 2 years ago

There are two decisions here IMO: first we need to decide what the sentence segmentation guidelines are, and only then how to parse. If the items are kept as individual sentences, then I think the deprel should probably be conj (promotion to replace the missing coordinate head).

Regarding the question about bullets being sentences, see here

In GUM they are sentences in practice (but one could have decided differently, of course).