UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

Mentioned=Yes for metalinguistic mentions #446

Open nschneid opened 11 months ago

nschneid commented 11 months ago

The guidelines have a policy that metalinguistic mentions of words should be tagged the same as if they were uses.

I propose an experimental MISC feature Mentioned=Yes to make cases of mentioned language explicit. Some examples in EWT where this would clearly apply (found by searching for "the word"):

The word renaissance (Rinascimento in Italian)

when the FBI uses the word "domestic" the word includes a US-based, highly-educated supporter of the militant islamists

any trade mark registration that incorporates the word ONLINE as a suffix

In Section 3.1, in the last sentence after the proviso insert "a" before the word "change" and after the word "in".

Note that this would be distinct from quotations reflecting a real or hypothetical speech act. Mentioned=Yes is for linguistic expressions referred to as entities, typically treated syntactically like nominals (even if the UPOS is something else).
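For illustration, here is a minimal Python sketch of how the "the word" search and the MISC annotation might be automated. It assumes standard 10-column CoNLL-U token lines, where MISC is the last column and holds `_` or `|`-separated `Key=Value` pairs. The helper names `add_misc_feature` and `mention_candidates` are hypothetical, and the sample token line is made up for the example rather than copied from EWT.

```python
def add_misc_feature(line, key, value):
    """Return a CoNLL-U token line with key=value set in its MISC column."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError("expected 10 tab-separated CoNLL-U columns")
    # MISC is "_" when empty, otherwise "|"-separated entries.
    misc = [] if cols[9] == "_" else cols[9].split("|")
    # Drop any earlier value for the same key so the feature is not duplicated.
    misc = [m for m in misc if not m.startswith(key + "=")]
    misc.append(f"{key}={value}")
    cols[9] = "|".join(misc)
    return "\t".join(cols)


def mention_candidates(forms):
    """Return indices of tokens right after 'the word', skipping one opening quote."""
    quotes = {'"', "'", "``", "\u201c", "\u2018"}
    hits = []
    for i in range(len(forms) - 2):
        if forms[i].lower() == "the" and forms[i + 1].lower() == "word":
            j = i + 2
            if forms[j] in quotes:
                j += 1
            if j < len(forms):
                hits.append(j)
    return hits


# Made-up token line (illustrative annotation values, not taken from EWT).
token = "3\trenaissance\trenaissance\tNOUN\tNN\tNumber=Sing\t2\tappos\t2:appos\tSpaceAfter=No"
print(add_misc_feature(token, "Mentioned", "Yes").split("\t")[9])

forms = 'when the FBI uses the word " domestic " the word includes a supporter'.split()
print([forms[i] for i in mention_candidates(forms)])
```

On the second example the heuristic returns both "domestic" and "includes": the second "the word" is an ordinary use followed by a verb, so hits like these are only candidates for manual review, not annotations to apply blindly.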

rueter commented 11 months ago

Yes, @nschneid, I think this would be a great idea. It might matter less in well-punctuated texts, where mentions are set off with quotation marks and accompanied by a cue such as "the word", but it would definitely be useful, even for the examples given in the guidelines (repeated here):

"Yes": Yes, I think so.

I am waiting for his ‘yes’ on the matter.

"precede":

Such discussion must precede every decision.

He pronounced ‘precede’ in a funny way.

So essentially, the feature would help most in situations and languages where nothing other than "Mentioned=Yes" marks the mention.

amir-zeldes commented 11 months ago

Sure, why not? The only issue I see is implementing it, since if the feature appears in some places, people might get the idea that the entire dataset is annotated for it exhaustively.

BTW, there are tons of these, plus borderline cases, in the dictionary genre in UD_English-GENTLE (three of the documents are literally dictionary entries, incl. things like etymology and cognates, but also example usage, which may or may not be considered metalinguistic).