ttasovac closed this issue 4 years ago.
In TEI proper this would probably have been <usg type='gram'>, but we don't have this option anymore because we have reduced the scope of <usg>... so we need to figure out what to do here...
Crazy idea, but I say let's bring back <usg type="gram">!!! It's too frequently needed to have removed it. It will make TEI-Lex0 conformance arbitrary for all the projects that use it to then have to make their data more vague.
That is just a grammatical abstraction of a collocate... @iljackb, your second sentence is so complex that I am thinking you must have been really tired in the evening :-) "arbitrary" and "more vague" are pretty significant there and serve well as arguments against the idea in the first sentence ;-)
Back to the topic: it would be nice to introduce, at some point, and then follow a principled distinction between a collocation, which according to the generalised use of <cit> may qualify as cit/quote (or <form>, or whatever the trend is this year), and a collocate, which is (often an abstraction of) what forms a collocation with the headword. Reintroducing <usg type="gram"> for such cases would mean unravelling whatever got built in Lex0. Please let's keep this as the line we don't want to cross.
+1 for not bringing usg/@type="gram" back to life again. usg should be used for all this socio- and para-linguistic stuff (the real-world settings for the speech production), not for collocations and certainly not for syntactic descriptions (the internals of language as a system) such as the one that actually started this thread. Like @bansp, I'd see roughly this division of labour: cit for linguistic instantiations of (abstract) syntactic constructions, and maybe colloc for the actual collocate (in the case of collocations).
Considering more closely the »+ acc.« part of the initial question: more context would be nice. Still, to me this seems to mean something along the lines of »$lemma can be used with an adjunct in the accusative case«, so maybe gram/@type="hasAccusativeAdjunct".
Hah, I was just thinking about this this morning and was going to suggest a similar approach to Axel's as regards collocations, which occupy the area roughly fenced by <cit> (they could be examples, equivalents, and I guess also heads of related entries, whatever we do about those now).
I would then use <colloc> for collocates (faithfully to the original TEI definition, I think), although I wouldn't venture as far as naming grammatical functions, so I would suggest not saying "adjunct" (especially where some linguists would shout "object!", others "complement!", etc.). I think what <colloc> may be missing is a @pattern attribute stating the relationship of the collocate to the headword. So, for example (and @charlymo, would you give us your example, please?), I would say something like this:
<colloc pattern="$ _">+acc.</colloc>
(assuming arbitrarily that '$' stands for the headword and '_' for the element content; @match and @matchPattern already exist, so maybe the "pattern" here could be one of those, with appropriate modifications)
To be sure, the content of @pattern and the "+" may be seen as redundant here, but I am fairly sure that we would find examples where the "+" means just "with" rather than "followed by".
In the example cited by Jesse elsewhere (for colloc inside def), I would say the first-pass digitization could indeed encompass the entire relevant string, but then, upon refinement, I would like to see a sequence of <colloc>s there.
Either
After presenting on this at the collocation workshop at eLex, I spoke to several people about it, and the general consensus is that when we have something like a não ser que [+conj.], we encode it as:
<gramGrp>
  <gram type="colloc">[+conj.]</gram>
</gramGrp>
This is consistent with our simplification of gramGrp to use only typed gram elements, and not to give special treatment to pos or colloc or anything like that...
Reading the above makes me think that +conj is a grammatical property of the phrase a não ser que rather than a property of what (usually or always) follows it. Is my reading the intended one?
Edit: back from a short walk down memory lane... now I recall what the reasoning was for <gram>. And the answer to my question above (which I won't delete, because it's already in your mailboxes) is "no, because type='colloc' switches the focus to what's around". Cheers.
Exactly, @bansp! We went back and forth on this one, but @type="colloc" should be enough to indicate the switch of perspective here.
Changing status to "documentation" as a reminder to @ttasovac to add examples to the guidelines.
We now have a section about this in the Guidelines (Section 2.3.3, "Collocates", in Chapter 2: Entries).
I will try to add more examples on that imaginary day when I suddenly realize I have lots of free time...
Issue raised by Charly's talk during #lexMC.