ftyers / docs

Universal Dependencies online documentation
http://universaldependencies.github.io/docs/
Apache License 2.0

[ud] Release v1.3: pos mapping: mod => ? #21

Open makazhan opened 8 years ago

makazhan commented 8 years ago

Francis states that mod should be mapped to PART in UD:

mod_ass; => PART

Jonathan kind of agrees:

I would say шығар and сияқты are "particles" in UD's terminology. They come before copula and verbal agreement, right? So сен әдемі сияқтысың and сен келетін сияқтысың?

I've been dealing with them exactly as Jonathan continues to describe:

One other way to analyse these (assuming my examples are right?) would be to say сияқты is a predicate of some sort (either noun or adjective) and that it's the root with a copula relationship to the subject, which in this case would have to be clauses.... or something

* The examples are right.

I agree that they behave like емес or ғана, as in мен жаман емеспін or мен оқушы ғанамын, meaning that "they come before copula and verbal agreement".

But is this the only criterion for particles? I mean, -шы attaches after agreement without any copula; similarly, Ғой comes after agreement, right?

Besides, what relation are you going to use for particles of this sort?

The only case currently in the treebank uses discourse:

1   Мүмкін    мүмкін    _   adj advl    3   advmod  _   _
2   бұл  бұл  _   prn dem|nom 3   subj    _   _
3-5 Азамат шығар _   _   _   _   _   _   _   _
3   _   Азамат    _   np  ant|m|nom   0   root    _   _
4   _   е  _   cop aor|p3|sg   3   cop _   _
5   _   шығар  _   mod _   3   disc    _   _
6   ?   ?   _   sent    _   3   punct   _   _

I suspect this is because of the question.
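To make the treebank example easier to inspect, here is a minimal sketch that reads the rows above and reports how the `mod` token is attached. It assumes the columns are tab-separated (the surface form of the 3-5 range row contains a space) and that the tag sits in the fifth column; `Token` and `parse_rows` are hypothetical names, not part of any treebank tooling.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    idx: int
    form: str
    lemma: str
    pos: str
    head: int
    deprel: str


# Rows transcribed from the example above; tab separation is an assumption.
SENTENCE = "\n".join([
    "1\tМүмкін\tмүмкін\t_\tadj\tadvl\t3\tadvmod\t_\t_",
    "2\tбұл\tбұл\t_\tprn\tdem|nom\t3\tsubj\t_\t_",
    "3-5\tАзамат шығар\t_\t_\t_\t_\t_\t_\t_\t_",
    "3\t_\tАзамат\t_\tnp\tant|m|nom\t0\troot\t_\t_",
    "4\t_\tе\t_\tcop\taor|p3|sg\t3\tcop\t_\t_",
    "5\t_\tшығар\t_\tmod\t_\t3\tdisc\t_\t_",
    "6\t?\t?\t_\tsent\t_\t3\tpunct\t_\t_",
])


def parse_rows(text: str) -> List[Token]:
    """Parse 10-column rows, skipping multiword range rows such as 3-5."""
    tokens = []
    for line in text.splitlines():
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0]:
            continue  # range row only carries the surface form
        tokens.append(
            Token(int(cols[0]), cols[1], cols[2], cols[4], int(cols[6]), cols[7])
        )
    return tokens


tokens = parse_rows(SENTENCE)
for tok in tokens:
    if tok.pos == "mod":
        print(tok.lemma, "->", tok.deprel, "head", tok.head)
```

Filtering on `pos == "mod"` like this would also be one way to list every such attachment once more sentences with сияқты, шығар and similar items are in the treebank.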

How about doubt, as in Білмеймін, баратын шығармын (I dunno, maybe I'll go) or resemblance, as in Піскен сияқтыларын таңдап алды (She picked those that looked ripe)?

I am reluctant to use discourse in those cases.

Tokenization: Please, don't tell me that mod being particles implies multi-word tokenization :) I would argue for split tokenization, but then we probably should be consistent, and split stuff like [жаман емеспін].

Dependency: ???

jonorthwash commented 8 years ago

Сияқты is almost definitely an adjective, given its behaviour here. You couldn't say *Піскен ғаналарын таңдап алды, but you could say Піскендерін ғана таңдап алды. (right?), so Gана is different.

For сияқты, this is like stuff (піскен here) depending on an adjective. Maybe parallel in some ways to "green with envy" or "too heavy to lift"? How does UD deal with these?

I think each of the "particles" will have to be dealt with independently, since they're not all the same thing, as you point out. Ones like -шы, Mа, Gой can probably be considered discourse markers, but сияқты, шығар, and ғана are different.

I'll wait for your thoughts on this and we can discuss further then. (I'm also in a hurry right now, so maybe didn't think through all the issues right...)

makazhan commented 8 years ago

Сияқты is almost definitely an adjective, given its behaviour here.

I believe here (meaning Піскен сияқтыларын таңдап алды) it is. No less than ұзын in Шашы ұзындарын таңдап алды. So there's a copula, substantivation, ccomp and so on.

You couldn't say *Піскен ғаналарын таңдап алды, but you could say Піскендерін ғана таңдап алды. (right?), so Gана is different.

The first one sounds ungrammatical, although I've heard similar usages (and some other inflected forms of ғана), but those are very rare and to me sound more like a slip of the tongue or something. So, I agree Gана is different.

For сияқты, this is like stuff (піскен here) depending on an adjective. Maybe parallel in some ways to "green with envy" or "too heavy to lift"? How does UD deal with these?

At first glance, "to lift" in "too heavy to lift" looks like acl, but for an adjective. I'll take a look later (probably tomorrow).

I think each of the "particles" will have to be dealt with independently, since they're not all the same thing, as you point out. Ones like -шы, Mа, Gой can probably be considered discourse markers, but сияқты, шығар, and ғана are different.

I second that.

jonorthwash commented 8 years ago

Btw:

I've been dealing with them exactly as Jonathan continues to describe

Could you give an example of this?

makazhan commented 8 years ago

@jonorthwash

Could you give an example of this?

1   Ол    ол    _   PRON    _   2   nsubj   _   _
2   әдемі  әдемі  ADJ _   _   3   csubj   _   _
3   сияқты    сияқты    ADJ _   _   0   root    _   _

1   Піскен    піс  _   VERB    _   2   csubj   _   _
2   сияқтыларын  сияқты    ADJ _   _   3   ccomp   _   _
3   таңдап    таңда  VERB    _   _   0   root    _   _
4   алды    ал    AUX _   _   3   aux _   _
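As a quick sanity check on the two analyses above, the following sketch verifies that each one is a well-formed tree: exactly one root, and every token reaching it through its head chain. The tuples are transcribed from the rows above as (id, form, head, relation); `is_well_formed` is a hypothetical helper, not a function of any UD tool.

```python
# (id, form, head, relation) transcribed from the two example trees.
TREE_1 = [
    (1, "Ол", 2, "nsubj"),
    (2, "әдемі", 3, "csubj"),
    (3, "сияқты", 0, "root"),
]
TREE_2 = [
    (1, "Піскен", 2, "csubj"),
    (2, "сияқтыларын", 3, "ccomp"),
    (3, "таңдап", 0, "root"),
    (4, "алды", 3, "aux"),
]


def is_well_formed(tree):
    """Return True if the tree has one root and no cycles or dangling heads."""
    heads = {idx: head for idx, _form, head, _rel in tree}
    roots = [idx for idx, head in heads.items() if head == 0]
    if len(roots) != 1:
        return False
    for start in heads:
        seen = set()
        node = start
        while node != 0:
            # A repeated node is a cycle; an unknown node is a dangling head.
            if node in seen or node not in heads:
                return False
            seen.add(node)
            node = heads[node]
    return True


print(is_well_formed(TREE_1), is_well_formed(TREE_2))
```

This only checks the shape of the trees; whether csubj and ccomp are the right labels is exactly what is under discussion here.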

In general, I'd like to think of stuff like сияқты, тәрізді, шығар etc. as "lexically expressed moods" of copulas.

Does it make sense to anybody?

Like, if copula is a verb, maybe you should be able to use it in imperative, conditional, and other "standard" moods, but in addition you could also express doubt, question... basically any flavor of realis/irrealis.

jonorthwash commented 8 years ago

Sure, that makes sense. But those are probably POS tags, and here we need to worry about the dependency relations atm. How do you propose to express these relations?

makazhan commented 8 years ago

How do you propose to express these relations?

As I showed earlier:

Successive (nested?) modality/copula as in Айбек оқушы сияқты шығар or Айбек оқушы сияқты болған шығар can be resolved similarly, I hope.

makazhan commented 8 years ago
jonorthwash commented 8 years ago

These two sets of statements appear to be in conflict:

In general, I'd like to think of stuff like сияқты, тәрізді, шығар etc. as "lexically expressed moods" of copulas.

And this:

  • pos(сияқты) = ADJ;
  • 1) [cn]subj(сияқты,*) as in ол әдемі сияқты or піскен сияқтыларын алды
  • [pos = ADP:]
  • 2) case(*,сияқты) as in су сияқты нәрсе or бала сияқты жылады

Could you clarify?
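For reference, the quoted list can be read as a POS-conditioned rule, sketched below. The tuple order (relation, head, dependent) follows the rel(head, dependent) notation of the list; `propose_arc` and its `other_is_clause` flag are hypothetical names introduced only for illustration, and treating әдемі as clausal follows the csubj label in the tree posted earlier.

```python
WORD = "сияқты"


def propose_arc(pos, other, other_is_clause=False):
    """Return (relation, head, dependent) for сияқты under the quoted list.

    pos   -- the POS assigned to сияқты ("ADJ" or "ADP")
    other -- the word it combines with
    """
    if pos == "ADJ":
        # Case 1: сияқты is the predicate and takes a nominal or clausal subject.
        relation = "csubj" if other_is_clause else "nsubj"
        return (relation, WORD, other)
    if pos == "ADP":
        # Case 2: сияқты is a case marker depending on the preceding nominal.
        return ("case", other, WORD)
    raise ValueError(f"no rule for pos={pos!r}")


# ол әдемі сияқты: әдемі attached as csubj of сияқты
print(propose_arc("ADJ", "әдемі", other_is_clause=True))
# су сияқты нәрсе: сияқты attached as case of су
print(propose_arc("ADP", "су"))
```

Written this way, the rule covers only ADJ and ADP; it has no branch that would treat сияқты as a mood of the copula, which is where the apparent conflict with the earlier statement lies.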