UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Multi-line glossing #13

Open manning opened 10 years ago

manning commented 10 years ago

Sampo, is it possible for brat to support interlinear glossing, as is standard in linguistics for texts in different languages, or just when you want to give more information about the morphology, etc.? (http://en.wikipedia.org/wiki/Interlinear_gloss) I think that would be very useful for giving examples in different languages.

spyysalo commented 10 years ago

We're working on this, and there's a workaround with \n that's being used for the time being, see e.g. http://universaldependencies.github.io/docs/fi/nsubj.html .

manning commented 10 years ago

Right, I saw that, but that's pretty bad for several reasons (the extra line of unnecessary brat nodes, the words don't align vertically in columns with what they are glossing...). Great to know it's being worked on.

dan-zeman commented 9 years ago

Is the workaround with \n somehow available also for the conllu input? I would use it in a French example that contains a multi-word token, which I think can only be done in CoNLL-U.

spyysalo commented 9 years ago

@dan-zeman: The \n trick isn't implemented for CoNLL-U. I had vaguely thought that it would not be necessary, since CoNLL-U supports multiple sentences per example (standard syntax), but now that I think about it, the validation requires a parse tree for each, so you couldn't quite get the same effect as with \n for SD ... I'll think about this a bit. It would be helpful if you could provide me with your example in standard CoNLL-U to test with.
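For illustration, a minimal sketch of what "multiple sentences per example" amounts to: in CoNLL-U, sentences are separated by blank lines, so a block can be split as shown here. The function name `split_sentences` is hypothetical, and this is not the code used by the documentation system.

```python
def split_sentences(block: str) -> list[list[str]]:
    """Split a CoNLL-U block into sentences, using blank lines as separators."""
    sentences: list[list[str]] = []
    current: list[str] = []
    for line in block.splitlines():
        if line.strip():
            current.append(line)       # comment line or token line
        elif current:
            sentences.append(current)  # a blank line closes the current sentence
            current = []
    if current:                        # last sentence without a trailing blank line
        sentences.append(current)
    return sentences


# Hypothetical two-sentence block; columns are abbreviated for readability.
demo = "# first\n1 donner\n2 les\n\n# second\n1 give\n2 the\n"
demo_sentences = split_sentences(demo)
```

Each returned sentence still contains its `#` comment lines, so the validation of a parse tree would have to be applied to every element separately.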

dan-zeman commented 9 years ago

It did not occur to me that I could provide several sentences in one example. That is probably sufficient, thanks; I will try it. CoNLL-U itself is not a replacement for sdparse, precisely because you have to supply the full tree. But I still thought it should be used when the words and tokens do not match. The example is below. What it shows in case.md (just committed) could be shown even without splitting the "aux" token, but I thought we should split it so that all our examples match what we say about words and tokens elsewhere.

~~~ conllu
# give the toys to the children
1     donner    donner   VERB   _   VerbForm=Inf               0   root   _   give
2     les       le       DET    _   Definite=Def|Number=Plur   3   det    _   the
3     jouets    jouet    NOUN   _   Gender=Masc|Number=Plur    1   dobj   _   toys
4-5   aux       _        _      _   _                          _   _      _   _
4     au        au       ADP    _   _                          6   case   _   to
5     les       le       DET    _   Definite=Def|Number=Plur   6   det    _   the
6     enfants   enfant   NOUN   _   Gender=Masc|Number=Plur    1   nmod   _   children
~~~

spyysalo commented 9 years ago

OK, glad to hear you can work with the current implementation. I'm considering providing some special syntax to switch off validation for treeness to allow e.g. text-only translations, but would prefer to avoid having the JS CoNLL-U parser deviate from the spec.
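A minimal sketch of the kind of treeness check discussed here, assuming whitespace-separated columns as in the examples in this thread (standard CoNLL-U is tab-separated). The names `parse_heads` and `is_tree` are hypothetical; this is not the JS CoNLL-U parser's implementation.

```python
def parse_heads(lines):
    """Map word ID -> HEAD, skipping comments and multiword-token ranges like 4-5."""
    heads = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split()
        if not cols[0].isdigit():      # range lines ("4-5") are tokens, not words
            continue
        head = cols[6]                 # 7th column is HEAD
        heads[int(cols[0])] = int(head) if head.isdigit() else None
    return heads


def is_tree(heads):
    """True if exactly one word attaches to 0 and every word reaches it without a cycle."""
    if any(h is None for h in heads.values()):
        return False                   # e.g. a text-only line with "_" as HEAD
    if sum(1 for h in heads.values() if h == 0) != 1:
        return False
    for start in heads:
        seen = set()
        node = start
        while node != 0:
            if node in seen or node not in heads:
                return False           # cycle, or HEAD points to a missing word
            seen.add(node)
            node = heads[node]
    return True
```

Under these assumptions, a text-only translation line with `_` in the HEAD column fails the check, which is why such lines would need either a full tree or a way to switch the validation off.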

dan-zeman commented 9 years ago

So here is the bad news. When I tried to insert a two-sentence CoNLL-U example (see below), it completely disappeared and the visualisations in the rest of the page are broken too (ce3d607f44983e5312c427f72926ecab113a007e which I am now going to revert).

~~~ conllu
# give the toys to the children
1     donner    donner   VERB   _   VerbForm=Inf               0   root   _   give
2     les       le       DET    _   Definite=Def|Number=Plur   3   det    _   the
3     jouets    jouet    NOUN   _   Gender=Masc|Number=Plur    1   dobj   _   toys
4-5   aux       _        _      _   _                          _   _      _   _
4     au        au       ADP    _   _                          6   case   _   to
5     les       le       DET    _   Definite=Def|Number=Plur   6   det    _   the
6     enfants   enfant   NOUN   _   Gender=Masc|Number=Plur    1   nmod   _   children

# now the parallel English tree
1     give       donner   VERB   _   VerbForm=Inf               0   root   _   give
2     the        le       DET    _   Definite=Def|Number=Plur   3   det    _   the
3     toys       jouet    NOUN   _   Gender=Masc|Number=Plur    1   dobj   _   toys
4     to         au       ADP    _   _                          6   case   _   to
5     the        le       DET    _   Definite=Def|Number=Plur   6   det    _   the
6     children   enfant   NOUN   _   Gender=Masc|Number=Plur    1   nmod   _   children
~~~

spyysalo commented 9 years ago

Hey, a bug! Thank you for catching this. I'm guessing something goes wrong with the offsets when adding the extra "sentence" for the token sequence. Could you please open an issue specific to this? I'll check this first thing tomorrow.

spyysalo commented 9 years ago

Also, it's likely that some error will be shown on the JavaScript console when the visualization goes wrong. It would be helpful if you could provide this information with the issue. Thanks!

spyysalo commented 9 years ago

(resolved in #68)

dan-zeman commented 9 years ago

Thanks, Sampo! It works for me now.

trf0412 commented 9 years ago

I think I'm trying to do what's described here, but since I've only just started using Brat I'm not 100% sure. Is there any way you can post an example image (or link) showing what you achieved in Brat?

Thanks, Tim

manning commented 8 years ago

@spyysalo You might be able to answer the above question from June. I presume we should move this to "later" since it's not being worked on for release 1.2....

dan-zeman commented 8 years ago

I am not sure whether @trf0412's question was directed at me, but anyway. Consider the above code, which I am repeating here:

~~~ conllu
# give the toys to the children
1     donner    donner   VERB   _   VerbForm=Inf               0   root   _   give
2     les       le       DET    _   Definite=Def|Number=Plur   3   det    _   the
3     jouets    jouet    NOUN   _   Gender=Masc|Number=Plur    1   dobj   _   toys
4-5   aux       _        _      _   _                          _   _      _   _
4     au        au       ADP    _   _                          6   case   _   to
5     les       le       DET    _   Definite=Def|Number=Plur   6   det    _   the
6     enfants   enfant   NOUN   _   Gender=Masc|Number=Plur    1   nmod   _   children

# now the parallel English tree
1     give       donner   VERB   _   VerbForm=Inf               0   root   _   give
2     the        le       DET    _   Definite=Def|Number=Plur   3   det    _   the
3     toys       jouet    NOUN   _   Gender=Masc|Number=Plur    1   dobj   _   toys
4     to         au       ADP    _   _                          6   case   _   to
5     the        le       DET    _   Definite=Def|Number=Plur   6   det    _   the
6     children   enfant   NOUN   _   Gender=Masc|Number=Plur    1   nmod   _   children
~~~

If it is included in the source of a page here in the documentation system, here is what you get:

[image: rendering of the example above in the documentation system]

Note that some information is only visible when you place your mouse cursor over a node (the "enfants" frame in the upper right part).
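For comparison with the conventional two-line interlinear layout requested at the top of this thread, here is a minimal sketch that pads each word and its gloss to a shared column width. It assumes whitespace-separated columns with the gloss in the last column, as in the examples in this thread; the function name `interlinear` is hypothetical and is not part of brat or the documentation system.

```python
def interlinear(lines):
    """Return (forms, glosses) as two strings whose words line up in columns."""
    pairs = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split()
        if not cols[0].isdigit():          # skip multiword-token ranges such as 4-5
            continue
        pairs.append((cols[1], cols[9]))   # FORM and the gloss in the 10th column
    widths = [max(len(form), len(gloss)) for form, gloss in pairs]
    forms = "  ".join(f.ljust(w) for (f, _), w in zip(pairs, widths)).rstrip()
    glosses = "  ".join(g.ljust(w) for (_, g), w in zip(pairs, widths)).rstrip()
    return forms, glosses
```

Applied to the French sentence above, this yields one line of word forms and one line of glosses in which each gloss starts in the same column as the word it glosses.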