fcbond / ltdb

Linguistic Type Data-Base

Use surface forms in examples #38

Closed fcbond closed 9 months ago

fcbond commented 1 year ago

LTDB currently displays lemmas in examples; does that depend on the LTDB code or on something else (such as the ACE output)?

It would be much more convenient to see the examples with surface forms. Especially as the examples get longer, having only lemmas in Spanish can be confusing :).

olzama commented 10 months ago

Perhaps I could tackle this; where should I start looking in the source code?

arademaker commented 10 months ago

Hi @fcbond, do you have LTDB running at a public endpoint, so we can see what your comment means?

olzama commented 10 months ago

@arademaker Francis's comment is actually my email to him :).

LTDB currently displays lemmas instead of surface forms in corpus examples; that's rather inconvenient for non-English languages...

olzama commented 10 months ago

I got as far as setting up PyCharm as the debugger:

[Screenshot from 2023-11-10 17-15-53]

[Screenshot from 2023-11-10 17-17-49]

But I can't figure out where the code for the displayed examples is. Any help? Where should I try to set the breakpoint?

fcbond commented 10 months ago

The lex-rule is shown by this route:

https://github.com/fcbond/ltdb/blob/dbbc9f9406b6e8a864e212d695d97fcacd6a16d8/web/routes.py#L102-L104

Somewhat confusingly, it is rendered by the lextype template, which displays the sentences like this: https://github.com/fcbond/ltdb/blob/dbbc9f9406b6e8a864e212d695d97fcacd6a16d8/web/templates/lextype.html#L80-L89

fcbond commented 10 months ago

The problem is that the words in the sents dictionary are the terminals of the trees: https://github.com/fcbond/ltdb/blob/dbbc9f9406b6e8a864e212d695d97fcacd6a16d8/scripts/gold2db.py#L111C13-L120C67

In grammars that use an external morphological analyzer, like the SRG, these are probably the lemmas. I don't know where the surface form is stored.

Ideally we should be able to link back to cfrom-cto, and then use the original sentence, ...
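To make the cfrom-cto idea concrete, here is a minimal sketch of recovering surface forms by slicing the original sentence with character spans. It is not LTDB's code and not necessarily what any fix does: the function name `surface_forms`, the span list, and the example sentence are all hypothetical, and it assumes each token carries a `(cfrom, cto)` pair of character offsets into the original sentence, with `cto` exclusive.

```python
from typing import List, Tuple

# Assumption: (cfrom, cto) are character offsets into the original
# sentence, with cto exclusive, so sentence[cfrom:cto] is the token.
Span = Tuple[int, int]


def surface_forms(sentence: str, spans: List[Span]) -> List[str]:
    """Return the substring of `sentence` covered by each span.

    Hypothetical helper for illustration; not part of LTDB.
    """
    forms = []
    for cfrom, cto in spans:
        # Reject spans that do not fit inside the sentence.
        if not (0 <= cfrom <= cto <= len(sentence)):
            raise ValueError(f"span ({cfrom}, {cto}) is outside the sentence")
        forms.append(sentence[cfrom:cto])
    return forms


# Made-up example: the tree terminals are lemmas, but the spans
# still point at the inflected words in the original sentence.
sentence = "Los perros ladraban"
lemmas = ["el", "perro", "ladrar"]
spans = [(0, 3), (4, 10), (11, 19)]

for lemma, form in zip(lemmas, surface_forms(sentence, spans)):
    print(f"{lemma} -> {form}")
```

Whether the spans are available at the point where the `sents` dictionary is built is exactly the open question above; the sketch only shows that, once they are, the lookup itself is a plain string slice.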

olzama commented 10 months ago

Maybe it can be fixed like this? https://github.com/fcbond/ltdb/pull/39

However, that only fixes the issue in the examples (not the trees), which is maybe OK. I suspect the trees come from ACE directly? ACE+LUI displays only lemmas, too, and I am not willing to try to fix that for the moment. In any case, having the surface forms in the examples is already much better, and reading the tree is easy with the example displayed right above it.

That being said, if you think surface forms can be passed to the code displaying the trees, let me know! Perhaps also to the DMRS?