BrownCLPS / LingView

A web interface for viewing ELAN and FLEx files:
https://brownclps.github.io/LingView
MIT License

Automatic text formatting from website to other documents #81

Open hollyyuqizheng opened 3 years ago

hollyyuqizheng commented 3 years ago

It would be cool if each story's page had a feature that lets the user take a given line of text and have it automatically formatted for use in other documents.

For example, it'd be nice to have a button at each timestamp that exports that timestamp's line formatted in LaTeX (e.g. 4-line glossing plus the citation for the example). This would save people time when they copy examples from LingView into a LaTeX paper!

A related use is exporting a timestamp's line to the error log that Scott created. This error log tracks places in the texts where something is glossed or translated incorrectly, so that someone can periodically fix the errors in the FLEx files.

hollyyuqizheng commented 3 years ago

Some initial thoughts:

hollyyuqizheng commented 3 years ago

Working in the text-formatter branch.

Some key steps:

hollyyuqizheng commented 3 years ago

Thoughts on making the LaTeX formatter more scalable: I think most of the changes needed from the current version are in the parts where the formatter “decides” which tiers to grab data from. The functions that do the actual converting (e.g. adding LaTeX commands such as “\glt” and “\textsc{}”) can remain unchanged, since they assume that the appropriate information is passed into them.

For the LaTeX package, the 4 pieces of information we need are: the full sentence, each word divided into its morphemes, the morpheme-by-morpheme translation into a given language, and the translation of the entire sentence. I can think of two main solutions:

(a) Ask for user input whenever the format button is clicked: The button first opens a window asking the user to select which tier corresponds to each of the 4 sections needed for LaTeX. After the selection, the text conversion happens and the final result is displayed in the window.

(b) During preprocessing and building the site, we can require some input annotations explaining which tier corresponds to which section of the LaTeX format. Then, when a sentence is passed into the format button component, the sentence object can hopefully carry information describing which tiers should be used for which section of the LaTeX code. We could ask for a separate file needed to build the site, called “latex-map.txt” or something, where the site creator describes which tier matches which section of the LaTeX format.
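
To make option (b) a bit more concrete, here is a rough Python sketch of reading such a mapping file. Everything about the layout is an assumption: the thread only proposes the file name, so the `section: tier name` line format, the four section names, and the function `parse_latex_map` are hypothetical. The first two tier names come from the Singo A'i example in this thread; the last two are placeholders.

```python
# Hypothetical section names for the 4 pieces of information a gloss needs.
SECTIONS = ("sentence", "morphemes", "gloss", "translation")


def parse_latex_map(text):
    """Parse 'section: tier name' lines into a dict.

    Blank lines and lines starting with '#' are ignored. The file layout
    is an assumption made for this sketch, not an existing LingView format.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        section, sep, tier = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'section: tier name', got {line!r}")
        section, tier = section.strip(), tier.strip()
        if section not in SECTIONS:
            raise ValueError(f"unknown section {section!r}")
        mapping[section] = tier
    missing = [s for s in SECTIONS if s not in mapping]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    return mapping


# Hypothetical contents of latex-map.txt.
example = """
# section: tier name
sentence: palabra en a'ingae (Borman)
morphemes: morfema en a'ingae (Borman)
gloss: placeholder gloss tier
translation: placeholder translation tier
"""
tier_map = parse_latex_map(example)
```

Failing loudly on a missing or unknown section at build time would keep the format button from silently producing an incomplete gloss later.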

sciepsilon commented 3 years ago

This issue was partially addressed in PR #84, which formats a LingView sentence into a gb4e or gb4e-modified LaTeX gloss. There may be other text formats worth adding in the future.
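
For readers unfamiliar with gb4e, here is a minimal Python sketch of the kind of conversion discussed above, using the package's `\glll` (three aligned lines) and `\glt` (free translation) commands. It is not the code from PR #84: the function names `small_caps` and `to_gb4e` are hypothetical, the rule that all-caps gloss abbreviations become `\textsc{}` is an assumed convention, the example sentence is made up, and LaTeX special characters are not escaped.

```python
import re

# Assumed convention: tokens of 2+ uppercase letters/digits containing at
# least one letter (PL, PST, 1SG) are grammatical abbreviations.
_ABBREV = re.compile(r"\b(?=[A-Z0-9]*[A-Z])[A-Z0-9]{2,}\b")


def small_caps(gloss_line):
    """Wrap all-caps gloss abbreviations in \\textsc{...}, lowercased."""
    return _ABBREV.sub(
        lambda m: r"\textsc{" + m.group(0).lower() + "}", gloss_line
    )


def to_gb4e(sentence, morphemes, glosses, translation):
    """Build a gb4e example from the 4 pieces of information.

    Each of the first three arguments is a space-separated line whose
    words align one-to-one, as \\glll requires.
    """
    lines = [
        r"\begin{exe}",
        r"\ex",
        r"\glll " + sentence + r" \\",
        morphemes + r" \\",
        small_caps(glosses) + r" \\",
        r"\glt `" + translation + "'",
        r"\end{exe}",
    ]
    return "\n".join(lines)


# Made-up English example, only to show the shape of the output.
example_latex = to_gb4e(
    "dogs barked",
    "dog-s bark-ed",
    "dog-PL bark-PST",
    "The dogs barked.",
)
print(example_latex)
```

Because the function only takes the four strings, the question of which tier feeds which argument stays entirely outside it, which matches the split between "deciding" and "converting" described earlier in the thread.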

hollyyuqizheng commented 3 years ago

Some initial feedback from Scott for the initial version:

sciepsilon commented 3 years ago

Also, the current tier selection UI is annoying for FLEx or ELAN files with long tier names. (It looks great on Kuke Chiste, but bad on Singo A'i.) We could improve it by using a grid, like this. The labels along the left side should be the 4 output LaTeX tiers, because users expect to select one button per row (not per column).

                       palabra en a'ingae (Borman)    morfema en a'ingae (Borman)
original sentence                   o                              o
morphemes                           o                              o

hollyyuqizheng commented 3 years ago

More updates in #89 and #90, including changing the tier selection button panel into a grid view.