digitallinguistics / scription

A specification for formatting interlinear glossed texts in a way that is computationally parseable
https://scription.digitallinguistics.io
MIT License

support emphasis with asterisks #39

Closed: dwhieb closed this issue 5 years ago

dwhieb commented 5 years ago

Emphasis can be added on any line except the metadata, speaker, and notes lines by adding asterisks around the emphasized item or portion of the data:

*wax*dungu qasi
*waxt*-qungu qasi
*day*-one man
one *day* a man

For the following lines (utterance-level data), asterisks may occur anywhere in the data:

For the following lines (word-level data), pairs of asterisks may only appear at word and morpheme boundaries. Asterisks placed elsewhere should be stripped from the data and ignored.
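A minimal Python sketch of the boundary rule for word-level lines is below. It assumes that whitespace separates words and that `-` and `=` mark morpheme boundaries (the examples above only show `-`; `=` is an added assumption). It treats an asterisk as "at a boundary" when the character on either side of it is a line edge, whitespace, or one of those markers. The function name `strip_misplaced_asterisks` is hypothetical and not part of any Scription tooling.

```python
from typing import Optional

# Assumed boundary markers for this sketch: whitespace separates words,
# and "-" and "=" are treated as morpheme (and clitic) boundaries.
BOUNDARY_MARKERS = frozenset("-=")


def _is_boundary(ch: Optional[str]) -> bool:
    """True for a line edge (None), whitespace, or a boundary marker."""
    return ch is None or ch.isspace() or ch in BOUNDARY_MARKERS


def strip_misplaced_asterisks(line: str) -> str:
    """Remove asterisks that do not touch a word or morpheme boundary.

    An asterisk is kept when the character immediately before or after it
    is a boundary; every other asterisk is dropped and ignored.
    """
    kept = []
    for i, ch in enumerate(line):
        if ch != "*":
            kept.append(ch)
            continue
        before = line[i - 1] if i > 0 else None
        after = line[i + 1] if i + 1 < len(line) else None
        if _is_boundary(before) or _is_boundary(after):
            kept.append(ch)
    return "".join(kept)
```

For example, on a morphemes line `wa*xt*-qungu` becomes `waxt*-qungu`, because the first asterisk sits inside a morpheme. The asterisks that survive still have to be paired, and a single leftover asterisk like this one falls under the odd-number rule below.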

Asterisks should be stripped from the data itself and should not be stored in data fields by parsers. However, parsers may choose to use information about emphasis in other ways.

If a line contains an odd number of asterisks, emphasis on that line should be ignored.
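The stripping and odd-number rules could be combined in a parser roughly as in the following Python sketch, which is a hypothetical illustration rather than part of the spec. It assumes asterisks pair up in order of appearance (first with second, third with fourth, and so on) and records emphasis as character offsets into the stripped text. The name `extract_emphasis` is made up for this example.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # [start, end) offsets into the stripped text


def extract_emphasis(line: str) -> Tuple[str, List[Span]]:
    """Strip asterisks from a line and record which ranges were emphasized.

    The returned text never contains asterisks, so it can be stored in a
    data field as-is. If the line has an odd number of asterisks, emphasis
    on the whole line is ignored and no spans are returned.
    """
    text = line.replace("*", "")
    if line.count("*") % 2 != 0:
        return text, []

    spans: List[Span] = []
    start: Optional[int] = None  # where the currently open emphasis began
    offset = 0                   # count of non-asterisk characters so far
    for ch in line:
        if ch == "*":
            if start is None:
                start = offset               # opening asterisk
            else:
                spans.append((start, offset))  # closing asterisk
                start = None
        else:
            offset += 1
    return text, spans
```

For instance, `extract_emphasis("one *day* a man")` returns `("one day a man", [(4, 7)])`, and slicing the text with that span gives `"day"`. Keeping the offsets separate from the text means nothing with asterisks ends up in a data field, while a parser can still use the emphasis information, e.g. when rendering a view of the data for publication.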

dwhieb commented 5 years ago

Some potential concerns with this:

  1. This would be difficult to implement programmatically.

  2. There is currently no way for the DLx JSON format to accommodate and store emphasis data.

  3. Both the Scription format and the DLx JSON format are about storing documentary linguistic data. Emphasis is about the presentation of that data, and it varies from one presentation to the next.

However, it may still be useful for linguists preparing data for specific publications to use the Scription format in a way that supports emphasis. In this case, the emphasis can be thought of as data attached to a specific view of the data.

I therefore think that the Scription format should accommodate the use of emphasis, and that future versions of the Scription spec should try to remain compatible with asterisks used for emphasis, while continuing to make no statement about how that data should be processed or presented.