hvlaardingerbroek closed 4 years ago
Solved by changing the string replacement to re.sub and using capture groups to modify parentheses to square brackets.
I think we should consider this a conversion issue rather than a source text issue, since it is not really a mistake in the original text, only a variation. When we convert to .nena format, we funnel variations into single slots. This is a different situation from, e.g., missing line numbers in a file.
One exception that I made to this logic is adding the missing colon in the comment from Urmi C, Village Life (6):
(45) (*GK* vàrdə?)
It might be argued that this case is indeed a mistake. However, for the sake of simplicity, I decided to treat it as another variation.
In standardizing the comments, I have also removed the unnecessary emphasis on the initials. So the example above now reads:
(45) [GK: vàrdə?]
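A minimal sketch of what such a re.sub conversion could look like, not the actual conversion code. The pattern and the function name normalize_speaker_comments are hypothetical; the sketch assumes that speaker initials are two or more uppercase ASCII letters and that emphasis is marked with asterisks, as in the example above.

```python
import re

# Hypothetical pattern (an assumption, not the project's real one):
# an opening round or square bracket, speaker initials optionally
# wrapped in *...* emphasis, an optional colon, then the comment body.
SPEAKER_COMMENT = re.compile(
    r"[(\[]\s*\*?([A-Z]{2,})\*?\s*:?\s*([^()\[\]]*?)\s*[)\]]"
)

def normalize_speaker_comments(text: str) -> str:
    """Rewrite every speaker comment in text as [INITIALS: body]."""
    # \1 is the captured initials, \2 the captured comment body.
    return SPEAKER_COMMENT.sub(r"[\1: \2]", text)

print(normalize_speaker_comments("(45) (*GK* vàrdə?)"))
```

Line numbers such as (45) are left untouched because they contain no uppercase initials, and a comment already in the standard form is returned unchanged.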
The NENA texts contain some comments in round or square brackets. Round brackets are also used to indicate line numbers.
The following comments are attested:
General remark:
(interruption)
in Urmi_C A3 'Axiqar', line 28
Introducing another speaker:
(GK: ... )
32 times in Urmi_C, with some variations: brackets can be round or square (mostly square), and once the colon is missing.
Should we decide on one standard way to encode such comments?
I suggest we use square brackets for comments, with a special colon notation to introduce a different speaker (typically the interviewer): no spaces may occur between the opening bracket and the colon, and the colon is followed by a space. There is no need for special emphasis markers, as the syntax is clear. For example:
[interruption]
[GK: ...]
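The proposed syntax is simple enough to check mechanically. The following is only a sketch: parse_comment is a hypothetical helper, and it assumes that any colon inside a comment signals the speaker form, which the proposal does not state explicitly.

```python
def parse_comment(token: str):
    """Return (speaker, body) for a well-formed comment, else None.

    speaker is None for a general remark such as [interruption].
    """
    if not (token.startswith("[") and token.endswith("]")):
        return None
    inner = token[1:-1]
    if not inner or "[" in inner or "]" in inner:
        return None
    if ":" not in inner:
        return (None, inner)  # general remark
    # Assumption: a colon always marks the speaker form.
    speaker, _, rest = inner.partition(":")
    # No spaces between the opening bracket and the colon,
    # and the colon must be followed by a space.
    if not speaker or any(c.isspace() for c in speaker):
        return None
    if not rest.startswith(" "):
        return None
    return (speaker, rest[1:])

print(parse_comment("[interruption]"))
print(parse_comment("[GK: ...]"))
print(parse_comment("[GK : ...]"))
```

Under these assumptions, the first two calls return a parsed pair and the third returns None because of the space before the colon.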