Closed: codykingham closed this issue 4 years ago
A questionable example is comments (https://github.com/CambridgeSemiticsLab/nena_corpus#comments), which are surrounded in brackets and marked with a speaker: `(GK: text of interjection?)`. Do we want to keep this kind of data in the `.nena` format? @GeoffreyKhan (https://github.com/GeoffreyKhan) is this kind of thing something you need to be able to do?
GK: These are not necessary for the database.
Some things that should absolutely be kept include language markers (https://github.com/CambridgeSemiticsLab/nena_corpus#text-markup). The suggested markup currently is, e.g., `<E>Hello<E>`. So maybe this should currently be done in the same way while inputting text? In `.docx` these values are normally indicated via superscript letters. @GeoffreyKhan (https://github.com/GeoffreyKhan) would you be comfortable placing `<>` tags around such letters when you do your copying/pasting?
GK: That would be fine.
thanks
Geoffrey
-- Geoffrey Khan Regius Professor of Hebrew University of Cambridge
Faculty of Asian and Middle Eastern Studies Sidgwick Avenue Cambridge CB3 9DA UK
This has been implemented in https://github.com/CambridgeSemiticsLab/nena_corpus/blob/master/docs/nena_format.md
Also see `/standards`.
And it is now incorporated into a new parser.
The `.nena` formatting guidelines are now a bit old and have moved past the draft stage. As @jamespstrachan builds the text input tool, we should think more carefully about what should absolutely go into the `.nena` format and what we should leave out as an unnecessary complication.
An example of a feature in the draft documentation that is probably superfluous is line breaks, marked with `/` and `//`. These were intended to preserve poetic indentations from `.docx` sources. But those features seem less relevant in relation to the new audio files.
A questionable example is comments, which are surrounded in brackets and marked with a speaker: `(GK: text of interjection?)`. Do we want to keep this kind of data in the `.nena` format? @GeoffreyKhan is this kind of thing something you need to be able to do?
Some things that should absolutely be kept include language markers. The suggested markup currently is, e.g., `<E>Hello<E>`. So maybe this should currently be done in the same way while inputting text? In `.docx` these values are normally indicated via superscript letters. @GeoffreyKhan would you be comfortable placing `<>` tags around such letters when you do your copying/pasting?
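To make the language-marker proposal concrete, here is a rough sketch of how a text input tool might split a line into language-tagged segments. It is not the project's parser. It assumes that a marked span opens and closes with the same tag, as in `<E>Hello<E>`, and that a tag is one or more capital letters. The function name `split_language_spans` and the default label `"NENA"` are hypothetical, chosen only for illustration.

```python
import re

# Assumption: a marked span opens and closes with the same tag, e.g. <E>Hello<E>,
# and the tag itself is one or more capital letters.
LANG_SPAN = re.compile(r"<([A-Z]+)>(.*?)<\1>")


def split_language_spans(text, default_lang="NENA"):
    """Return (language, text) segments in reading order.

    `default_lang` is a hypothetical label for text outside any marker.
    """
    segments = []
    pos = 0
    for match in LANG_SPAN.finditer(text):
        # Unmarked text before this span belongs to the default language.
        if match.start() > pos:
            segments.append((default_lang, text[pos:match.start()]))
        segments.append((match.group(1), match.group(2)))
        pos = match.end()
    # Trailing unmarked text after the last span.
    if pos < len(text):
        segments.append((default_lang, text[pos:]))
    return segments
```

For example, `split_language_spans("xa <E>Hello<E> ya")` returns `[("NENA", "xa "), ("E", "Hello"), ("NENA", " ya")]`. Because the closing tag is identical to the opening tag, an unclosed marker is simply left in the surrounding text rather than raising an error; a real input tool would probably want to flag that case.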