CopticScriptorium / converters

Converters to encode text into different Coptic text formats (UTF-8 character encoder, SGML processor, etc.)
http://www.copticscriptorium.org
4 stars 4 forks source link

This repository holds character encoding converters to covert ASCII characters in legacy fonts into Unicode Coptic characters and a processing perl script to convert a diplomatic, Coptic text transcription into SGML.


Simple Perl Converter: recode_coptic.pl Version 1.1.0

This assumes a UTF-8 file in one word per line format and automatically converts Coptic encodings

Usage:

recode_coptic.pl file

Read a file in Coptic font encoding and output standard Unicode as UTF-8

recode_coptic.pl in_Coptic.txt > out_utf8.txt

See help (-h) for options

Specify the input format (-f): Default is Coptic font, other options are "CopticLS", "CMCL", "avva_shenouda", "low" (converts all upper case utf8 Coptic to all lower case). E.g., to convert a text in the CopticLS font:

recode_coptic.pl -f CopticLS in_Coptic.txt > out_utf8.txt

Copyright 2013-15 Amir Zeldes, Caroline T. Schroeder. The perl program is free software. You may copy or redistribute the script under the same terms as Perl itself.


About CopticVDWtoUTFConverter

The contents of the CopticVDWtoUTFConverter zip file will convert a file keyed in the ASCII Coptic font created by Dirk Van Damme and Gregor Wurst into Unicode UTF-8 encoding for Coptic (and for use with such fonts as Antinoou).

Licensing

This software was created in 2012-13 by Eric E. Johnson in consultation with Stephen Emmel for the Coptic SCRIPtorIuM project directed by Caroline T. Schroeder and Amir Zeldes.

Copyright 2013 by Eric E. Johnson

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

See the GNU General Public License at http://www.gnu.org/licenses/.

How To Use

These instructions assume some familiarity with the terminal and command line operations.

Before using, be sure your input file (ASCII Coptic file) is in LibreOffice format: .odt. LibreOffice is a free download at http://www.libreoffice.org/download/

Unzip the files in this download.

Open the Terminal or command line interface, change to the the directory containing the unzipped files. (Usually in the Terminal key in cd followed by the path, e.g.: >cd /Users/yourname/CopticVDWtoUTFConverter ) For ease of use, put your input file into the directory with the converter.

To run the converter from the command line:

python converter.py name_of_input_odf_file.odt name_of_output_markdown_file.md


SGML processor process_dipl.pl

version 1.0.0

converts SCRIPTORIUM transcription format to CWB/TreeTagger SGML

Copyright 2013-15 Amir Zeldes. The perl program is free software. You may copy or redistribute the script under the same terms as Perl itself.


copticizer.rb

version 1.0.0

converts ASCII characterset for the texts on the St Shenouda Society CD to the Coptic Unicode characterset

Contributed by So Miyagawa, 12 June 2015


SGML Excel Add-in for Mac (sgml_io_Mac_0.9.3.xla)

An add-in for Microsoft Excel that converts SGML annotated text into multiple columns of data (SGML tag names = column names). For use with the NLP pipeline output. (See https://corpling.uis.georgetown.edu/coptic-nlp/ and code at https://github.com/CopticScriptorium/coptic-nlp) Install add-inn. Copy SGML-encoded text, paste into cell A1 of an Excel spreadsheet, then run the add-in.