This git repository hosts the transcription data of the project The Shawi-type Arabic dialects (FWF P 33574).
PI: Stephan Procházka (University of Vienna)
National Cooperation Partner: Charly Mörth (Austrian Academy of Sciences)
THIS IS PRELIMINARY DATA AND COPYRIGHTED MATERIAL!
If you want to use any material in this repository, please contact the PI, Stephan Procházka (University of Vienna).
This will change at the end of the project.
Directory | Content | Remarks |
---|---|---|
001_src | Original sources | Source documents (e.g. raw transcriptions) |
080_scripts_generic | Conversion scripts | Mostly the ELAN2TEI conversion script (implemented in Python), which generates the initial TEI data prior to tokenization from the ELAN transcription documents in 122_elan |
082_scripts_xsl | XSLT scripts | XSLT scripts |
103_tei_w | TEI-XML with tokens | This is where ELAN2TEI puts its output. Re-running ELAN2TEI overwrites all content in this directory, so do not make manual changes here; copy the file to 010_manannot first. |
010_manannot | Manually annotated TEI-XML | Tokenized TEI documents from 103_tei_w which are manually annotated. |
802_tei_odd | TEI customization (ODD) | This is the source of truth for the SHAWI schema and the HTML documentation generated from it. |
130_vert_plain | NoSketch Engine verticals | NoSketch Engine text verticals |
803_RNG-schematron | Schemas | Derived from the ODD in 802_tei_odd |
804_xsd | Schemas | Derived from the ODD in 802_tei_odd |
850_docs | Documentation | Further data documentation, esp. the HTML documentation of the ODD |
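
Because the contents of 103_tei_w are overwritten on every conversion run, manual annotation should only ever happen on a copy in 010_manannot. The following is a minimal sketch of such a copy step, not part of the project's scripts: the function name `copy_for_annotation` is hypothetical, and it assumes that files keep the same name in both directories. It refuses to overwrite an existing copy so that annotations already made are not lost.

```python
import shutil
from pathlib import Path


def copy_for_annotation(filename, src_dir="103_tei_w", dst_dir="010_manannot"):
    """Copy one generated TEI file to the manual-annotation directory.

    Hypothetical helper: refuses to overwrite an existing copy so that
    manual annotations already present in dst_dir are never lost.
    """
    src = Path(src_dir) / filename
    dst = Path(dst_dir) / filename
    if not src.is_file():
        raise FileNotFoundError(f"No generated file at {src}")
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; not overwriting")
    # Create the target directory if it is missing, then copy with metadata.
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return dst
```

Called from the repository root, `copy_for_annotation("some_file.xml")` (file name made up for illustration) would place a copy under 010_manannot and raise an error if one is already there.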
The oXygen project shawi.xpr contains the configuration for various transformation scenarios.
The directories css, html, js and xsl are used by the TEI Enricher.
For more information, refer to the SHAWI Data Processing and Curation Document.
The following steps happen before data is ingested into this repository:
Workflow steps reflected in the data in this repository:
- [...] 122_elan and pushes the changes to git.
- [...] 122_ELAN and transforms them into tokenized standalone TEI documents, storing them under 103_tei_w. Additionally, a TEI Corpus file is generated which includes corpus level metadata and controlled vocabularies.
- [...] 010_manannot.
- [...] 010_manannot.
- [...] generate-workflow_vars-shawi and re-run this job
- [...] AC2 at the upper left corner of the screen or acdh-ch-cluster-2
- [...] vicav-test in the window in the upper right corner of the screen
- [...] workloads (menu on the left) and on deployments
- [...] shawi-app-devel and redeploy (three dots on the right)
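
To give a rough idea of what the transformation of ELAN documents into tokenized TEI involves, here is a minimal sketch. It is not the project's ELAN2TEI script and makes several assumptions: the ELAN sample is a trimmed-down, made-up document, tokens are split on whitespace only, the tier ID is treated as the speaker reference, and the function name `eaf_to_tei_utterances` is hypothetical. The real conversion also produces document and corpus-level metadata, which is left out here.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Trimmed-down, made-up ELAN (.eaf) document used only for illustration.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="speaker1">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first second third</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def eaf_to_tei_utterances(eaf_xml):
    """Turn each time-aligned ELAN annotation into a TEI <u> of <w> tokens."""
    root = ET.fromstring(eaf_xml)
    body = ET.Element(f"{{{TEI_NS}}}body")
    for tier in root.iter("TIER"):
        # Assumption: the tier ID identifies the speaker.
        who = "#" + tier.get("TIER_ID", "")
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            text = ann.findtext("ANNOTATION_VALUE") or ""
            u = ET.SubElement(body, f"{{{TEI_NS}}}u", {"who": who})
            # Naive whitespace tokenization; a real tokenizer would also
            # handle punctuation, clitics and similar cases.
            for i, token in enumerate(text.split(), start=1):
                w = ET.SubElement(u, f"{{{TEI_NS}}}w", {"n": str(i)})
                w.text = token
    return body


if __name__ == "__main__":
    ET.register_namespace("", TEI_NS)
    print(ET.tostring(eaf_to_tei_utterances(SAMPLE_EAF), encoding="unicode"))
```

Each annotation becomes one utterance element whose children are the individual word tokens, which is the unit that manual annotation would then operate on.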