This repository contains the MACULA linguistic datasets for the Greek New Testament, including data from:
We are adding further datasets, one at a time.
This data has been combined into a single set of trees. It is available in several variants, found in the following directories:
tei contains the New Testament text itself in a format that can easily be formatted for readability.
nodes contains this data in a set of nested Node elements suitable for many NLP systems and other systems that use recursive algorithms.
lowfat contains the same data in a form more suitable for some kinds of query systems and some kinds of display.
TSV contains the word-level data in a TSV table, without syntactic tree structure. This is simpler for many programs that do not need the complexity of graph structures.

Copyright statements for the individual sources can be found in the MACULA Greek license.
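
As a rough illustration of why nested Node elements suit recursive algorithms, the sketch below walks a small tree with Python's standard library. The inline sample, the `Cat` attribute, and the helper names `leaf_words` and `depth` are assumptions made for this example only; they are not taken from the actual files, whose schema should be checked in the nodes directory.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only. The "Cat" attribute and this layout are
# assumptions for the sketch, not the documented schema of the nodes files.
SAMPLE = """
<Node Cat="S">
  <Node Cat="np">
    <Node Cat="det">ὁ</Node>
    <Node Cat="noun">λόγος</Node>
  </Node>
  <Node Cat="vp">
    <Node Cat="verb">ἦν</Node>
  </Node>
</Node>
"""


def leaf_words(node):
    """Recursively collect the text of Node elements with no Node children."""
    children = node.findall("Node")
    if not children:
        text = (node.text or "").strip()
        return [text] if text else []
    words = []
    for child in children:
        words.extend(leaf_words(child))
    return words


def depth(node):
    """Return the number of Node levels from this node down to its deepest leaf."""
    children = node.findall("Node")
    return 1 + max((depth(child) for child in children), default=0)


root = ET.fromstring(SAMPLE)
print(leaf_words(root))
print(depth(root))
```

Each function handles one node and delegates to its children, which is the pattern that a nested layout makes convenient. Programs that only need a flat list of words may find the TSV table simpler to read.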