This repo contains a structurally clean version of the data of the General Missives, volumes 1-14.
Read more in about.
Cleaning a textual dataset is a lot of work. If such a dataset is a standard work, it will be studied by many students/researchers from several disciplines. To make life easier for those people, they should be able to start with a dataset that is readily processable by any tool of their choice.
Text-Fabric provides a data model that captures the data at the end of the cleaning process, just before it goes into other tools. It also supports the integration of subsequent annotations with the original data.
The Missives corpus is an example of how that works.
For a first impression, start with missieven-search. This is a static website that sends the whole corpus to your browser. After a few seconds you can start searching.
You can do full text search via regular expressions, not only in the text, but also in some of its attributes. For example, you can search for a word in original letter texts or in editorial remarks.
More info in the manual.
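To make the idea of searching several layers at once concrete, here is a minimal Python sketch. It is not the code of missieven-search: the records, the field names (`text`, `kind`, `volume`) and their contents are invented for illustration, and the `search` helper is hypothetical. It only shows the principle of combining a regular expression on the text with one on an attribute.

```python
import re

# Hypothetical records: a text layer plus attribute layers.
# Field names and contents are invented for illustration only;
# they are not taken from the corpus.
RECORDS = [
    {"text": "de schepen zijn vertrokken", "kind": "original", "volume": "1"},
    {"text": "zie de noot over schepen", "kind": "remark", "volume": "1"},
    {"text": "het fort is versterkt", "kind": "original", "volume": "2"},
]


def search(records, patterns):
    """Return the records in which every layer named in `patterns` matches its regex."""
    compiled = {layer: re.compile(rx) for layer, rx in patterns.items()}
    return [
        rec
        for rec in records
        if all(rx.search(rec.get(layer, "")) for layer, rx in compiled.items())
    ]


# Words starting with "schep", but only in original letter text.
hits = search(RECORDS, {"text": r"\bschep\w*", "kind": r"^original$"})
for rec in hits:
    print(rec["volume"], rec["text"])
```

Dropping the `kind` pattern widens the search to editorial remarks as well.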
An example search is in example.json. Download the file, then import it in the search interface to see it in action.
You can save search results to Excel files.
You get more power when you download Text-Fabric. Text-Fabric operates in the ecosystem of Python and its libraries.
But you do not have to program in order to browse and search the corpus. After installing Python and
pip3 install text-fabric
on the command line, say
tf clariah/wp6-missieven
and a web server is started on your computer which serves you a search-and-browse interface on the General Missives corpus. You can search more precisely here than in the search interface-to-go above.
You can save search results to Excel files.
Text-Fabric is particularly suited to Jupyter notebooks. There is a handy way to install Python and JupyterLab in one go, and then Text-Fabric from there.
The next step is to consult the tutorial. This is a series of notebooks that guides you through the computing facilities of Text-Fabric. Text-Fabric is just a library that you import in your own Python programs, which means that you can invoke the whole of Python and its libraries to do your job. The only thing Text-Fabric does is to offer you a handy computing interface to the textual data and their annotations.
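As a rough sketch of what importing Text-Fabric as a library looks like, the following loads the corpus and counts the nodes of each type. It assumes a recent Text-Fabric version in which `use()` accepts the `clariah/wp6-missieven` name directly; the helper names `load_missives`, `count_nodes_by_type` and `main` are made up for this example, and the tutorial remains the authoritative guide.

```python
def load_missives():
    """Load the corpus; Text-Fabric fetches the data when it is not yet on disk."""
    # Imported here so that the helper below can be used without text-fabric installed.
    from tf.app import use

    return use("clariah/wp6-missieven")


def count_nodes_by_type(api):
    """Map each node type in the dataset to its number of nodes."""
    F = api.F
    return {otype: sum(1 for _ in F.otype.s(otype)) for otype in F.otype.all}


def main():
    # Assumption: network access is available on the first run.
    A = load_missives()
    for otype, n in count_nodes_by_type(A.api).items():
        print(f"{otype:<12} {n}")
```

Call `main()` from a script or a notebook cell. From there, the full Text-Fabric API and the rest of Python are available for further processing.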
See other corpora for experiences with Text-Fabric as a pre-processing tool in other corpora.
The data of the corpus is in the wp6-missieven repo on GitHub:
the xml directory in this repo
the tf directory in this repo
If you use any of the methods of working with the corpus indicated above, you do not have to do anything special to download the data.
If you tell Text-Fabric that the corpus is in clariah/wp6-missieven, it can find the data and download it automatically when needed.
This repo is by
This repo has been archived in two independent places:
Click the respective badges above to be taken to the archives. There you find ways to cite this work.
You can rerun the conversion programs on the source data and regenerate the simple XML and Text-Fabric versions of the data. See the reproduce guide.
Another, less cleaned version of the data is visible online in a BlackLab interface.
A latent wish is to make the data of this repository available in a BlackLab interface. In this repo we show how to set up a local BlackLab server and front-end and how to get the present data into BlackLab.
This is work in progress. At this point, follow the BlackLab install guide for macOS.
Thanks to Jesse de Does (key user of BlackLab, INT) and Jan Niestadt (main author of BlackLab, INT) for helping out with setting up and using BlackLab.