SASDigitalHumanitiesTraining / TextEncoding

Text Encoding for Ancient and Modern Literature, Languages and History

Workflow for transcription #22

Closed by luchretius 2 years ago

luchretius commented 2 years ago

Hi all! I guess this is less an issue than an invitation to share your workflows (i.e. the usual steps you take, the programmes you use, etc.) when transcribing the documents you are working with.

As I was going through the walkthrough videos today, I noticed that Gabriel said he uses EpiDoc (a customisation of TEI with its own schema) when transcribing, and that Christopher's preferred method is to transcribe in Markdown first (the language used in GitHub comments) and then use Pandoc to convert it to TEI.

This really inspired me, as I find my old way of transcribing everything in Microsoft Word rather inefficient: I want to mark out titles and names, for instance, without having to worry about formatting or referencing as I type. It also results in a proliferation of files, which makes it very hard to search for a keyword across them, while transcribing multiple documents in a single Word file quickly becomes cumbersome.

So I wondered if any of you would like to share a workflow that might be a better alternative? More specifically, I had these questions in mind:

Many thanks in advance and I look forward to seeing you on Wednesday.

cmohge1 commented 2 years ago

An excellent question, and one that we ought to talk more about as a group today. In my opinion, it really depends on how comfortable you are transcribing in XML. Some people are fine with it, some people are not. And it can be difficult to write in XML when you have a complex document. I usually (though not always) start with Markdown because I get automatic structuring of the document in HTML, and then I convert it into TEI with Pandoc (https://pandoc.org/). That saves me from tediously applying structural tags by hand.
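For anyone who wants to try the Markdown-to-TEI step, here is a minimal sketch of how it could be scripted in Python. It is not necessarily the exact command used in the demo: it only assumes that Pandoc is installed and uses its `tei` output format. The function names and the file names in the usage note are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_tei_command(source, target):
    """Build the Pandoc command line for a Markdown -> TEI conversion."""
    return [
        "pandoc",
        "--from", "markdown",
        "--to", "tei",
        "--standalone",  # ask for a complete document rather than a fragment
        "--output", str(target),
        str(source),
    ]


def convert(source, target):
    """Run Pandoc on one file; return False if Pandoc or the file is missing."""
    if shutil.which("pandoc") is None or not Path(source).exists():
        return False
    # check=True raises CalledProcessError if Pandoc reports a failure
    subprocess.run(pandoc_tei_command(source, target), check=True)
    return True
```

Calling `convert(Path("transcription.md"), Path("transcription.xml"))` would then write the TEI next to the Markdown file. The output is generic TEI, so project-specific markup (EpiDoc, for example) would still need to be added afterwards.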

Here is a Pandoc demo I recorded: https://universityoflondon.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=2ff66284-f154-4277-84d0-abe001358da7

I write Markdown in the Atom text editor (https://atom.io/), but I have also used Typora (https://typora.io/), which is more like a word processor. I've never tried Ulysses.

That said, I am comfortable authoring in XML, and sometimes it is better to do so.

Gabby probably has more thoughts. And anyone else should chime in!

gabrielbodard commented 2 years ago

Yes, this is probably best discussed in person, as there are a lot of factors involved. But the main point is that there isn't a right answer: use whatever is comfortable and efficient for you and your team. On the one hand, editing directly in XML can be cumbersome, and there are tools that will convert to various flavours of XML for you (or you could write a simple script to do part of it for you); but at the same time, all that converting can be laborious and cumbersome too, so sometimes typing XML while you think actually isn't any slower than typing brackets and formatting styles in MS Word. It just depends on how you like to work.
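To give an idea of what a "simple script" might look like, here is a small Python sketch. It assumes a made-up shorthand, `{tag:text}`, typed while transcribing, and turns it into the matching TEI element. The shorthand, the function name and the list of allowed tags are all hypothetical; only `persName`, `placeName` and `title` themselves are real TEI element names.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical shorthand: {persName:Herman Melville} -> <persName>Herman Melville</persName>
ALLOWED_TAGS = {"persName", "placeName", "title"}
MARKER = re.compile(r"\{(\w+):([^{}]+)\}")


def markers_to_tei(line):
    """Convert {tag:text} markers in one line of transcription to TEI elements."""
    # Escape &, < and > first so the transcribed text cannot break the XML
    escaped = escape(line)

    def replace(match):
        tag, text = match.group(1), match.group(2).strip()
        if tag not in ALLOWED_TAGS:
            return match.group(0)  # leave unknown markers untouched
        return f"<{tag}>{text}</{tag}>"

    return MARKER.sub(replace, escaped)
```

This only handles flat, non-nested markers on a single line, so it covers "part of it" at best; anything structural would still need Pandoc or hand-written XML.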