Git-Lit / git-lit

Scripts to create git repositories for ALTO XML texts, like those from the British Library's scanned documents.

Textus Project (and Open Shakespeare) #42

Closed by rufuspollock 8 years ago

rufuspollock commented 8 years ago

Thought you might be interested in the Textus Project:

http://okfnlabs.org/textus/

In particular, the text format and approach to managing texts might be interesting.

Finally, reading your blog post I thought I would mention Open Shakespeare. This was a project we started back in 2005 and involved putting texts in plain text under version control (svn originally, then hg, then git!):

http://blog.okfn.org/2006/10/04/v03-of-open-shakespeare-released/
http://blog.okfn.org/category/okf-projects/open-shakespeare/

And here's the current git repo:

https://github.com/okfn/shakespeare-material

JonathanReeve commented 8 years ago

Thanks for sending these links! I'm just now getting around to them. Textus looks like a great project, and it seems we have similar aims, although Textus uses JSON and Git-Lit will use some kind of plain text format (provided we solve #6). Open Shakespeare is great stuff! I'll add it to the corpus downloader I'm making.

rufuspollock commented 8 years ago

@JonathanReeve Textus does not use JSON for the text itself - that is stored as plain text or markdown. Only extended markup is in JSON - this is important as it makes the raw text very easy to manage. I'm mentioning Textus precisely because it may be relevant to #6. We designed Textus after looking at most of the existing text formats out there. These tend to be either too simple (e.g. markdown) or very heavyweight (e.g. full TEI).

The Textus approach of separating the text stream from the various kinds of markup allows you to have markup as rich or as simple as you want, and to enhance it gradually. It also supports full-on annotation as well as typographical (headings, bold, etc.) and structural markup (e.g. chapters, pages, etc.).
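To make the idea of separating the text stream from the markup concrete, here is a minimal sketch in Python. The field names (`kind`, `type`, `start`, `end`, `body`), the sample text, and the `covered_text` and `render` helpers are all hypothetical, chosen for illustration; this is not Textus's actual schema or API. It assumes character offsets with an exclusive end and non-overlapping spans.

```python
import json

# The raw text is stored on its own, untouched by any markup.
text = "Act I\nWho's there?\nNay, answer me."

# Hypothetical standoff markup: each entry points into the text by
# character offsets (start inclusive, end exclusive). These field
# names are illustrative only, not Textus's real format.
markup = json.loads("""
[
  {"kind": "structure",  "type": "heading", "start": 0,  "end": 5},
  {"kind": "typography", "type": "bold",    "start": 6,  "end": 18},
  {"kind": "annotation", "type": "note",    "start": 19, "end": 34,
   "body": "Francisco's reply"}
]
""")

def covered_text(text, markup, kind):
    """Return (type, substring) pairs for every span of one markup kind."""
    return [(m["type"], text[m["start"]:m["end"]])
            for m in markup if m["kind"] == kind]

# Hypothetical mapping from span types to HTML tags.
TAGS = {"heading": "h1", "bold": "b"}

def render(text, markup):
    """Wrap non-overlapping spans in tags; spans without a known tag are skipped."""
    out, pos = [], 0
    for m in sorted(markup, key=lambda m: m["start"]):
        tag = TAGS.get(m["type"])
        if tag is None:
            continue  # e.g. annotations are not rendered inline here
        out.append(text[pos:m["start"]])
        out.append(f"<{tag}>{text[m['start']:m['end']]}</{tag}>")
        pos = m["end"]
    out.append(text[pos:])
    return "".join(out)

print(covered_text(text, markup, "structure"))
print(render(text, markup))
```

With an empty markup list, `render` returns the text unchanged, which is the gradual-enhancement property in miniature: layers can be added or dropped without ever editing the text file itself.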

I've also commented briefly in #6