PerseusDL / canonical

This will be the base repo for all text and annotation data published in the PDL

Plato work/file organization confusing #74

Open rwhaling opened 9 years ago

rwhaling commented 9 years ago

There's some odd overlap between the Plato files that bundle several works into a tetralogy and the files that hold individual works. Two files each contain four dialogues:

https://github.com/PerseusDL/canonical/blob/master/CTS_XML_TEI/perseus/greekLit/tlg0059/tlg002/tlg0059.tlg002.perseus-grc1.xml ("Euthyphro, Apology, Crito, Phaedo")

https://github.com/PerseusDL/canonical/blob/master/CTS_XML_TEI/perseus/greekLit/tlg0059/tlg012/tlg0059.tlg012.perseus-grc1.xml ("Parmenides, Philebus, Symposium, Phaedrus")

Both contain works that are duplicated as individual files: Crito, Philebus, and Symposium, as tlg0059.tlg003, tlg0059.tlg010, and tlg0059.tlg011 respectively.

This is exacerbated by the fact that the two tetralogy files lack a parseable way to identify each text. For example, the Phaedrus begins with a head tag attached to the body, but no attribute anywhere identifies the work by its English title: https://github.com/PerseusDL/canonical/blob/master/CTS_XML_TEI/perseus/greekLit/tlg0059/tlg012/tlg0059.tlg012.perseus-grc1.xml#L86

Adding milestones with type="text" (or something similar) to each of the works in these files, or an xml:id or n attribute to the text element itself, would clear this up, as would splitting them into individual files.
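
To illustrate the n-attribute option, here is a minimal Python sketch of how a consumer could list the works in a tetralogy file once each nested text element is labelled. The sample XML is a hypothetical, heavily simplified stand-in and not the real file: the group/text nesting, the n values, and the Greek headings are all assumptions made for the example, and the helper names `local_name` and `works_in` are mine.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a tetralogy file. The nesting,
# the n values, and the headings are assumptions, not the repo's markup.
SAMPLE = """<TEI>
  <text>
    <group>
      <text n="Parmenides"><body><head>Παρμενίδης</head></body></text>
      <text n="Philebus"><body><head>Φίληβος</head></body></text>
      <text n="Symposium"><body><head>Συμπόσιον</head></body></text>
      <text n="Phaedrus"><body><head>Φαῖδρος</head></body></text>
    </group>
  </text>
</TEI>"""


def local_name(tag):
    """Drop a '{namespace}' prefix so the lookup works with or without one."""
    return tag.rsplit("}", 1)[-1]


def works_in(xml_source):
    """Map each labelled <text n="..."> to the content of its first <head>."""
    root = ET.fromstring(xml_source)
    works = {}
    for el in root.iter():
        # The unlabelled outer <text> wrapper is skipped: it has no n attribute.
        if local_name(el.tag) == "text" and el.get("n"):
            head = next(
                (h for h in el.iter() if local_name(h.tag) == "head"), None
            )
            works[el.get("n")] = head.text if head is not None else None
    return works


if __name__ == "__main__":
    for title, heading in works_in(SAMPLE).items():
        print(title, heading)
```

Without the n attribute the same loop has nothing to key on except the Greek head text, which is the problem described above.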

Perhaps this should be split into two issues; I can also add more documentation if needed.