Closed by ajenhl 8 years ago
Now that tacl strip only strips markup from one-file-per-text XML, the new tacl prepare (which joins multiple-files-per-text into one) can be expanded to handle the different formats of CBETA texts.
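To illustrate the joining step, here is a rough sketch of how part files might be grouped under the text they belong to. It is not tacl's actual code: the function name group_parts_by_text is hypothetical, and the file naming pattern (text identifier, underscore, part number, as in T01n0001_001.xml) is an assumption about the source layout.

```python
from collections import defaultdict
from pathlib import PurePath


def group_parts_by_text(filenames):
    """Group per-part file names under the text they belong to.

    Hypothetical helper, not part of tacl. Assumes each file name looks
    like "<text id>_<part number>.xml".
    """
    groups = defaultdict(list)
    for name in filenames:
        stem = PurePath(name).stem
        # Assumption: everything before the last underscore identifies the text.
        text_id, _, _part = stem.rpartition("_")
        # A name without an underscore is treated as a single-file text.
        groups[text_id or stem].append(name)
    # Sort the parts so they can be joined in order.
    return {text_id: sorted(parts) for text_id, parts in groups.items()}
```

Each resulting group could then be merged into a single one-file-per-text document before tacl strip is run on it.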
It appears that the repository at https://github.com/cbeta-git/xml-p5a/ is the better repository, judging by how recently it has been changed; it should be the first (and perhaps only) new XML source to be supported.
Added support for the repository at https://github.com/cbeta-git/xml-p5/ in 5a87eec19100ab9f240e7e2593bdba551ede350a.
While this is not the repository mentioned in the previous comment, it is the one recommended to me, so hopefully it will suffice!
TACL currently supports converting the old CBETA TEI P4 XML. It would be good to (instead or as well?) support their later offerings. Unfortunately, there appear to be a plethora of these:
At one point at least, some or all of these used different encoding schemes. Once it is clearer what most (potential) users of TACL are using, then a decision can be made as to which format(s) to support. And in the meantime, hopefully things will settle down to a single form of encoding!