helmadik closed this issue.
Hi @helmadik, there are some scripts noted here: https://github.com/PerseusDL/tei-conversion-tools/wiki/Greek-Betacode-to-Unicode-Transformations
They have limitations, particularly if entities are converted first. If a parenthesis is encoded as an entity and is converted to a literal ( before the script gets it, you of course end up with all sorts of bad Unicode.
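To make the ordering problem concrete, here is a minimal sketch in Python. It is not Bridget's XSL or the Perseids libraries: it is a hypothetical toy converter that handles only lowercase letters and a few diacritics, and it assumes the source marks literal parentheses with the entities `&lpar;` and `&rpar;` (an assumption; a real file may use different entities). Resolving the entities first hands `(` and `)` to the converter, which reads them as rough and smooth breathings. Swapping the entities for private-use placeholders before conversion and restoring them afterwards keeps them intact.

```python
import unicodedata

# Hypothetical toy converter: lowercase Beta Code letters plus common
# diacritics only (no capitals, no final-sigma handling).
LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}

# Assumed entity names for literal parentheses; adjust to the real source.
ENTITIES = {"&lpar;": "(", "&rpar;": ")"}
# Private-use characters that the converter passes through untouched.
PLACEHOLDERS = {"&lpar;": "\ue000", "&rpar;": "\ue001"}


def beta_to_unicode(text: str) -> str:
    """Map Beta Code characters to Greek, then compose with NFC."""
    out = "".join(LETTERS.get(ch, DIACRITICS.get(ch, ch)) for ch in text)
    return unicodedata.normalize("NFC", out)


def convert_entities_first(raw: str) -> str:
    """The failure mode: entities become ( and ) before conversion,
    so the converter reads them as rough and smooth breathings."""
    for entity, char in ENTITIES.items():
        raw = raw.replace(entity, char)
    return beta_to_unicode(raw)


def convert_protected(raw: str) -> str:
    """Shield the parenthesis entities, convert, then restore them."""
    for entity, mark in PLACEHOLDERS.items():
        raw = raw.replace(entity, mark)
    greek = beta_to_unicode(raw)
    for entity, mark in PLACEHOLDERS.items():
        greek = greek.replace(mark, ENTITIES[entity])
    return greek


sample = "a)/ndra &lpar;moi&rpar; e)/nnepe"
print(convert_entities_first(sample))  # stray breathings, no parentheses
print(convert_protected(sample))       # ἄνδρα (μοι) ἔννεπε
```

In the first output the opening parenthesis becomes a rough breathing stranded after a space, and the closing one turns μοι into μοἰ. The sketch only protects the two parenthesis entities; any other entity left in the raw text would still be mangled by the converter.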
There is also this tool, https://apps.perseids.org/beta-code/, but I have not used it. @zfletch would know more about that.
We don't have a conversion script for XML documents, but Perseids does maintain betacode to Unicode conversion libraries (Python, JavaScript).
Thanks so much to you both! I feel very stupid that I already had Bridget's XSL beta-to-Unicode and didn't try it! I'll see what happens when I try it on Xenophon of Ephesus.
I assume you must deal with parentheses being treated as breathing marks. This was a real disaster in earlier conversions, and you definitely want to make sure you avoid it.
In reply to helmadik closing #1299 (https://github.com/PerseusDL/canonical-greekLit/issues/1299) on Aug 16, 2021, at 8:59 PM.
Yes, good memories of Appian :-) It's understandable that it didn't feature prominently in people's thoughts in the first instance, since parentheses are not really a punctuation mark used in the narrower set of canonical texts. My approach has been to iterate over isolated opening parentheses on the one hand (although these tended to be deleted in our in-house conversion :-( ) and over unrecognized word forms on the other.
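The check for stray parentheses can also be approximated on the Unicode side, after conversion. The sketch below is an illustrative heuristic, not the procedure described above: it assumes that a misread parenthesis surfaces as a breathing mark with no base letter (an opening parenthesis after a space), on a consonant other than rho, or beyond the second letter of a word (a closing parenthesis at a word's end). The function name `suspicious_breathings` is hypothetical. It will also flag legitimate forms such as crasis with a coronis (τοὐναντίον), mid-word ῤῥ, or words with leading punctuation attached, so its output is a review list, not a set of corrections.

```python
import unicodedata

BREATHINGS = {"\u0313", "\u0314"}   # smooth, rough
BREATHING_BASES = set("αεηιουωρ")   # vowels, plus rho


def suspicious_breathings(text: str) -> list[str]:
    """Flag words whose breathing mark sits where Greek normally has none:
    with no base letter (e.g. right after a space), on a consonant other
    than rho, or beyond the second letter of the word."""
    flagged = []
    for word in text.split():
        letter_index, base = -1, None
        # NFD splits precomposed letters into base + combining marks.
        for ch in unicodedata.normalize("NFD", word):
            if not unicodedata.combining(ch):
                letter_index += 1
                base = ch.lower()
            elif ch in BREATHINGS and (
                base is None or base not in BREATHING_BASES or letter_index > 1
            ):
                flagged.append(word)
                break
    return flagged


# Output of an entities-first conversion of "a)/ndra (moi) e)/nnepe".
print(suspicious_breathings("ἄνδρα \u0314μοἰ ἔννεπε"))
```

Running the flagged words past a spell checker or lemmatizer, as with the unrecognized word forms mentioned above, would then separate real misconversions from the false positives.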
OK, even though I have Bridget's beta2uni conversion script working for Alpheios API output (obviously, because Bridget handed me everything on a silver platter ;-)), I can't make it work with all the stray DTD requirements for ordinary texts. Is there anyone (@lcerrato, @gregorycrane?) who does this on a regular basis?
@helmadik I'm the only person working on this regularly at the moment. Due to the large volume of CHS volunteer work, my focus is presently on those texts. I do ad hoc conversions of the Perseus data as needed. If you'd like me to work on this, just attach the Unicode zip file here.
Wow! Of course. Actually, it's the non-Unicode tlg0641.tlg001.perseus-grc1.xml that I was asking about: there seems to be only a betacode version of this text. If at some point you have a chance, I'd love to see it in Unicode. Many thanks in advance! In the meantime, I should work up the courage to submit pull requests for the novels I have cleaned up so far.
Helma Dik, Department of Classics, University of Chicago
In reply to Lisa Cerrato's comment of Mon, Sep 13, 2021, at 1:08 PM.
@helmadik Sorry, I misunderstood. I thought you had already converted the current file to Unicode and wanted the markup updated for the Scaife Viewer. I'll take a closer look. Some of the works entered around this time require special handling.
I'm working on this but there are the usual oddities in the beta code and I'm seeing strange output in the Unicode because of those. I have the markup done but am doing a review and section cleanup.
Dear Lisa, thank you so much for doing this! Please don't spend too much time cleaning up the Greek conversion issues! I'm already very grateful for your help. I'm aware of the parentheses issues and the like, and those are routine for me to clean up: the texts go through 'spell check' on my end, and I have the print source from the library. Thank you again!
In reply to Lisa Cerrato's comment of Wed, Sep 15, 2021, at 10:14.
@helmadik This is checked in, but there were lots of little oddities in the markup, so I'm sure that some problems snuck through. I hope it's useful. Just tag me in the future with similar requests. --Best, Lisa
Is there a preferred conversion script I could use? I would like to lemmatize and clean up the text. Thanks! @PonteIneptique, maybe you are the best bet?