PerseusDL / canonical-greekLit

XML Canonical resources for Greek Literature
https://scaife.perseus.org
Creative Commons Attribution Share Alike 4.0 International

tlg0641.tlg001.perseus-grc1.xml available in betacode only #1299

Closed helmadik closed 3 years ago

helmadik commented 3 years ago

Is there a preferred conversion script I could use? Would like to lemmatize and clean up. Thanks! @PonteIneptique maybe you are the best bet?

lcerrato commented 3 years ago

Hi @helmadik There are some scripts noted here: https://github.com/PerseusDL/tei-conversion-tools/wiki/Greek-Betacode-to-Unicode-Transformations

They have limitations, particularly if entities are converted first. If a parenthesis is encoded as a character entity and that entity is converted to a literal ( before the script gets it, the script treats it as Beta Code, and you of course get all sorts of bad Unicode.
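A minimal sketch of that ordering problem, not taken from the linked scripts. It assumes the literal parentheses are stored as numeric character references (`&#40;`, `&#41;`); `toy_beta_to_unicode` is a hypothetical stand-in covering only a handful of letters, and `convert_then_decode` is an illustrative helper name. The idea is to convert only the text between entity references and to decode the references afterwards, so a literal parenthesis never reaches the Beta Code converter.

```python
import html
import re
import unicodedata

# Toy Beta Code tables: a few letters and diacritics only (illustrative, far from complete).
LETTERS = {"a": "α", "e": "ε", "o": "ο", "d": "δ"}
DIACRITICS = {"(": "\u0314", ")": "\u0313", "/": "\u0301"}  # rough, smooth, acute

def toy_beta_to_unicode(text):
    """Convert the tiny subset above; every ( or ) is read as a breathing mark."""
    out = []
    for ch in text:
        out.append(LETTERS.get(ch) or DIACRITICS.get(ch) or ch)
    return unicodedata.normalize("NFC", "".join(out))

# Matches references such as &#40; or &lpar; and keeps them when splitting.
ENTITY = re.compile(r"(&#?\w+;)")

def convert_then_decode(text, convert):
    """Convert the text between entity references, then decode the references."""
    parts = ENTITY.split(text)
    # With one capturing group, the odd-indexed parts are the entity references.
    return "".join(
        html.unescape(part) if i % 2 else convert(part)
        for i, part in enumerate(parts)
    )

source = "&#40;o( de/&#41;"

# Entities decoded first: the literal parentheses turn into breathing marks.
damaged = toy_beta_to_unicode(html.unescape(source))

# Entities decoded last: the literal parentheses survive the conversion.
intact = convert_then_decode(source, toy_beta_to_unicode)
```

Here `damaged` begins with a rough breathing that has no base letter, while `intact` keeps both parentheses around the converted Greek. A real converter could be passed in place of the toy function.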

There is also this https://apps.perseids.org/beta-code/ but I have not used it. @zfletch would know more about that.

zfletch commented 3 years ago

We don't have a conversion script for XML documents, but Perseids does maintain betacode to Unicode conversion libraries (Python, JavaScript).
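One way such a library function might be applied to an XML file is to run it over character data only and leave tags and attributes alone. The following is a rough sketch under several assumptions: the document parses as well-formed XML with no unresolved DTD entities, `convert` is any string-to-string callable (the example uses `str.upper` purely as a placeholder), and `convert_text_nodes` and `skip_tags` are hypothetical names, not part of any Perseids library.

```python
import xml.etree.ElementTree as ET

def convert_text_nodes(root, convert, skip_tags=frozenset()):
    """Apply convert() to character data only; tags and attributes are untouched.

    Elements whose tag is in skip_tags (for example, notes written in
    English) are left unconverted, together with everything inside them.
    """
    def walk(elem, skipping):
        skipping = skipping or elem.tag in skip_tags
        if elem.text and not skipping:
            elem.text = convert(elem.text)
        for child in elem:
            walk(child, skipping)
            # A child's tail is text of the enclosing element, so it follows
            # the enclosing element's skip state, not the child's.
            if child.tail and not skipping:
                child.tail = convert(child.tail)

    walk(root, False)
    return root

doc = ET.fromstring('<p n="a1">mh=nin <note>English note</note> a)/eide</p>')
convert_text_nodes(doc, str.upper, skip_tags={"note"})
```

In a namespaced TEI file the tag names carry the namespace in braces, so the entries in `skip_tags` would need to be written in that form.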

helmadik commented 3 years ago

Thanks so much to you both! I feel very stupid that I had Bridget's xsl beta-to-unicode already, and didn't try it! I'll see what happens when I try it on Xenophon of Ephesus.

gregorycrane commented 3 years ago

I am assuming you must deal with parentheses being treated as breathing marks. This is a real disaster in earlier conversions and you definitely want to make sure you avoid it.


On Aug 16, 2021, at 8:59 PM, helmadik wrote:

Closed #1299 https://github.com/PerseusDL/canonical-greekLit/issues/1299.


helmadik commented 3 years ago

Yes- Good memories of Appian:-) It's understandable that it didn't feature prominently in people's thoughts in the first instance, since parentheses are not really a punctuation mark used in the narrower set of canonical texts. I have tended to iterate over isolated opening parentheses on the one hand (although these tended to have been deleted in our in-house conversion:-() and unrecognized word forms on the other.
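A check along these lines could be sketched as follows. This is only an illustration, not the workflow described above: the function names are hypothetical, and the two heuristics (a token that starts with a combining mark, or that mixes Greek and Latin letters) are crude stand-ins for a real lookup of unrecognized word forms.

```python
import unicodedata

def unbalanced_parens(text):
    """Return the character offsets of parentheses that have no partner."""
    open_offsets, stray = [], []
    for i, ch in enumerate(text):
        if ch == "(":
            open_offsets.append(i)
        elif ch == ")":
            if open_offsets:
                open_offsets.pop()
            else:
                stray.append(i)
    return sorted(stray + open_offsets)

def suspicious_tokens(text):
    """Return tokens that start with a combining mark or mix Greek and Latin letters."""
    flagged = []
    for token in text.split():
        scripts = set()
        for ch in token:
            if ch.isalpha():
                name = unicodedata.name(ch, "")
                if name.startswith("GREEK"):
                    scripts.add("GREEK")
                elif name.startswith("LATIN"):
                    scripts.add("LATIN")
        starts_with_mark = unicodedata.combining(token[0]) != 0
        if starts_with_mark or {"GREEK", "LATIN"} <= scripts:
            flagged.append(token)
    return flagged

# Sample with an unclosed parenthesis, a breathing mark with no base letter,
# and an unconverted Latin "e" inside a Greek word.
sample = "(\u1f41 \u03b4\u03ad \u0314\u03b1 \u03b4e"
paren_offsets = unbalanced_parens(sample)
flagged = suspicious_tokens(sample)
```

The offsets and flagged tokens are meant as a worklist for checking against the print source, not as automatic corrections.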

helmadik commented 3 years ago

OK even though I have Bridget's conversion beta2uni script working for Alpheios api output (obviously, because Bridget handed me everything on a silver platter;-)), I can't make this work with all the stray dtd requirements for ordinary texts. Is there anyone (@lcerrato @gregorycrane ?) who does this on a regular basis?

lcerrato commented 3 years ago

@helmadik I'm the only person working on this regularly at the moment. Due to the large volume of CHS volunteer work, my focus is presently on those texts. I do ad hoc conversions of the Perseus data as needed. If you'd like me to work on this, just attach the Unicode zip file here.

helmadik commented 3 years ago

Wow! Of course - it's actually the non-Unicode tlg0641.tlg001.perseus-grc1.xml that I was asking about. There seems to be only a betacode version of this text. If at some point you have a chance, I'd love to see it in Unicode. Many thanks in advance! In the meantime, I should work up the courage to do pull requests on the novels that I have so far cleaned up.

Helma Dik Department of Classics University of Chicago

In reply to Lisa Cerrato's comment of Mon, Sep 13, 2021, 1:08 PM.

lcerrato commented 3 years ago

@helmadik Sorry, I misunderstood. I thought you had converted the current file to Unicode already and wanted the markup to be updated for the Scaife Viewer. I'll take a closer look. Some of the works entered around this time require special handling.

lcerrato commented 3 years ago

I'm working on this but there are the usual oddities in the beta code and I'm seeing strange output in the Unicode because of those. I have the markup done but am doing a review and section cleanup.

helmadik commented 3 years ago

Dear Lisa, Thank you so much for doing this! Please don’t spend too much time cleaning up the Greek conversion issues! I’m very grateful for your help already. I’m aware of the parentheses etc. issues, and those are routine for me to clean up. They go through ‘spell check’ on my end, and I have the print source from the library. Thank you again!

In reply to Lisa Cerrato's comment of Wed, Sep 15, 2021, 10:14.

lcerrato commented 3 years ago

@helmadik This is checked in, but there were lots of little oddities in the markup, so I'm sure that some problems snuck through. I hope it's useful. Just tag me in the future with similar requests. --Best, Lisa