PerseusDL / lexica

Repo for the text files of lexica
Creative Commons Attribution Share Alike 4.0 International

beta code to Unicode #2

Closed lcerrato closed 1 year ago

lcerrato commented 9 years ago

LSJ needs to be converted from beta code to Unicode.

lcerrato commented 7 years ago

@gcelano @gregorycrane just pinging this based on earlier correspondence

I would recommend comparing the current version with the version at the time of branching and working from there.

lcerrato commented 7 years ago

@lcerrato spot check of converted Greek

lcerrato commented 7 years ago

Error found in spot check: e)p' was converted to ἐπ' (straight apostrophe) instead of ἐπʼ (curly quote).

gcelano commented 7 years ago

This is important. I will replace it with the Right Single Quotation Mark (U+2019).
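For readers following along, the apostrophe issue above can be illustrated with a minimal Python sketch. It is not the script used for the LSJ conversion. It assumes lowercase Perseus-style beta code, handles only basic letters and diacritics (no `*` capitals), and the names `beta_to_unicode`, `LETTERS`, `DIACRITICS`, and `APOSTROPHE` are made up for this illustration.

```python
import re
import unicodedata

# Hypothetical lookup tables for this sketch only (lowercase beta code).
LETTERS = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ",
    "n": "ν", "c": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ",
    "t": "τ", "u": "υ", "f": "φ", "x": "χ", "y": "ψ", "w": "ω",
}

# Beta code diacritic signs mapped to Unicode combining characters.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}

# Elision mark: Right Single Quotation Mark, as proposed in this thread.
APOSTROPHE = "\u2019"


def beta_to_unicode(text: str) -> str:
    """Convert a lowercase beta code string to precomposed Unicode Greek."""
    out = []
    for ch in text:
        if ch in LETTERS:
            out.append(LETTERS[ch])
        elif ch in DIACRITICS:
            out.append(DIACRITICS[ch])
        elif ch == "'":
            out.append(APOSTROPHE)
        else:
            out.append(ch)  # spaces, punctuation, digits pass through
    # NFC folds each letter plus its combining marks into one code point.
    composed = unicodedata.normalize("NFC", "".join(out))
    # A sigma that is not followed by a word character becomes final sigma.
    return re.sub(r"σ(?!\w)", "ς", composed)


print(beta_to_unicode("e)p'"))
```

Keeping the elision mark in a single constant makes it easy to swap in a different code point if the project settles on another convention.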

michaelhagedon commented 7 years ago

Is there a tool you are using to do this conversion?

lcerrato commented 7 years ago

Hi @michaelhagedon, I do not know what @gcelano is using for this. Other Perseus works have used a combination of the resources listed here, which have their limitations: https://github.com/PerseusDL/tei-conversion-tools/wiki/Greek-Betacode-to-Unicode-Transformations

gcelano commented 7 years ago

I used a few scripts that I wrote. The main one is here:

https://gist.githubusercontent.com/gcelano/98fc230470182664eb23/raw/fd333295cff4ba5e6e71299e0a16ca0c872949d4/lp:grc-convert-betacode.xq

But a few others need to be applied before it, because I divided the task into sub-tasks. You can have a look at the scripts involved here:

http://l-processor.org/w/Ancient_Greek

I definitely need to "package" them for easier use. In the meantime, if I can help with any of them, let me know.

gcelano commented 7 years ago

Hi @michaelhagedon,

have a look here: https://github.com/gcelano/MorpheusGreekUnicode/blob/master/scripts/convert-betacode-into-unicode-for-Morpheus.xq

I have put together the functions for converting into Unicode. If you have BaseX, you should be able to convert your files.

michaelhagedon commented 7 years ago

Cool, thanks, @gcelano!

TinaRussell commented 5 years ago

I’m glad to see @lcerrato’s efforts over in #47, and I’m curious to know if this issue is being worked on at present. Thanks!

lcerrato commented 5 years ago

@TinaRussell Hi, no, we don't have anyone working on the conversion of the Perseus files at present.

TinaRussell commented 5 years ago

Okay. @gcelano, I’d like to know how to use the script you posted; I’ve tried saving the script, changing the $d variable definition to the location of a local copy of one of the LSJ files, and running it from the command line as basex convert-betacode-into-unicode-for-Morpheus.xq (in the same directory as the script, of course). It just prints the specified LSJ file raw to standard output and doesn’t change anything. What am I missing? (I’m using BaseX 9.0.1.) Also, is the script meant to be used on the whole document, or am I supposed to identify first which tags have the “lang="greek"” and tell the script to translate just those? (I… don’t really know anything about XQuery or BaseX, I’m wingin’ it here. :disappointed_relieved:)

gcelano commented 5 years ago

@TinaRussell, you need to use the BaseX GUI. In any case, the script was written to convert Morpheus, which means that it requires the Morpheus XML structure to run. If you want to re-use the functions, that can be done, but the script would have to be changed accordingly (which also means identifying where the Betacode is in the XML files).
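The step of identifying where the Betacode sits in the XML could look roughly like the Python sketch below. It is only an illustration, not part of the Morpheus script. It assumes the Greek is marked with a `lang="greek"` attribute as mentioned earlier in the thread, that the file parses with the standard library's ElementTree, and that `convert` is whatever beta-code-to-Unicode function is available. The function name `convert_greek_elements` and the sample element names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def convert_greek_elements(xml_text: str, convert: Callable[[str], str]) -> str:
    """Apply `convert` to the leading text of every element with lang="greek".

    Limitation of this sketch: only the text directly at the start of each
    element is converted, not text that follows nested child elements.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.get("lang") == "greek" and elem.text:
            elem.text = convert(elem.text)
    return ET.tostring(root, encoding="unicode")


# Usage with a stand-in converter (str.upper) so the effect is visible;
# a real run would pass an actual beta code converter instead.
sample = '<entry><orth lang="greek">abc</orth><tr>abc</tr></entry>'
print(convert_greek_elements(sample, str.upper))
```

Passing the converter in as an argument keeps the XML traversal separate from the character mapping, so either part can be replaced independently.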

TinaRussell commented 5 years ago

@gcelano Ah, that makes sense. Thanks!