byztxt / byzantine-majority-text

Byzantine Majority Greek New Testament text edited by Robinson and Pierpont, with morphological parsing tags and Strong's numbers
The Unlicense

Error with csv_converter.py #23

Closed by githubyouser 1 year ago

githubyouser commented 1 year ago

Hi, I'm trying my hand at converting the .CCT files to Unicode CSV. (I'm actually wanting to convert some of the other files, like Stephanus's TR to Unicode, but I figured I'd learn the ropes with something easier. :)

I finally got all the requirements to install properly, and now I'm trying to run the script. But now I get the following error: (I'm using Python 3.11 on Windows 11)

Traceback (most recent call last):
  File "C:\Users\Personal\Downloads\byztxt byzantine-majority-text master scripts\csv_converter.py", line 5, in <module>
    import beta_code
  File "C:\Users\Personal\AppData\Local\Programs\Python\Python311\Lib\site-packages\beta_code\__init__.py", line 1, in <module>
    from .beta_code import greek_to_beta_code, beta_code_to_greek
  File "C:\Users\Personal\AppData\Local\Programs\Python\Python311\Lib\site-packages\beta_code\beta_code.py", line 9, in <module>
    BETA_CODE_TO_UNICODE_MAP = json.load(json_file)
                               ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Personal\AppData\Local\Programs\Python\Python311\Lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
                 ^^^^^^^^^
  File "C:\Users\Personal\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 54: character maps to <undefined>

I know from experience that the error probably has something to do with the UTF-8 charset, but I'm not even sure where to start looking for the problem. Python and programming are new to me. If anyone has any tips for me, I'd greatly appreciate it! Thanks!

normansimonr commented 1 year ago

Hi @githubyouser thanks for reaching out!

I have not seen that error before, so my hunch is that it's got something to do with the way Windows handles encoding. I may be wrong tho. Since unfortunately we don't officially support Windows, one option would be to use WSL to have a Linux distro on your Windows machine. Then it'd be a matter of cloning the repo, installing the requirements with pip and executing the scripts.

Let me know if that could work for you. If for some reason this is not an option, I can dig a little bit to see if I can help more!

Thanks!
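
For reference, the traceback is consistent with that hunch. The last frames show `json.load` reading the file through `cp1252.py`, the legacy Windows code page, and byte 0x8d has no mapping there, although it is a perfectly ordinary continuation byte in UTF-8 encoded Greek. The snippet below is a minimal sketch that reproduces the failure on any platform by forcing the codec. The one-entry map and the helper names (`write_sample`, `load_map`, `demo`) are made up for illustration and are not taken from the `beta_code` package.

```python
import json
import os
import tempfile

# Hypothetical one-entry map; the real beta_code JSON file is much larger.
# U+03CD (Greek small upsilon with tonos) is encoded in UTF-8 as CF 8D.
SAMPLE_MAP = {"u/": "\u03cd"}


def write_sample(path):
    """Write the sample map as UTF-8 JSON, keeping the Greek character raw."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(SAMPLE_MAP, f, ensure_ascii=False)


def load_map(path, encoding):
    """Load the JSON file using an explicitly chosen text encoding."""
    with open(path, encoding=encoding) as f:
        return json.load(f)


def demo():
    """Return (cp1252_failed, map_loaded_as_utf8) for a temporary sample file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "map.json")
        write_sample(path)
        try:
            # Forcing cp1252 mimics a Windows default when no encoding is given.
            load_map(path, "cp1252")
            cp1252_failed = False
        except UnicodeDecodeError:
            cp1252_failed = True
        return cp1252_failed, load_map(path, "utf-8")


if __name__ == "__main__":
    print(demo())
```

If that is indeed the cause, two workarounds that nobody tried in this thread would be to pass `encoding="utf-8"` wherever the file is opened, or to run the script in Python's UTF-8 mode (`python -X utf8 csv_converter.py`, or set the `PYTHONUTF8=1` environment variable), which makes `open()` default to UTF-8 on Windows.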

githubyouser commented 1 year ago

Well that was easy enough. :) Thanks, @normansimonr! Once I got WSL all set up and cloned the repo it ran first time without a hitch!! Now to figure out how to convert some of the other files... In particular I'm interested in https://github.com/byztxt/greektext-scrivener I'll have a go at it, and see how far I can get before I need more help.

normansimonr commented 1 year ago

@githubyouser Brilliant! Glad to hear that! Something worth mentioning tho is that the Unicode converter works only for the Robinson & Pierpont text. We haven't tested the code with the Textus Receptus editions from the other repos, so it's likely that it won't work.

emg commented 1 year ago

@githubyouser Thank you so much for your interest. I am so glad you want to use these texts. For converting these, please have a look at byztxt/librobinson. There should be some code in there to get you started. It is, however, almost 20 years old, and may still be Python 2. If you could help rewrite it, or make it work with Python 3, that would be great. There is a Unicode converter in there which should work for the codes used in the Scrivener text. Many thanks.

githubyouser commented 1 year ago

Thanks, @normansimonr and @emg. I was looking at librobinson, but my knowledge of Python is very limited, and I'm not even quite sure where to start with converting some of the other texts to Unicode. I'll have to take a look at it when I have some more time.