atomic14 / diy-esp32-epub-reader

An ESP32 e-reader
MIT License

Help in adding a language. #75

Closed shojkeee closed 2 years ago

shojkeee commented 2 years ago

Hello. Please help. I'm from Russia, and it turns out that this program does not support Cyrillic. Please add the language, or tell me how to add it myself. I am attaching a book for testing: 80072670.zip.

martinberlin commented 2 years ago

Hello @shojkeee, sure, we will take a look. @cgreening can you add some steps here for adding a Cyrillic font? Is there anything else to consider when rendering Cyrillic?

shojkeee commented 2 years ago

Here is a table of codes of Russian letters in utf8. Maybe this will help. https://i.voenmeh.ru/kafi5/Kam.loc/inform/UTF-8.htm

cgreening commented 2 years ago

I think it's just a case of adding it to the font generation script. I'll do it and document the steps.

cgreening commented 2 years ago

The font intervals are defined in here: scripts/fontconvert.py so in theory we just add the ranges in and run the shell script... I'll try it and see what happens.

shojkeee commented 2 years ago

Thanks. I will really wait for the result.

cgreening commented 2 years ago

Will be later today/this evening.

martinberlin commented 2 years ago

Hi Chris, maybe it would be a nice feature if we could switch to this new font based on the detected language? For example, the ePub provided as a test has the following indicator in its content.opf XML showing that it is Russian:

<dc:language>ru</dc:language>

Then, if we detect the ru language, we can switch to this new font. I think other languages besides Russian also use Cyrillic, such as Bulgarian, Macedonian and Serbian.
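The detection step described above could look roughly like the sketch below. It is written in Python purely for illustration and is not the reader's firmware code. The function names and the set of language codes are hypothetical. The only pieces taken from the thread are the dc:language element and the ru value.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Dublin Core namespace used by the dc:language element in content.opf
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical list: ISO 639-1 codes for the languages named in the comment above
CYRILLIC_LANGUAGES = {"ru", "bg", "mk", "sr"}


def detect_language(opf_xml: str) -> Optional[str]:
    """Return the primary language code from an OPF document, or None."""
    root = ET.fromstring(opf_xml)
    element = root.find(f".//{{{DC_NS}}}language")
    if element is None or not element.text:
        return None
    # Normalise values such as "ru-RU" to "ru"
    return element.text.strip().lower().split("-")[0]


def needs_cyrillic_font(opf_xml: str) -> bool:
    """True when the book declares a language assumed to need Cyrillic glyphs."""
    return detect_language(opf_xml) in CYRILLIC_LANGUAGES


# Minimal stand-in for a real content.opf file
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:language>ru</dc:language>
  </metadata>
</package>"""

print(detect_language(SAMPLE_OPF))      # ru
print(needs_cyrillic_font(SAMPLE_OPF))  # True
```

Serbian is also written in Latin script, so a language code alone is only a hint. Checking which characters actually occur in the text would be more reliable.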

shojkeee commented 2 years ago

Sorry, I started trying to understand the fontconvert.py script myself. As it is, the script creates fonts fine, but only if you do not change it. My question is where to change the interval. I cannot figure it out. I can see that an interval is set in the get_intervals_from_font.py script, but that script is not involved in creating the font files.

shojkeee commented 2 years ago

"You can change which unicode character codes are to be exported by specifying additional ranges of unicode code points with --additional-intervals. Intervals are written as min,max. To add multiple intervals, you can specify the --additional-intervals option multiple times.

./fontconvert.py ... --additional-intervals 0xE0A0,0xE0A2 --additional-intervals 0xE0B0,0xE0B3 ..."

How can I find the interval for Russian letters in hexadecimal format? I can't find it :(
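One way to answer this question is to ask Python for the code point of each letter. The sketch below is only an illustration. The helper name is made up, and the printed option just reuses the --additional-intervals syntax from the quoted documentation.

```python
# The 33 upper-case and 33 lower-case letters of the Russian alphabet
RUSSIAN_ALPHABET = (
    "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
    "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
)


def code_point_interval(text):
    """Return the lowest and highest Unicode code point used in text."""
    code_points = [ord(ch) for ch in text]
    return min(code_points), max(code_points)


low, high = code_point_interval(RUSSIAN_ALPHABET)

# Format the result in the min,max form that --additional-intervals expects
print(f"--additional-intervals {low:#x},{high:#x}")
# prints: --additional-intervals 0x401,0x451
```

The interval starts at 0x401 rather than 0x410 because Ё sits before А in the Unicode Cyrillic block, and ё (0x451) sits after я (0x44f).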

cgreening commented 2 years ago

I've made the required changes here: https://github.com/atomic14/diy-esp32-epub-reader/pull/76

But the current font doesn't have the code points defined here: https://en.wikipedia.org/wiki/List_of_Unicode_characters#Cyrillic

shojkeee commented 2 years ago

Thank you very much. I'll try it now. But before that, can you please explain the following to me: in the get_intervals_from_font.py script the interval is hard-coded. Originally it is "for c in range(0, 65536):"

Where does this interval come from? I tried changing the interval to run from 0 to 209 145, and very few characters were added. I haven't had time to check whether it worked or not. My question is how to find the correct interval numbers for the letters I am interested in.

shojkeee commented 2 years ago

The numbers in the UTF-8 code table look more or less like the right ones. Is that correct? Do I need the UTF-8 code in decimal format?
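The numbers 209 145 mentioned above look like the two UTF-8 bytes of the letter "ё" written in decimal. That differs from its Unicode code point, which is the kind of number the font intervals in this thread are written in. The sketch below shows the difference. It also shows that range(0, 65536) is exactly 0x0000 to 0xFFFF, the Basic Multilingual Plane. That this is why the script uses that range is an assumption.

```python
letter = "ё"

# UTF-8 is a byte encoding: this letter is stored as two bytes
utf8_bytes = list(letter.encode("utf-8"))
print(utf8_bytes)  # [209, 145]

# The code point is a single number that identifies the character
code_point = ord(letter)
print(code_point, hex(code_point))  # 1105 0x451

# 65536 == 0x10000, so range(0, 65536) covers code points 0x0000..0xFFFF
print(hex(65536))                    # 0x10000
print(code_point in range(0, 65536))  # True
```

So the value to look up is the code point (0x451 for "ё"), not the bytes shown in a UTF-8 table.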

cgreening commented 2 years ago

I've updated the PR to use Open Sans as the fallback.

In the scripts folder there are two files, fontconvert.py and generate_fonts.sh. I just added the new code points to fontconvert.py and then ran generate_fonts.sh.

cgreening commented 2 years ago

It's the code points in hexadecimal UTF-16. For now I've taken the code block from here: https://en.wikipedia.org/wiki/List_of_Unicode_characters#Cyrillic which is:

    (0x400, 0x486),  # https://en.wikipedia.org/wiki/List_of_Unicode_characters#Cyrillic
    (0x489, 0x4FF),  # https://en.wikipedia.org/wiki/List_of_Unicode_characters#Cyrillic

There are a couple of missing characters that I've ignored for now.
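As a quick sanity check, the two intervals above can be tested against a piece of text to see whether every character would be included. This sketch assumes the intervals are inclusive at both ends, which the thread does not confirm. The helper names are hypothetical and are not part of fontconvert.py.

```python
# The two Cyrillic intervals quoted above, treated as inclusive (an assumption)
CYRILLIC_INTERVALS = [(0x400, 0x486), (0x489, 0x4FF)]


def is_covered(ch, intervals):
    """True when the code point of ch falls inside one of the intervals."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in intervals)


def missing_characters(text, intervals):
    """Return the non-space characters of text that no interval covers."""
    return sorted({ch for ch in text
                   if not ch.isspace() and not is_covered(ch, intervals)})


# Every letter of a Russian sample is covered
print(missing_characters("Привет мир", CYRILLIC_INTERVALS))  # []

# Code points in 0x400..0x4FF that fall between the two intervals
gap = [cp for cp in range(0x400, 0x500)
       if not is_covered(chr(cp), CYRILLIC_INTERVALS)]
print([hex(cp) for cp in gap])  # ['0x487', '0x488']
```

Under that assumption the only code points skipped in the 0x400 to 0x4FF block are 0x487 and 0x488, which would match the "couple of missing characters" mentioned above.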

shojkeee commented 2 years ago

I saw that. It's just that generate_fonts.sh generates 4 files at once and puts them into the lib/font folder, so I ran the commands from that file separately to see what each one creates.

cgreening commented 2 years ago

Try the code in this branch and see if it works - feature/add-cyrillic

shojkeee commented 2 years ago

Excuse me, could you give me a link to download this branch? I don't know much about GitHub. Yesterday, when I tried to build the program, I got an error until I downloaded it with the command git clone --recursive https://github.com/atomic14/diy-esp32-epub-reader.git

I assume the link should be roughly the same?

shojkeee commented 2 years ago

Checking size .pio\build\m5_paper\firmware.elf
Advanced Memory Usage is available via "PlatformIO Home > Project Inspect"
Error: The program size (1077749 bytes) is greater than maximum allowed (1048576 bytes)
RAM:   [=         ]   9.7% (used 31860 bytes from 327680 bytes)
Flash: [==========] 102.8% (used 1077749 bytes from 1048576 bytes)
*** [checkprogsize] Explicit exit, status 1
==== [FAILED] Took 53.87 seconds ====

Environment    Status    Duration
m5_paper       FAILED    00:00:53.874

Not enough memory

martinberlin commented 2 years ago

Hello @shojkeee, please check out the branch: feature/add-cyrillic

Set this in platformio.ini:

default_envs = m5_paper

And build again. It builds correctly for me:

Advanced Memory Usage is available via "PlatformIO Home > Project Inspect"
RAM:   [=         ]   9.8% (used 32048 bytes from 327680 bytes)
Flash: [========= ]  91.4% (used 1078741 bytes from 1179648 bytes)

Environment    Status
m5_paper       SUCCESS
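The two build logs report almost the same program size but different flash limits, and that difference decides whether the size check passes. The short sketch below only redoes the arithmetic from the logs. The function name is made up, and the cause of the larger limit (for example a different partition setting on the branch) is a guess that the thread does not confirm.

```python
def flash_usage(program_bytes, max_bytes):
    """Return (percentage used rounded to 0.1, whether the program fits)."""
    percent = 100.0 * program_bytes / max_bytes
    return round(percent, 1), program_bytes <= max_bytes


# Numbers taken from the failing and the successful build logs in this thread
failed_build = flash_usage(1077749, 1048576)
working_build = flash_usage(1078741, 1179648)

print(failed_build)   # (102.8, False)
print(working_build)  # (91.4, True)

# The two limits in hexadecimal
print(hex(1048576), hex(1179648))  # 0x100000 0x120000
```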

shojkeee commented 2 years ago

Please help me. What command did you use to download the project from this branch? If I download the .zip archive, it does not compile at all and complains about the libraries.

cgreening commented 2 years ago

Git can be pretty confusing if you've not used it much.

git clone --recursive git@github.com:atomic14/diy-esp32-epub-reader.git
cd diy-esp32-epub-reader
git checkout feature/add-cyrillic

The above will clone the repository with all the submodules into diy-esp32-epub-reader. And then you change into that directory and checkout the branch.

shojkeee commented 2 years ago

@cgreening Thanks for the help with git, and thank you for adding Russian. I have now checked it with a book that could not be read before, and everything works perfectly. Will you add Russian to the main branch? I want to make a video about it. I have 15k subscribers on YouTube and I think they will find it useful.

martinberlin commented 2 years ago

Tested here and it also works. @cgreening I think it is ready to be merged. Reviewing it now.

cgreening commented 2 years ago

I wonder if we should just use Open Sans as our default font?

cgreening commented 2 years ago

@shojkeee Yes! Please do a video - that would be great - once we've merged the pull request it will be in the main branch.

cgreening commented 2 years ago

All done.