atomic14 / diy-esp32-epub-reader

An ESP32 e-reader
MIT License

Feature/46 index #67

Closed martinberlin closed 2 years ago

martinberlin commented 2 years ago

First draft of parsing the table of contents. @cgreening I will need some help with this one since I don't completely understand the whole structure yet. So far I have only started parsing the toc.ncx file. This is what it outputs via Serial for the WW2 book:

ncx in line:4
I (17758) EPUB: title Cubierta src Text/cubierta.xhtml
I (17758) EPUB: title Pequeñas grandes historias de la Segunda Guerra Mundial src Text/titulo.xhtml
I (17768) EPUB: title Introducción src Text/Introduccion.xhtml
I (17768) EPUB: title 1. Los primeros src Text/Capitulo1.xhtml
I (17778) EPUB: title 2. En la retaguardia src Text/Capitulo2.xhtml
I (17788) EPUB: title 3. El esfuerzo de guerra src Text/Capitulo3.xhtml
I (17788) EPUB: title 4. En el aire src Text/Capitulo4.xhtml
I (17798) EPUB: title 5. En el mar src Text/Capitulo5.xhtml
I (17808) EPUB: title 6. La tragedia de la guerra src Text/Capitulo6.xhtml
I (17808) EPUB: title 7. En la línea de fuego src Text/Capitulo7.xhtml
I (17818) EPUB: title 8. Los otros protagonistas src Text/Capitulo8.xhtml
I (17828) EPUB: title 9. Historias de ingenio src Text/Capitulo9.xhtml
I (17838) EPUB: title 10. Hechos insólitos src Text/Capitulo10.xhtml
I (17838) EPUB: title 11. Los últimos src Text/Capitulo11.xhtml
I (17848) EPUB: title Apéndice src Text/Apendice.xhtml
I (17858) EPUB: title Bibliografía src Text/Bibliografia.xhtml
I (17858) EPUB: title Autor src Text/autor.xhtml
I (17868) EPUB: title Notas src Text/notas.xhtm

Question: do we need to take the *.ncx name from the manifest, or would it be faster to just open the first *.ncx file in OEBPS? So far every book has had the ncx extension, sometimes with a different filename. I would go for the fastest option if possible.
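For reference, in EPUB 2 the spine's `toc` attribute names the manifest item that holds the NCX, and that item has the media type `application/x-dtbncx+xml`. If the manifest is already parsed, the lookup is cheap. Below is a minimal sketch under that assumption; `ManifestItem` and `find_ncx_href` are hypothetical names, not part of this repository.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified view of one <item> element in the OPF manifest.
struct ManifestItem {
  std::string id;
  std::string href;
  std::string media_type;
};

// Returns the href of the NCX file, or an empty string if none is found.
// toc_id is the value of the spine's "toc" attribute (may be empty).
std::string find_ncx_href(const std::vector<ManifestItem> &manifest,
                          const std::string &toc_id) {
  // Preferred: the manifest item that the spine explicitly points at.
  if (!toc_id.empty()) {
    for (const auto &item : manifest) {
      if (item.id == toc_id) return item.href;
    }
  }
  // Fallback: the first manifest item declared with the NCX media type.
  for (const auto &item : manifest) {
    if (item.media_type == "application/x-dtbncx+xml") return item.href;
  }
  return "";
}
```

This avoids scanning the archive for a file extension and still works when the NCX has an unusual filename.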

martinberlin commented 2 years ago

@cgreening exploring more EPUBs, I realized that the TOC can be nested (sublevels). For example, the book piedra_* here: https://sync.luckycloud.de/d/a04070e242b841f7b784/

This is how its toc.ncx is read by tinyXML2:

I (23830) EPUB: title Cubierta src Text/cubierta.xhtml
I (23840) EPUB: title Extracción de la piedra de locura src Text/titulo.xhtml
I (23840) EPUB: title I (1966) src Text/Section0001.xhtml
I (23840) EPUB: title II (1963) src Text/Section0014.xhtml
I (23850) EPUB: title III (1962) src Text/Section0027.xhtml
I (23860) EPUB: title IV (1964) src Text/Section0029.xhtml
I (23860) EPUB: title Autor src Text/autor.xhtml
Guru Meditation Error: Core  1 panic'ed (LoadProhibited)

Why is it read like this? Simply because the chapters can be nested: section I (1966) contains sections 0001 to 0014. Even so, I don't really understand why it ends with an error.

We need to decide how we are going to render this, for example the main category in bold and the sub-categories in the normal font.

cgreening commented 2 years ago

How deep can the nesting go? Is it just one level or can there be multiple levels?

martinberlin commented 2 years ago

Most books have 1 or 2 levels, but I think it is not limited. With 2 we would be more than safe for 95% of books. Why does the test fail?

Expected 'HENRY JEKYLL\xE2\x80\x99S FULL STATEMENT OF THE CASE' Was 'INCIDENT OF THE LETTER' [FAILED]

cgreening commented 2 years ago

I quite like the idea of getting a flat list of TOC items with a depth counter. You can write quite a nice recursive function to collect them all.
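A minimal sketch of that idea, assuming the nested `navPoint` elements have already been read into a simple tree. `NavPoint`, `TocItem`, and `collect_toc_items` are hypothetical names used only for illustration; the real code would walk the XML parser's nodes instead.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory form of an NCX <navPoint>, including its children.
struct NavPoint {
  std::string title;
  std::string src;
  std::vector<NavPoint> children;
};

// One entry of the flat table of contents, tagged with its nesting depth.
struct TocItem {
  std::string title;
  std::string src;
  int depth;
};

// Depth-first walk: append each nav point, then recurse into its children
// with depth + 1, so the output keeps document order.
void collect_toc_items(const std::vector<NavPoint> &points, int depth,
                       std::vector<TocItem> &out) {
  for (const auto &point : points) {
    out.push_back({point.title, point.src, depth});
    collect_toc_items(point.children, depth + 1, out);
  }
}
```

Calling `collect_toc_items(top_level, 0, items)` fills `items` with every entry at any nesting depth, so the renderer only has to look at `depth` to decide how to draw each row.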

Do you have a good example of a file with nested items?

martinberlin commented 2 years ago

Sure, in the same luckycloud link as before: the Blade Runner book has a complex index. https://sync.luckycloud.de/d/a04070e242b841f7b784/

There is also the book piedra by Pizarnik. I would love to see how you build that function! I would say: first level in bold, all other levels indented with just 2 blank spaces on the left.
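That rendering rule could be sketched roughly as follows. `TocLine` and `format_toc_line` are hypothetical names, and the choice of two spaces per nesting level (rather than a fixed two spaces for every sub-level) is an assumption.

```cpp
#include <string>

// Hypothetical result type: the text to draw and whether to use the bold font.
struct TocLine {
  std::string text;
  bool bold;
};

// Top-level entries (depth 0) are bold with no indent; deeper entries use the
// regular font and are indented by two spaces per level (an assumption).
TocLine format_toc_line(const std::string &title, int depth) {
  if (depth <= 0) {
    return {title, true};
  }
  std::string indent(static_cast<std::string::size_type>(depth) * 2, ' ');
  return {indent + title, false};
}
```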

martinberlin commented 2 years ago

@cgreening I am looking for a way to relate the spine contents to the TOC index. I would like to be able to send the ID of the selected item so that the book opens at the right section. This is what I see so far:

ncx index:
I (19399) EPUB: Cubierta -> Text/cubierta.xhtml
I (19399) EPUB: Poesía completa -> Text/titulo.xhtml
I (19399) EPUB: La tierra más lejana (1955) -> Text/Section0001.xhtml

I (5369) PUBLIST: Rendering item 1
cubierta.xhtml -> OEBPS/Text/cubierta.xhtml
sinopsis.xhtml -> OEBPS/Text/sinopsis.xhtml
titulo.xhtml -> OEBPS/Text/titulo.xhtml

I added a second item to the spine, since I thought that having the plain spine xhtml filename without the path would make this easier, but I'm not sure it was the right idea. What I need to extend is this method: EpubReader::parse_and_layout_current_section()

It would receive a string: EpubReader::parse_and_layout_section(std::string section_id). On receiving titulo.xhtml, it should then open the book at OEBPS/Text/titulo.xhtml. Is that the right way to go, or is there a better idea that does not require extending the spine?
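One option that may avoid extending the spine is to match the TOC `src` against the full spine paths by suffix. The sketch below assumes the spine is available as a list of full paths such as OEBPS/Text/titulo.xhtml; `find_spine_index` is a hypothetical helper, not an existing method of EpubReader.

```cpp
#include <string>
#include <vector>

// Returns the index of the spine entry whose path ends with toc_src on a
// path-segment boundary, or -1 if there is no match. Any "#fragment" in
// toc_src is ignored, since NCX src values may point inside a file.
int find_spine_index(const std::vector<std::string> &spine_paths,
                     const std::string &toc_src) {
  const std::string target = toc_src.substr(0, toc_src.find('#'));
  if (target.empty()) return -1;
  for (std::string::size_type i = 0; i < spine_paths.size(); ++i) {
    const std::string &path = spine_paths[i];
    if (path.size() < target.size()) continue;
    const std::string::size_type start = path.size() - target.size();
    if (path.compare(start, target.size(), target) != 0) continue;
    // Only accept a match at the start of the path or right after a '/'.
    if (start == 0 || path[start - 1] == '/') return static_cast<int>(i);
  }
  return -1;
}
```

With the spine from the log above, both Text/titulo.xhtml and titulo.xhtml resolve to the same entry, so the selected TOC item can be turned into a section index without storing a second name per spine item.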

cgreening commented 2 years ago

This is now rendering and should be navigating to the right section. The rendering code could be made a lot nicer - it would be better if it used flexible cell heights instead of fixed ones as currently there's a lot of wasted space.

I'm wondering whether the names are correct. EpubIndex feels wrong; I wonder if it should be EpubTableOfContents, just to be clear about what it is.

martinberlin commented 2 years ago

Yes, TableOfContents, or EpubToc, which is shorter and nicer, sounds better to me. I will test soon, after dealing with some work, and give you feedback about it.

cgreening commented 2 years ago

I've added issues for the outstanding items.

martinberlin commented 2 years ago

Alright, then I would say it is ready for squash and merge, unless we forgot something. It builds correctly in the epdiy and lilygo environments.