bokuweb / docx-rs

:memo: A .docx file writer with Rust/WebAssembly.
https://bokuweb.github.io/docx-rs/
MIT License

Ability to get the last rendered page of a paragraph/element #670

Open · Czechh opened 8 months ago

Czechh commented 8 months ago

Is your feature request related to a problem? Please describe.

When reading a docx file, it's really useful to know where a paragraph is located within the document. For example, this makes it possible to scroll the renderer to that point, or to generate references and quotes that point back to a specific location in the source .docx document.

Describe the solution you'd like

Page numbers are really a product of the rendering engine rather than something stored in the docx file, but I believe editors like MS Word insert <w:lastRenderedPageBreak/> elements to mark where page breaks fell the last time the document was rendered (more info). So using this XML element to infer the page while reading the document, and attaching that value to each Paragraph and Table, should suffice.

Something like:

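```rust
// Sketch of the proposed change to the document reader; elided parts are marked `// ...`.
impl FromXML for Document {
    fn from_xml<R: Read>(reader: R) -> Result<Self, ReaderError> {
        let mut parser = EventReader::new(reader);
        // Number of <w:lastRenderedPageBreak/> elements seen so far, i.e. the
        // zero-based index of the page the next element starts on.
        let mut last_rendered_page_index = 0;
        let mut doc = Self::default();
        loop {
            let e = parser.next();
            match e {
                Ok(XmlEvent::StartElement {
                    attributes, name, ..
                }) => {
                    let e = XMLElement::from_str(&name.local_name).unwrap();
                    match e {
                        XMLElement::Paragraph => {
                            let mut p = Paragraph::read(&mut parser, &attributes)?;
                            // Proposed builder method that stores the page index on the paragraph.
                            p = p.last_rendered_page_break_number(last_rendered_page_index);
                            doc = doc.add_paragraph(p);
                            continue;
                        }
                        // ...
                        XMLElement::LastRenderedPageBreak => {
                            // Note: this element is a child of a run (<w:r>). If Paragraph::read
                            // consumes the whole paragraph subtree, breaks nested inside paragraphs
                            // never reach this arm and would have to be counted by the paragraph reader.
                            last_rendered_page_index += 1;
                            continue;
                        }
                        _ => {}
                        // ...
```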
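To illustrate the counting idea outside of docx-rs, here is a minimal, std-only sketch. It assumes the body has already been flattened into a sequence of events (paragraph start/end plus `<w:lastRenderedPageBreak/>` markers); `Event`, `PageSpan` and `paragraph_pages` are hypothetical names for illustration, not docx-rs API. Because the break element sits inside a paragraph's runs, a single paragraph can span two pages, so the sketch records both the page a paragraph starts on and the page it ends on.

```rust
/// Hypothetical, simplified stand-in for the XML events seen inside <w:body>.
/// These are NOT docx-rs types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    ParagraphStart,
    ParagraphEnd,
    LastRenderedPageBreak,
}

/// Zero-based pages a paragraph occupied at the last render.
#[derive(Debug, PartialEq)]
struct PageSpan {
    start_page: usize,
    end_page: usize,
}

/// Walks the events once, counting <w:lastRenderedPageBreak/> markers,
/// and returns the page span of every paragraph in document order.
fn paragraph_pages(events: &[Event]) -> Vec<PageSpan> {
    let mut page = 0;
    let mut spans = Vec::new();
    let mut current_start: Option<usize> = None;
    for ev in events {
        match ev {
            // Remember the page the paragraph begins on.
            Event::ParagraphStart => current_start = Some(page),
            // Every marker means the following content is on the next page.
            Event::LastRenderedPageBreak => page += 1,
            // Close the paragraph with whatever page we have reached.
            Event::ParagraphEnd => {
                if let Some(start) = current_start.take() {
                    spans.push(PageSpan { start_page: start, end_page: page });
                }
            }
        }
    }
    spans
}

fn main() {
    // Three paragraphs; the second one contains a rendered page break.
    let events = [
        Event::ParagraphStart,
        Event::ParagraphEnd,
        Event::ParagraphStart,
        Event::LastRenderedPageBreak,
        Event::ParagraphEnd,
        Event::ParagraphStart,
        Event::ParagraphEnd,
    ];
    for (i, span) in paragraph_pages(&events).iter().enumerate() {
        println!("paragraph {i}: pages {}..={}", span.start_page, span.end_page);
    }
}
```

Two caveats, as far as I understand the element: it is a cached hint written by the application that last laid out the document, so files produced by tools that never paginate may contain none at all (every paragraph would then report page 0), and the markers may be stale after edits. Also, a break that appears before any text in a paragraph arguably means the whole paragraph starts on the new page, which this sketch does not distinguish. Tables could be handled the same way by tracking the outermost table instead of the paragraph.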

Describe alternatives you've considered

I have considered estimating the sizes of elements and roughly calculating the likely page number from them, but that approach would probably be buggier and hackier than the one above.

Additional context

I'm happy to work on this if the author agrees!

bokuweb commented 8 months ago

@Czechh Thanks for your proposal, and thanks for sponsoring. I am interested; could you try making a PR?

Czechh commented 8 months ago

Of course! I'll get a PR going! Thank you for the response.