How do I detect tables and their contents?

This is about reading .docx files rather than writing them.

I have some lines like this (based on this page):

let data: Value = serde_json::from_str(&read_docx(&read_to_vec(file_name)?)?.json())?;
if let Some(children) = data["document"]["children"].as_array() {
    children.iter().for_each(|node| {
        // read_children recurses through each node's "children" and returns a word count
        n_words_in_docx += read_children(node);
    });
    ...

The idea is that you get an array of "nodes". A node can have child nodes of its own, so the function read_children calls itself recursively. Even so, no text inside tables in the Word document is picked up. That's my main question, but I'm also unsure how headers, footers, footnotes, comments, text boxes and watermarks are handled. I want to sweep up all the text in the file if possible.

NB I'm a Rust uber-newb, but I have now cloned your repo and am currently looking at fn read_docx in reader/mod.rs and fn read_zip in reader/read_zip.rs. Is it possible that one of these fails to parse something (document.xml?) as it should?

With my document (consisting of just one table), I get just two nodes. Neither produces any text using the method from the page above, so I looked at the JSON produced for a node:

println!("read_children...\n{}", serde_json::to_string_pretty(node).unwrap());

read_children...
{
  "data": {
    "grid": [
      4928,
      4926
    ],
    "hasNumbering": false,
    "property": {
      "borders": {
        "bottom": null,
        "insideH": null,
        "insideV": null,
        "left": null,
        "right": null,
        "top": null
      },
      "justification": "left",
      "style": "TableGrid",
      "width": {
        "width": 0,
        "widthType": "auto"
      }
    },
    "rows": [
      {
        "data": {
          "cells": [
            {
              "data": {
                "children": [
                  {
                    "data": {
                      "children": [
                        {
                          "data": {
                            "children": [
                              {
                                "data": {
                                  "preserveSpace": true,
                                  "text": "CHAPTER I. "
                                },
                                "type": "text"
                              }
                            ],
                            "runProperty": {}
                          },
                          "type": "run"
                        }
                      ],
                      "hasNumbering": false,
                      "id": "00000001",
                      "property": {
                        "indent": {
                          "end": null,
                          "firstLineChars": null,
                          "hangingChars": null,
                          "specialIndent": {
                            "type": "firstLine",
                            "val": 0
                          },
                          "start": 0,
                          "startChars": null
                        },
                        "runProperty": {},
                        "tabs": []
                      }
                    },
                    "type": "paragraph"
                  }
                ],
                "hasNumbering": false,
                "property": {
                  "borders": null,
                  "gridSpan": null,
                  "shading": null,
                  "textDirection": null,
                  "verticalAlign": null,
                  "verticalMerge": null,
                  "width": {
                    "width": 4928,
                    "widthType": "dxa"
                  }
                }
              },
              "type": "tableCell"
            },
            {
              "data": {
                "children": [
                  {
...

The "text": "CHAPTER I. " entry in the above is never picked up as text, even though it is clearly present in the JSON. So the file does seem to be parsed. Is it possible instead that the traversal only follows "children" keys and never descends into "rows" and "cells"? Way out of my depth now.
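
One way I could test that hypothesis is a walker that descends into every nested array and object regardless of the key name, instead of only following "children". Below is a minimal sketch, not the library's own method. Assumptions: collect_text, obj, node and sample_table are hypothetical helper names; the local Value enum is only a std-only stand-in shaped like serde_json::Value (with the real crate, drop the enum and use serde_json::Value, where the same match arms should apply); and the "tableRow" and "table" type names in the sample are guesses, since the dump above is cut off before they appear.

```rust
use std::collections::BTreeMap;

// Stand-in for serde_json::Value so this sketch compiles with std only.
// (Assumption: with the real crate, remove this enum and `use serde_json::Value;`.)
#[allow(dead_code)]
#[derive(Debug)]
enum Value {
    Null,
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

// Hypothetical helper: visits every nested array and object, whatever its key
// is called ("children", "rows", "cells", "data", ...), and collects the
// "text" field of each node whose "type" is "text".
fn collect_text(value: &Value, out: &mut Vec<String>) {
    match value {
        Value::Array(items) => {
            for item in items {
                collect_text(item, out);
            }
        }
        Value::Object(map) => {
            let is_text =
                matches!(map.get("type"), Some(Value::String(t)) if t.as_str() == "text");
            if is_text {
                if let Some(Value::Object(data)) = map.get("data") {
                    if let Some(Value::String(s)) = data.get("text") {
                        out.push(s.clone());
                    }
                }
                return; // a text node is a leaf, nothing further to visit
            }
            // Not a text node: descend into every value, not just "children".
            for child in map.values() {
                collect_text(child, out);
            }
        }
        _ => {}
    }
}

// Small builders (hypothetical) to recreate the shape of the dump above.
fn obj(pairs: Vec<(&str, Value)>) -> Value {
    Value::Object(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

fn node(kind: &str, data: Value) -> Value {
    obj(vec![("type", Value::String(kind.to_string())), ("data", data)])
}

// One table -> one row -> one cell -> paragraph -> run -> text.
// "tableRow" and "table" are guessed type names; the walker does not rely on them.
fn sample_table() -> Value {
    let text = node("text", obj(vec![("text", Value::String("CHAPTER I. ".to_string()))]));
    let run = node("run", obj(vec![("children", Value::Array(vec![text]))]));
    let paragraph = node("paragraph", obj(vec![("children", Value::Array(vec![run]))]));
    let cell = node("tableCell", obj(vec![("children", Value::Array(vec![paragraph]))]));
    let row = node("tableRow", obj(vec![("cells", Value::Array(vec![cell]))]));
    node("table", obj(vec![("rows", Value::Array(vec![row]))]))
}
```

Calling collect_text(&sample_table(), &mut out) with an empty Vec should leave out holding the single string "CHAPTER I. ", and splitting each collected string on whitespace would then give the word count. If this kind of walk finds the table text in the real JSON while read_children does not, the gap is in the traversal rather than in the parsing.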

bokuweb / docx-rs

How do I detect tables and their contents? #651