eea / odfpy

API for OpenDocument in Python
GNU General Public License v2.0
311 stars, 64 forks

Potentially unexpected output when parsing text #92

Open · danielloader opened this issue 4 years ago

danielloader commented 4 years ago

Just want to start by saying thanks for this library; it's been great!

I have a query, though. It might be by design, in which case it's quick to close, but I figured it's worth clarifying.

I have a Word 97 document that I am converting to ODT (first locally with LibreOffice 6, and then with an API service to automate the conversion).

Using odfpy, I've been able to pull the content of a table into a dictionary:

First, from the ODT produced by LibreOffice:

{
    "day": "Monday",
    "pasta_bar": "Tagliatelli of creamed wild Mushrooms",
    "traditional_choice": "Chicken and Spinach Curry with Rice",
    "theatre_bar": "Seared sea Bass, Rosti, Sofrito sauce",
    "daily_special": "Maple roast Bacon loin,Colcannon Mash,Grain Mustard sauce"
}

Second, from the ODT produced by Zamzar.com or docconversionapi.com:

{
    "day": "Monday",
    "pastabar": "TagliatelliofcreamedwildMushrooms",
    "traditionalchoice": "ChickenandSpinachCurrywithRice",
    "theatrebar": "SearedseaBass,Rosti,Sofritosauce",
    "dailyspecial": "MapleroastBaconloin,ColcannonMash,GrainMustardsauce"
}

Suspecting that the source XML was different, I went digging and extracted the table from the content.xml file inside each ODT (both dumps are below).

It seems LibreOffice puts the text directly into paragraphs, whereas the other services wrap it in spans and represent each space as a <text:s/> element. Interestingly, both converted documents render correctly when I open them in LibreOffice.

So my question is: is this a bug or expected behaviour? Thanks!
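The likely culprit is the <text:s/> element in the span-style markup: in ODF it stands for one space (or as many as its text:c attribute specifies), so an extractor that only concatenates text nodes silently drops it. The following is an illustration, not odfpy code: it uses Python's standard xml.etree.ElementTree, and the helper names naive_text and extract_text are hypothetical. It assumes the real content.xml has no line breaks inside the spans, i.e. that the one-word-per-line layout in the dumps comes from pretty-printing. For comparison, odfpy ships odf.teletype.extractText, which expands <text:s>, <text:tab> and <text:line-break> in a similar way.

```python
import xml.etree.ElementTree as ET

# Namespace URI behind the "text:" prefix in OpenDocument files
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SPACE = f"{{{TEXT_NS}}}s"
TAB = f"{{{TEXT_NS}}}tab"
LINE_BREAK = f"{{{TEXT_NS}}}line-break"
COUNT = f"{{{TEXT_NS}}}c"  # text:c attribute: number of spaces for <text:s>


def naive_text(element: ET.Element) -> str:
    """Join text nodes only; whitespace elements such as <text:s/> are lost."""
    return "".join(element.itertext())


def extract_text(element: ET.Element) -> str:
    """Join text nodes, expanding <text:s>, <text:tab> and <text:line-break>."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == SPACE:
            parts.append(" " * int(child.get(COUNT, "1")))
        elif child.tag == TAB:
            parts.append("\t")
        elif child.tag == LINE_BREAK:
            parts.append("\n")
        else:
            # e.g. <text:span>: recurse to collect the text inside it
            parts.append(extract_text(child))
        # Text that follows a child element is stored in the child's .tail
        parts.append(child.tail or "")
    return "".join(parts)


# Compact form of the span-style markup written by the online converters
para = ET.fromstring(
    f'<text:p xmlns:text="{TEXT_NS}">'
    "<text:span>Pasta<text:s/>Bar</text:span></text:p>"
)
print(naive_text(para))    # PastaBar
print(extract_text(para))  # Pasta Bar
```

Because spans are simply recursed into, the whitespace-aware version needs no special handling for them; only the whitespace elements themselves have to be translated back into characters.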

Local install of LibreOffice 6.0.7.3 00m0 (Build:3)

<table:table table:name="Table3" table:style-name="Table3">
    <table:table-column table:style-name="Table3.A" />
    <table:table-column table:style-name="Table3.B" />
    <table:table-column table:style-name="Table3.C" />
    <table:table-column table:style-name="Table3.D" />
    <table:table-row table:style-name="Table3.1">
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:h text:style-name="P1" text:outline-level="3">Monday</text:h>
        </table:table-cell>
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P6" />
        </table:table-cell>
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P8" />
        </table:table-cell>
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P8" />
        </table:table-cell>
    </table:table-row>
    <table:table-row table:style-name="Table3.2">
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:h text:style-name="P2" text:outline-level="3" />
        </table:table-cell>
        <table:table-cell table:style-name="Table3.B2" office:value-type="string">
            <text:p text:style-name="P12">Pasta Bar</text:p>
        </table:table-cell>
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P9">Tagliatelli of creamed wild Mushrooms</text:p>
        </table:table-cell>
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P9">£2.00</text:p>
        </table:table-cell>
    </table:table-row>
    <table:table-row table:style-name="Table3.3">
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P8" />
        </table:table-cell>
        <table:table-cell table:style-name="Table3.B3" office:value-type="string">
            <text:p text:style-name="P11">Traditional Choice</text:p>
        </table:table-cell>
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P9">Chicken and Spinach Curry with Rice</text:p>
        </table:table-cell>
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P9">£2.00</text:p>
        </table:table-cell>
    </table:table-row>
    <table:table-row table:style-name="Table3.4">
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P8" />
        </table:table-cell>
        <table:table-cell table:style-name="Table3.B4" office:value-type="string">
            <text:p text:style-name="P13">Theatre Bar</text:p>
        </table:table-cell>
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P9">Seared sea Bass, Rosti, Sofrito sauce</text:p>
        </table:table-cell>
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P9">£3.50</text:p>
        </table:table-cell>
    </table:table-row>
    <table:table-row table:style-name="Table3.3">
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P8" />
        </table:table-cell>
        <table:table-cell table:style-name="Table3.B5" office:value-type="string">
            <text:p text:style-name="P11">Daily Special</text:p>
        </table:table-cell>
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P9">Maple roast Bacon loin,Colcannon Mash,Grain Mustard sauce</text:p>
        </table:table-cell>
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P9">£3.50</text:p>
            <text:p text:style-name="P9" />
        </table:table-cell>
    </table:table-row>
    <table:table-row table:style-name="Table3.3">
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P8" />
        </table:table-cell>
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P8" />
        </table:table-cell>
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P8" />
        </table:table-cell>
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P8" />
        </table:table-cell>
    </table:table-row>
    <table:table-row table:style-name="Table3.3">
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P8" />
        </table:table-cell>
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P8" />
        </table:table-cell>
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P8" />
        </table:table-cell>
        <table:table-cell table:style-name="Table3.A1" office:value-type="string">
            <text:p text:style-name="P8" />
        </table:table-cell>
    </table:table-row>
</table:table>

And here is how it renders in LibreOffice: https://i.imgur.com/PVbGo4v.png

Zamzar.com or docconversionapi.com

<table:table table:style-name="Table3">
    <table:table-column table:style-name="Column9" />
    <table:table-column table:style-name="Column10" />
    <table:table-column table:style-name="Column11" />
    <table:table-column table:style-name="Column12" />
    <table:table-row table:style-name="Row15">
        <table:table-cell table:style-name="Cell57">
            <text:h text:style-name="P80" text:outline-level="3">
                <text:span text:style-name="T80_1">Monday</text:span>
            </text:h>
        </table:table-cell>
        <table:table-cell table:style-name="Cell58">
            <text:p text:style-name="P81" />
        </table:table-cell>
        <table:table-cell table:style-name="Cell59">
            <text:p text:style-name="P82" />
        </table:table-cell>
        <table:table-cell table:style-name="Cell60">
            <text:p text:style-name="P83" />
        </table:table-cell>
    </table:table-row>
    <table:table-row table:style-name="Row16">
        <table:table-cell table:style-name="Cell61">
            <text:h text:style-name="P84" text:outline-level="3" />
        </table:table-cell>
        <table:table-cell table:style-name="Cell62">
            <text:p text:style-name="P85">
                <text:span text:style-name="T85_1">
                    Pasta
                    <text:s />
                    Bar
                </text:span>
            </text:p>
        </table:table-cell>
        <table:table-cell table:style-name="Cell63">
            <text:p text:style-name="P86">
                <text:span text:style-name="T86_1">
                    Tagliatelli
                    <text:s />
                    of
                    <text:s />
                    creamed
                    <text:s />
                    wild
                    <text:s />
                    Mushrooms
                </text:span>
            </text:p>
        </table:table-cell>
        <table:table-cell table:style-name="Cell64">
            <text:p text:style-name="P87">
                <text:span text:style-name="T87_1">£2.00</text:span>
            </text:p>
        </table:table-cell>
    </table:table-row>
    <table:table-row table:style-name="Row17">
        <table:table-cell table:style-name="Cell65">
            <text:p text:style-name="P88" />
        </table:table-cell>
        <table:table-cell table:style-name="Cell66">
            <text:p text:style-name="P89">
                <text:span text:style-name="T89_1">
                    Traditional
                    <text:s />
                    Choice
                </text:span>
            </text:p>
        </table:table-cell>
        <table:table-cell table:style-name="Cell67">
            <text:p text:style-name="P90">
                <text:span text:style-name="T90_1">
                    Chicken
                    <text:s />
                    and
                    <text:s />
                    Spinach
                    <text:s />
                    Curry
                    <text:s />
                    with
                    <text:s />
                    Rice
                </text:span>
            </text:p>
        </table:table-cell>
        <table:table-cell table:style-name="Cell68">
            <text:p text:style-name="P91">
                <text:span text:style-name="T91_1">£2.00</text:span>
            </text:p>
        </table:table-cell>
    </table:table-row>
    <table:table-row table:style-name="Row18">
        <table:table-cell table:style-name="Cell69">
            <text:p text:style-name="P92" />
        </table:table-cell>
        <table:table-cell table:style-name="Cell70">
            <text:p text:style-name="P93">
                <text:span text:style-name="T93_1">
                    Theatre
                    <text:s />
                    Bar
                </text:span>
            </text:p>
        </table:table-cell>
        <table:table-cell table:style-name="Cell71">
            <text:p text:style-name="P94">
                <text:span text:style-name="T94_1">
                    Seared
                    <text:s />
                    sea
                    <text:s />
                    Bass,
                    <text:s />
                    Rosti,
                    <text:s />
                    Sofrito
                    <text:s />
                    sauce
                </text:span>
            </text:p>
        </table:table-cell>
        <table:table-cell table:style-name="Cell72">
            <text:p text:style-name="P95">
                <text:span text:style-name="T95_1">£3.50</text:span>
            </text:p>
        </table:table-cell>
    </table:table-row>
    <table:table-row table:style-name="Row19">
        <table:table-cell table:style-name="Cell73">
            <text:p text:style-name="P96" />
        </table:table-cell>
        <table:table-cell table:style-name="Cell74">
            <text:p text:style-name="P97">
                <text:span text:style-name="T97_1">
                    Daily
                    <text:s />
                    Special
                </text:span>
            </text:p>
        </table:table-cell>
        <table:table-cell table:style-name="Cell75">
            <text:p text:style-name="P98">
                <text:span text:style-name="T98_1">
                    Maple
                    <text:s />
                    roast
                    <text:s />
                    Bacon
                    <text:s />
                    loin,Colcannon
                    <text:s />
                    Mash,Grain
                    <text:s />
                    Mustard
                    <text:s />
                    sauce
                </text:span>
            </text:p>
        </table:table-cell>
        <table:table-cell table:style-name="Cell76">
            <text:p text:style-name="P99">
                <text:span text:style-name="T99_1">£3.50</text:span>
            </text:p>
            <text:p text:style-name="P100" />
        </table:table-cell>
    </table:table-row>
    <table:table-row table:style-name="Row20">
        <table:table-cell table:style-name="Cell77">
            <text:p text:style-name="P101" />
        </table:table-cell>
        <table:table-cell table:style-name="Cell78">
            <text:p text:style-name="P102" />
        </table:table-cell>
        <table:table-cell table:style-name="Cell79">
            <text:p text:style-name="P103" />
        </table:table-cell>
        <table:table-cell table:style-name="Cell80">
            <text:p text:style-name="P104" />
        </table:table-cell>
    </table:table-row>
    <table:table-row table:style-name="Row21">
        <table:table-cell table:style-name="Cell81">
            <text:p text:style-name="P105" />
        </table:table-cell>
        <table:table-cell table:style-name="Cell82">
            <text:p text:style-name="P106" />
        </table:table-cell>
        <table:table-cell table:style-name="Cell83">
            <text:p text:style-name="P107" />
        </table:table-cell>
        <table:table-cell table:style-name="Cell84">
            <text:p text:style-name="P108" />
        </table:table-cell>
    </table:table-row>
</table:table>

And here is how it renders in LibreOffice: https://i.imgur.com/DzmPOGs.png
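The reporter's dictionary-building code isn't shown, so what follows is a hypothetical reconstruction using only the standard library (not odfpy's API) of how a whitespace-aware extractor would let both documents produce the same dictionary. It reads content.xml out of the .odt ZIP archive, treats a non-empty first cell as the day heading, and maps the second-column label to the third-column dish. The key scheme (lower-case, spaces replaced by underscores) is an assumption inferred from the first dictionary; applied to text whose <text:s/> spaces were dropped, it would also produce the space-less keys of the second one. Function names are illustrative, and the sketch assumes no pretty-print whitespace inside paragraphs.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace URIs behind the "text:" and "table:" prefixes in OpenDocument
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TABLE = f"{{{TABLE_NS}}}table"
ROW = f"{{{TABLE_NS}}}table-row"
CELL = f"{{{TABLE_NS}}}table-cell"
PARAGRAPHS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}
SPACE = f"{{{TEXT_NS}}}s"
COUNT = f"{{{TEXT_NS}}}c"


def extract_text(element):
    """Text of an element, recursing into spans and expanding <text:s/>."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == SPACE:
            parts.append(" " * int(child.get(COUNT, "1")))
        else:
            parts.append(extract_text(child))
        parts.append(child.tail or "")  # text after the child element
    return "".join(parts)


def cell_text(cell):
    """Join a cell's paragraphs/headings with newlines; strip outer whitespace."""
    return "\n".join(extract_text(p) for p in cell if p.tag in PARAGRAPHS).strip()


def table_to_menu(table):
    """Hypothetical layout: day heading in column 1, label in 2, dish in 3."""
    menu = {}
    for row in table.findall(ROW):
        cells = [cell_text(c) for c in row.findall(CELL)]
        if cells and cells[0]:
            menu["day"] = cells[0]
        elif len(cells) >= 3 and cells[1]:
            # Assumed key scheme: "Pasta Bar" -> "pasta_bar"
            menu[cells[1].lower().replace(" ", "_")] = cells[2]
    return menu


def load_tables(odt):
    """Return all tables in content.xml; odt is a path or binary file object."""
    with zipfile.ZipFile(odt) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    return list(root.iter(TABLE))
```

With the spaces restored, the span-style row Pasta<text:s/>Bar / Tagliatelli<text:s/>of<text:s/>... should map to "pasta_bar": "Tagliatelli of creamed wild Mushrooms", the same entry the LibreOffice markup gives.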

danielloader commented 4 years ago

I'm guessing it's related to https://github.com/eea/odfpy/issues/63