fumiama / go-docx

One of the most functional libraries to partially read and write .docx files (a.k.a. Microsoft Word documents or ECMA-376 Office Open XML) in Go.
GNU Affero General Public License v3.0

Document Parse fails if Table has width with decimal value #9

Closed junwen-k closed 1 year ago

junwen-k commented 1 year ago

Hi, thank you for writing this library.

Currently, I am facing an issue when trying to Parse an existing docx file that contains a table whose width has a decimal point, as follows:

...
<w:tblW w:w="11116.8" w:type="dxa" />
...

doc, err := docx.Parse(readFile, size)
if err != nil {
    panic(err)
}

And I got this error output:

strconv.ParseInt: parsing "11116.8": invalid syntax

I suspect it is due to this line:

https://github.com/fumiama/go-docx/blob/master/structtable.go#L280


I am no expert in working with docx files. Should the library support decimal values by using strconv.ParseFloat instead? This might also apply to other components, not just the table width.

Thank you!

fumiama commented 1 year ago

Well, a floating-point width may not be standard behavior for a docx file, since the twip unit is already quite small. We could fix the crash by parsing with strconv.ParseFloat and converting the value to an integer to fill in our struct field.
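A minimal sketch of that idea follows. The helper name parseTwips is hypothetical and is not the library's actual code; it only illustrates parsing the attribute as a float and truncating it to an integer.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseTwips is a hypothetical helper, not part of go-docx.
// It parses a width attribute such as w:w="11116.8" as a float
// and truncates the result toward zero to get an integer width.
func parseTwips(s string) (int64, error) {
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, err
	}
	return int64(f), nil
}

func main() {
	w, err := parseTwips("11116.8")
	if err != nil {
		panic(err)
	}
	fmt.Println(w) // prints 11116
}
```

Note that strconv.ParseFloat also accepts inputs such as "NaN" and "Inf", so a real implementation would probably want to reject those before converting.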

fumiama commented 1 year ago

You can try again now.

junwen-k commented 1 year ago

@fumiama Thanks for the update. Apparently, a similar issue still exists for:

...
<w:gridCol w:w="1088.2762105263155" />
...
strconv.ParseInt: parsing "1088.2762105263155": invalid syntax

Should we add the float fallback for all these components as well?

https://github.com/fumiama/go-docx/blob/master/structtable.go#L393
https://github.com/fumiama/go-docx/blob/master/structtable.go#L495
https://github.com/fumiama/go-docx/blob/master/structtable.go#L618

For your information, this docx file is actually written using Google Docs.

fumiama commented 1 year ago

> Should we add the float fallback for all these components as well?

Probably, as you say.

junwen-k commented 1 year ago

I've added a PR for float value fallback. You may take a look :)

fumiama commented 1 year ago

OK. Do these floating-point values only exist in tables?

junwen-k commented 1 year ago

@fumiama Unfortunately, I cannot give an accurate answer to this question. However, decimal widths do seem to exist, as the docx that I created using Google Docs has them.

I would expect that Image and other components might have a similar issue.

Maybe it is best to have a utility that parses a numeric value using strconv.ParseInt and falls back to strconv.ParseFloat if that fails. 🤔
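Such a utility could look roughly like the sketch below. The name parseIntWithFloatFallback is an assumption made for illustration and does not come from go-docx; the check for non-finite values is also an addition, not something discussed in this thread.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// parseIntWithFloatFallback is a hypothetical utility, not part of go-docx.
// It tries strconv.ParseInt first and, only if that fails, retries with
// strconv.ParseFloat and truncates the result toward zero.
func parseIntWithFloatFallback(s string) (int64, error) {
	if n, err := strconv.ParseInt(s, 10, 64); err == nil {
		return n, nil
	}
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, err
	}
	// ParseFloat accepts "NaN" and "Inf"; reject them explicitly.
	if math.IsNaN(f) || math.IsInf(f, 0) {
		return 0, fmt.Errorf("parsing %q: not a finite number", s)
	}
	return int64(f), nil
}

func main() {
	for _, s := range []string{"5000", "1088.2762105263155", "abc"} {
		n, err := parseIntWithFloatFallback(s)
		fmt.Println(s, n, err)
	}
}
```

Trying ParseInt first keeps the common integer case exact, and the float path is taken only for the values that previously caused the parse error.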

fumiama commented 1 year ago

Thanks for your advice. I'm considering using a regex to automatically add the fallback to all these strconv.ParseInt calls.
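As a rough illustration of what a regex-based rewrite might look like (this is only a sketch, not the change that was actually made), the snippet below rewrites calls of the form strconv.ParseInt(x, 10, 64) into calls to a hypothetical helper named parseIntWithFloatFallback. Both the pattern and the helper name are assumptions; the real call sites may use different arguments.

```go
package main

import (
	"fmt"
	"regexp"
)

// parseIntCall matches calls of the form strconv.ParseInt(<expr>, 10, 64),
// where <expr> contains no comma. The base and bit size are assumptions.
var parseIntCall = regexp.MustCompile(`strconv\.ParseInt\(([^,]+),\s*10,\s*64\)`)

// rewriteParseInt replaces every matched call with a call to the
// hypothetical helper parseIntWithFloatFallback(<expr>).
func rewriteParseInt(src string) string {
	return parseIntCall.ReplaceAllString(src, "parseIntWithFloatFallback(${1})")
}

func main() {
	src := `w, err := strconv.ParseInt(attr.Value, 10, 64)`
	fmt.Println(rewriteParseInt(src))
	// prints: w, err := parseIntWithFloatFallback(attr.Value)
}
```

A textual rewrite like this cannot handle first arguments that themselves contain commas, so the result would still need to be reviewed by hand.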

fumiama commented 1 year ago

Done.