Open Paxa opened 5 years ago
Do you know how to install specify minizip path in windows? Travis builds works only for windows and mac now
You can tell XLSX I/O buid to use a specific Minizip install location with the CMake parameter -DMINIZIP_DIR:PATH=
I have started reviewing some of your code, and already have several remarks:
I not sure about licenses, will it be better if we download utarray.h and uthash.h in a make file? utarray.h and uthash.h is only header files with macros, not sure if we can compile it as a library Do you have any suggestions for alternatives?
I don't think we gonna have many styles, so performance not so critical. And in our use case we just need append and get, may be implement own linked list for that
What should we do with malloc? check if (res == NULL) { return NULL }
?
I deleted files inside build folder, is it ok? (I don't know what those files was for) btw I made all these changes to create ruby library https://github.com/Paxa/fast_excel_reader according to my benchmark already got 9-15 times improvement
I usually do:
if ((s = (char*)malloc(wantedsize)) == NULL)
return NULL; //or some error code
that way you don't assume malloc worked causing the application to crash or even to allow hacker to exploit this.
The build folder contains the Code::Blocks project files, which I use to develop and debug on Windows. Please don't delete it.
uthash is a package in most package systems (e.g. uthash-dev apt package on Debian Linux). I would just add a CMake parameter to locate it for systems that don't have it in their package manager (like Windows). But can you tell me why exactly we need this? I'm not keen to add dependancies...
Question: Do your changes look at the type on cell-level, or also to colomn or row settings in case it's not defined on cell level?
Will try to make malloc as suggested and bring back build/
folder
But can you tell me why exactly we need this? I'm not keen to add dependancies...
it was in code uploaded in #12 and I keep it. It is used to find relations between cell and number format:
Example styles.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<numFmts count="2">
<numFmt numFmtId="164" formatCode="#,##0.00"/>
<numFmt numFmtId="165" formatCode="[$-409]m/d/yy h:mm AM/PM;@"/>
</numFmts>
...
<cellStyleXfs count="1">
<xf numFmtId="0" fontId="0" fillId="0" borderId="0"/>
</cellStyleXfs>
<cellXfs count="3">
<xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"/>
<xf numFmtId="164" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
<xf numFmtId="165" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
</cellXfs>
<cellStyles count="1">
<cellStyle name="Normal" xfId="0" builtinId="0"/>
</cellStyles>
<dxfs count="0"/>
<tableStyles count="0" defaultTableStyle="TableStyleMedium9" defaultPivotStyle="PivotStyleLight16"/>
</styleSheet>
Example sheet.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<dimension ref="A1:G5"/>
...
<cols>
<col min="2" max="2" width="20.7109375" style="1" customWidth="1"/>
<col min="3" max="3" width="20.7109375" style="2" customWidth="1"/>
</cols>
<sheetData>
<row r="1" spans="1:7">
<c r="A1" t="s">
<v>0</v>
</c>
<c r="B1" s="1">
<v>1234</v>
</c>
<c r="C1" s="2">
<v>43537.51777806081</v>
</c>
<c r="D1">
<v>1234</v>
</c>
<c r="E1" t="b">
<v>1</v>
</c>
<c r="F1">
<f>(B1 * 2)</f>
<v>0</v>
</c>
<c r="G1" t="s">
<v>1</v>
</c>
</row>
<row r="5" s="2" customFormat="1" ht="20" customHeight="1"/>
</sheetData>
<hyperlinks>
<hyperlink ref="G1" r:id="rId1"/>
</hyperlinks>
<pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/>
</worksheet>
In example files
<c r="C1" s="2">
means cell have style = 2<xf numFmtId="165" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
, it reference to number format 165<numFmt numFmtId="165" formatCode="[$-409]m/d/yy h:mm AM/PM;@"/>
From format code we can recognize it's a date, not just number. This code also consider builtin number formats 14 - 22.
If <numFmts ...>
and <cellXfs ...>
will always have count
attribute then we can read it and allocate simple array of int for style-format relations and struct for number formats, or event array of style_id->number format struct (should be fast and simple).
For now only on cell-level, I tried to create file with row format and column format, but when extract xml files, and each cell will have style reference e.g s="2"
If you have example of file with column format or row format - I will try to make it work
Btw where do you find format specification?
Styles are numbered. That's why I was wondering why hashing was used. A (sparse) array structure would do for this. But I guess it will do for now. Worst case I witch the license from MIT to BSD.
How much of the API is modified? I'm assuming your changes are not very backwards compatible?
I changed code to use arrays of structs, but not sure if I implemented it correctly, especially memory allocation and memory releasing. So not using utarray and uthash anymore
How much of the API is modified? I'm assuming your changes are not very backwards compatible?
Should be totally backwards compatible, changed xlsxioread_sheet_next_cell
to xlsxioread_sheet_next_cell_struct
and added xlsxioread_sheet_next_cell
that returns just string
What is your preferences for variable names? camelCase or snake_case? e.g. handle->numberFormats or handle->number_formats?
Wow, you're quick! I avoid camelCase in pure C. I stick with lower case and use underscore to separate things that are separate (e.g. get_numberformat).
lower case and use underscore to separate things that are separate (e.g. get_numberformat).
done
may be you know how to install minizip on windows? or is windows build is important at all?
For me MinGW in Windows is my primary development platform, but I also make sure it builds on Debian Linux and macOS. MiniZip 1.2.8 (downloaded from http://www.gaia-gis.it/gaia-sins/dataseltzer-sources/) builds fine on Windows with MinGW using CMake but for me it needed this patch to avoid WinRT specifics:
patch -ulbf iowin32.c << EOF
--- iowin32.c 2013-04-29 00:57:12 +0200
+++ iowin32.c 2014-08-09 23:11:20 +0200
@@ -28,3 +28,3 @@
-#if defined(WINAPI_FAMILY_PARTITION) && (!(defined(IOWIN32_USING_WINRT_API)))
+#if defined(WINAPI_FAMILY_PARTITION) && (!(defined(IOWIN32_USING_WINRT_API))) && !defined(__MINGW32__)
#if WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_APP)
EOF
If you need binaries I could send them to you if you like.
Hi, all. (Excuse my poorly-bad English). I'm trying to use this PR in my project and facing with problem: Date-format cells is recognised as "none", not "date".
Do you have example of a file?
Thank you @AlexanderBurtasov for founding a bug. I fixed it
17596_reduced.xlsx Hi again. PR completely falls while loading this file. Falls while calling xlsxioread_open.
@AlexanderBurtasov fixed
I took suggested implementation from https://github.com/brechtsanders/xlsxio/issues/12 and add new method to keep existing API same
Fixes https://github.com/brechtsanders/xlsxio/issues/12
Now I found similar fork https://github.com/matejd/xlsxio/commit/a0b6a0b95e16ffc82f158eb3f64a268c8fee8bf5
Travis status: (https://travis-ci.org/Paxa/xlsxio)
How to use
xlsxioread_sheet_next_cell_struct
:How to use
xlsxioread_debug_internals
: it prints internal structures with extracted number formats (just for debugging)Output: