cldellow / sqlite-parquet-vtable

A SQLite vtable extension to read Parquet files
Apache License 2.0
267 stars 31 forks source link

Refactored the types to match the most recent Apache Arrow version #39

Open mharju opened 4 years ago

mharju commented 4 years ago

Hi!

Really love what you have done!

I managed to compile this now with OS X (that is still a bit messy so I'll have to refactor that a bit before maybe posting a PR about that) and also updated the code to work with the most recent Apache Arrow library.

cldellow commented 4 years ago

Hey Mikko, thanks for the contribution!

Looks like the PR failed to build in Travis: https://travis-ci.org/cldellow/sqlite-parquet-vtable/builds/611367321

I think that's expected, since the makefile is still linking against 0.9.0.

I tried bumping it to 0.15.1 and building on a clean Ubuntu 16.04 c5n.2xlarge instance. But it failed: https://gist.github.com/cldellow/4634a192c1f08399975a2b730edee044 Looks like something to do with libz.a in parquet-cpp.

Before I spend some time digging into that, maybe you can save me some time :) Did you have to change something locally to get it to build?

It's possible that if you were only building on OS X you didn't hit this issue - maybe your build was using your shared libraries. The Linux build currently builds static libraries so that the versioning story is straightforward for end users.

cldellow commented 4 years ago

I tweaked the Makefile to use 0.15.1 (in the upgrade-deps branch -- see diff).

Now I get new build failures :) You can see them at https://gist.github.com/cldellow/ab127e57e2da4d48b2b517fa9e706296

When you have a moment, can you let me know if you have any ideas on what the issue might be?

mharju commented 4 years ago

Hmm, that seems odd – maybe we need an explicit include for parquet/types.h ..?

mharju commented 4 years ago

It's possible that if you were only building on OS X you didn't hit this issue - maybe your build was using your shared libraries. The Linux build currently builds static libraries so that the versioning story is straightforward for end users.

Yes – I'm actually building a dynamic library for OS X:s needs but I think the error you get on Ubuntu seems to be related to a missing header or something similar – not regards to linking?

cldellow commented 4 years ago

Arg, yes, I think you're right that's it to do with the headers. I had tweaked the includes list to update the location of some of the compression libraries, and that resulted in picking up a different parquet/types.h file.

If I fixed that, then I get:

g++  -c -o parquet_cursor.o /home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_cursor.cc -I /home/ubuntu/sqlite-parquet-vtable/build/linux/../../sqlite -I /home/ubuntu/sqlite-parquet-vtable/build/linux/arrow/cpp/src -I /home/ubuntu/sqlite-parquet-vtable/build/linux/parquet-cpp/arrow_ep-prefix/src/arrow_ep/cpp/src -I /home/ubuntu/sqlite-parquet-vtable/build/linux/parquet-cpp/src -O3 -std=c++11 -Wall -fPIC -g
/home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_cursor.cc: In member function ‘void ParquetCursor::ensureColumn(int)’:
/home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_cursor.cc:780:37: error: invalid conversion from ‘long long int*’ to ‘parquet::TypedScanner<parquet::PhysicalType<(parquet::Type::type)2u> >::T* {aka long int*}’ [-fpermissive]
           s->NextValue(&rv, &wasNull);
                                     ^
In file included from /home/ubuntu/sqlite-parquet-vtable/build/linux/arrow/cpp/src/parquet/api/reader.h:23:0,
                 from /home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_table.h:6,
                 from /home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_cursor.h:5,
                 from /home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_cursor.cc:1:
/home/ubuntu/sqlite-parquet-vtable/build/linux/arrow/cpp/src/parquet/column_scanner.h:145:8: note:   initializing argument 1 of ‘bool parquet::TypedScanner<DType>::NextValue(parquet::TypedScanner<DType>::T*, bool*) [with DType = parquet::PhysicalType<(parquet::Type::type)2u>; parquet::TypedScanner<DType>::T = long int]’
   bool NextValue(T* val, bool* is_null) {
        ^
/home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_cursor.cc:865:46: error: invalid conversion from ‘long long int*’ to ‘parquet::TypedScanner<parquet::PhysicalType<(parquet::Type::type)2u> >::T* {aka long int*}’ [-fpermissive]
         hadValue = s->NextValue(&rv, &wasNull);
                                              ^
In file included from /home/ubuntu/sqlite-parquet-vtable/build/linux/arrow/cpp/src/parquet/api/reader.h:23:0,
                 from /home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_table.h:6,
                 from /home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_cursor.h:5,
                 from /home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_cursor.cc:1:
/home/ubuntu/sqlite-parquet-vtable/build/linux/arrow/cpp/src/parquet/column_scanner.h:145:8: note:   initializing argument 1 of ‘bool parquet::TypedScanner<DType>::NextValue(parquet::TypedScanner<DType>::T*, bool*) [with DType = parquet::PhysicalType<(parquet::Type::type)2u>; parquet::TypedScanner<DType>::T = long int]’
   bool NextValue(T* val, bool* is_null) {
        ^
Makefile:68: recipe for target 'parquet_cursor.o' failed
make: *** [parquet_cursor.o] Error 1

This appears to be related to the long long changes - long term, we can do some platform specific ifdef, I guess. Although perhaps there's a third, more portable option that we should be using?

For now, I just commented that change to make progress on getting it to build on Linux. At last, it built, but when trying to load the resulting shared library to run the tests, one gets:

Error: build/linux/libparquet.so: undefined symbol: _ZN5arrow16PrimitiveBuilderINS_9Int32TypeEE4InitEl

which decodes to:

ubuntu@ip-172-31-88-73:~/sqlite-parquet-vtable$ c++filt -n _ZN5arrow16PrimitiveBuilderINS_9Int32TypeEE4InitEl
arrow::PrimitiveBuilder<arrow::Int32Type>::Init(long)

which makes me think my includes are still messed up, specifically the one providing the template definitions.

This is probably a trivial bug for most people. :) But for me, I haven't programmed professionally in C++ for a long time, so I think I'll need to set aside at least a few hours to make progress. Hopefully in a week or so I'll have time available to make progress.

mharju commented 4 years ago

This appears to be related to the long long changes - long term, we can do some platform specific ifdef, I guess. Although perhaps there's a third, more portable option that we should be using?

Yes, this sounds good. Do you want me to give it a stab or do you have time to do it yourself?

This is probably a trivial bug for most people. :) But for me, I haven't programmed professionally in C++ for a long time, so I think I'll need to set aside at least a few hours to make progress. Hopefully in a week or so I'll have time available to make progress.

I feel you – the last time I've done C++ professionally in the beginning of the millennium so a lot has changed since :D

cldellow commented 4 years ago

Re long long - that'd be great! It may be as simple as using int64_t?

mharju commented 4 years ago

I added long_t typedef'd to int64_t, is that an overkill?