freevryheid / duckdb

fortran bindings to duckdb c api
MIT License
9 stars 2 forks source link

Failing table function test #59

Open ludnic opened 1 year ago

ludnic commented 1 year ago

@freevryheid can I pick your brains on this issue on the table functions.

The general idea is that you register a callback function which can be executed through a duckdb query.

I've written some of the interfaces and helper functions and tried to reproduce what's in this test: https://github.com/duckdb/duckdb/blob/v0.8.1/test/api/capi/capi_table_functions.cpp#L89-L122 in the test_table_f module.

I can't pinpoint the issue, the callback seem to work fine but when trying to retrieve the information that has been registered in the initialisation the values are not correct. I've added some prints that should make it clear what the issue is.

freevryheid commented 1 year ago

Ludovico

Sorry, it's been a while. Interesting to see all the additions that have been made in the interim. With all the changes it's probably better to fork the current and continue development on your github. Anyway, let me get back up to speed - I plan to review in the coming days.

On Sat, Sep 30, 2023 at 10:20 AM Ludovico Nicotina @.***> wrote:

@freevryheid https://github.com/freevryheid can I pick your brains on this issue on the table functions.

The general idea is that you register a callback function which can be executed through a duckdb query.

I've written some of the interfaces and helper functions and tried to reproduce what's in this test: https://github.com/duckdb/duckdb/blob/v0.8.1/test/api/capi/capi_table_functions.cpp#L89-L122 in the test_table_f module.

I can't pinpoint the issue, the callback seem to work fine but when trying to retrieve the information that has been registered in the initialisation the values are not correct. I've added some prints that should make it clear what the issue is.

— Reply to this email directly, view it on GitHub https://github.com/freevryheid/duckdb/issues/59, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACFSLTIJURACNBU5TCMWF3X5A2E7ANCNFSM6AAAAAA5NT4QDA . You are receiving this because you were mentioned.Message ID: @.***>

-- Andre

ludnic commented 1 year ago

Hey Andre, sure thing no worries. I'm keeping a fork in sync with this in case you don't want to have this repository anymore. I'm taking it as a mix of practice C interop on a real project and keeping an eye on duckdb for analytics which is quite interesting, so I'm happy to continue contributing even here in the free time.

Thanks for taking the time anyway, there is a PR with the table function stuff btw in here: https://github.com/freevryheid/duckdb/pull/58

L.

freevryheid commented 1 year ago

Catching up. Looks to me you did a good job of prototyping the required init, bind and user defined functions. This is leading edge, and other than the test example, I've not been able to find any other examples of how these are applied in c. The python examples provided on the duckdb site really simplify the process of querying udfs. Still no solution as yet.

freevryheid commented 1 year ago

Made some comments in the src regarding the use of derived type vs type(c_ptr) for the table function parameters. Perhaps we should post an issue on duckdb github calling for consistency in the way parameters are defined. I would prefer these as structs rather than pointers.

I made these changes on my side converting all derived types to type(c_ptr) but the test still fails:

  Starting table-functions-test ... (1/1)
 Step 1: Register a table function using callbacks for bind, init and function.
 Starting registering table function: my_function
 Done registering table function: my_function
 Step 2: Run a query on my_function which should execute the callback functions.
 Start my_bind
   Retrieved size value from param:                     1
   Set the bind size to                     1
 Done my_bind
 Start my_init
   Set the init pos to                     0
 Done my_init
 Starting execution of my_function
   Retrieved bind size:        93849529367291
   Retrieved init pos:        93849528064829
   Size of output:            0
           1        2048
At line 119 of file test/test_table_f.f90
Fortran runtime error: Index '1' of dimension 1 of array 'v' above upper bound of 0

The retrieved bind size and init pos look suspect. Also that array of pointers (v). I noticed a similar approach used in test_data_chunk.f90. Can we not just define v as an int32 integer array?

Shouldn't the final check in the test_table_functions subroutine be duckdb_value_int32 vs int64.

I'll continue to explore on my side.

ludnic commented 1 year ago

Made some comments in the src regarding the use of derived type vs type(c_ptr) for the table function parameters. Perhaps we should post an issue on duckdb github calling for consistency in the way parameters are defined. I would prefer these as structs rather than pointers.

I made these changes on my side converting all derived types to type(c_ptr) but the test still fails:

  Starting table-functions-test ... (1/1)
 Step 1: Register a table function using callbacks for bind, init and function.
 Starting registering table function: my_function
 Done registering table function: my_function
 Step 2: Run a query on my_function which should execute the callback functions.
 Start my_bind
   Retrieved size value from param:                     1
   Set the bind size to                     1
 Done my_bind
 Start my_init
   Set the init pos to                     0
 Done my_init
 Starting execution of my_function
   Retrieved bind size:        93849529367291
   Retrieved init pos:        93849528064829
   Size of output:            0
           1        2048
At line 119 of file test/test_table_f.f90
Fortran runtime error: Index '1' of dimension 1 of array 'v' above upper bound of 0

The retrieved bind size and init pos look suspect. Also that array of pointers (v). I noticed a similar approach used in test_data_chunk.f90. Can we not just define v as an int32 integer array?

Shouldn't the final check in the test_table_functions subroutine be duckdb_value_int32 vs int64.

I'll continue to explore on my side.

thanks for checking. I have seen the comments and updated the PR accordingly. I agree with you they should be c_ptr correctly. And yes, it looks much clearer on the fortran side with the derived types.

The values printed for pos and size seem to me in the range of memory addresses, which makes me think there is something wrong somewhere. At least on my machine when I print a couple of loc and c_loc they are in a similar range. But without seeing what duckdb_bind_set_bind_data actually do it's hard to say. Maybe that's where I should look. Will keep looking too.

Here my additional log:

 Step 1: Register a table function using callbacks for bind, init and function.
 Starting registering table function: my_function
 Done registering table function: my_function
 Step 2: Run a query on my_function which should execute the callback functions.
 Start my_bind
   Retrieved size value from param:                     1
 value of info:      281474063442944
 pointer to my_bind_data:      281473032416944
 loc of my_bind_data%size:      281473032416944
   Set the bind size to                     1
 Done my_bind
 Start my_init
   Set the init pos to                     0
 Done my_init
 Starting execution of my_function
   Retrieved bind size:       281473648195848
   Retrieved init pos:       281408201712438
   Size of output:            0
           1        2048
At line 109 of file test/test_table_f.f90
Fortran runtime error: Index '1' of dimension 1 of array 'v' above upper bound of 0
ludnic commented 1 year ago

The retrieved bind size and init pos look suspect. Also that array of pointers (v). I noticed a similar approach used in test_data_chunk.f90. Can we not just define v as an int32 integer array?

This is a good point, I have tried a few different things on the data_chunk test and eventually this seemed to work so I was repeating here. But maybe I'm missing something, but my understanding was that duckdb_vector_get_data returns a c_ptr and we need to cast it to the data type we need. So if I remove the pointer attribute for v I get:

test/test_table_f.f90:101:26:

  101 |     call c_f_pointer(tmp, v, [duckdb_data_chunk_get_size(output)])
      |                          1
Error: Argument FPTR at (1) to C_F_POINTER must be a pointer

Shouldn't the final check in the test_table_functions subroutine be duckdb_value_int32 vs int64.

I think you are right. I just copied that from the cpp test here: https://github.com/duckdb/duckdb/blob/0d84ccf478578278b2d1168675b8b93c60f78a5e/test/api/capi/capi_table_functions.cpp#L89-L103

But I never got there yet to fix it. If we get to make this work I'll go on with the rest of the test.

freevryheid commented 1 year ago

Still no solution but I did notice that the functions calls to my_bind, my_init and my_function pass the info parameter by value. Thus my_bind and my_init calling set_bind_data(info, ...) and set_init_data(info, ...) are not working with info, but separate copies of it. When called in my_function, get_bind_data(info) and get_init_data(info) are again working with a copy of info that in no way relates to info in any of the functions called.

Can we call info by reference? Can we make it global perhaps?

ludnic commented 1 year ago

Interesting. I've done a super quick attempt and got a memory error, but have not checked it enough to be sure. I will play with your idea a bit more.

To clarify, you are thinking on how we are handling info on the fortran side, right? Or are you thinking these interfaces should rather get a reference to info:

! DUCKDB_API void duckdb_bind_set_bind_data(duckdb_bind_info info, void *bind_data, duckdb_delete_callback_t destroy);
freevryheid commented 1 year ago

Made some updates. Still not passing the test but wanted to save my progress. Still not able to get data from the init and bind structs in my_function but getting there. I've been experimenting with another c library that also uses callbacks, which was the basis for most of the edits made in the latest commit:

https://github.com/freevryheid/flibcsv