Open ludnic opened 1 year ago
Ludovico
Sorry, it's been a while. Interesting to see all the additions that have been made in the interim. With all the changes it's probably better to fork the current and continue development on your github. Anyway, let me get back up to speed - I plan to review in the coming days.
On Sat, Sep 30, 2023 at 10:20 AM Ludovico Nicotina @.***> wrote:
@freevryheid https://github.com/freevryheid can I pick your brains on this issue on the table functions.
The general idea is that you register a callback function which can be executed through a duckdb query.
I've written some of the interfaces and helper functions and tried to reproduce what's in this test: https://github.com/duckdb/duckdb/blob/v0.8.1/test/api/capi/capi_table_functions.cpp#L89-L122 in the test_table_f module.
I can't pinpoint the issue, the callback seem to work fine but when trying to retrieve the information that has been registered in the initialisation the values are not correct. I've added some prints that should make it clear what the issue is.
— Reply to this email directly, view it on GitHub https://github.com/freevryheid/duckdb/issues/59, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACFSLTIJURACNBU5TCMWF3X5A2E7ANCNFSM6AAAAAA5NT4QDA . You are receiving this because you were mentioned.Message ID: @.***>
-- Andre
Hey Andre, sure thing no worries. I'm keeping a fork in sync with this in case you don't want to have this repository anymore. I'm taking it as a mix of practice C interop on a real project and keeping an eye on duckdb for analytics which is quite interesting, so I'm happy to continue contributing even here in the free time.
Thanks for taking the time anyway, there is a PR with the table function stuff btw in here: https://github.com/freevryheid/duckdb/pull/58
L.
Catching up. Looks to me you did a good job of prototyping the required init, bind and user defined functions. This is leading edge, and other than the test example, I've not been able to find any other examples of how these are applied in c. The python examples provided on the duckdb site really simplify the process of querying udfs. Still no solution as yet.
Made some comments in the src regarding the use of derived type vs type(c_ptr) for the table function parameters. Perhaps we should post an issue on duckdb github calling for consistency in the way parameters are defined. I would prefer these as structs rather than pointers.
I made these changes on my side converting all derived types to type(c_ptr) but the test still fails:
Starting table-functions-test ... (1/1)
Step 1: Register a table function using callbacks for bind, init and function.
Starting registering table function: my_function
Done registering table function: my_function
Step 2: Run a query on my_function which should execute the callback functions.
Start my_bind
Retrieved size value from param: 1
Set the bind size to 1
Done my_bind
Start my_init
Set the init pos to 0
Done my_init
Starting execution of my_function
Retrieved bind size: 93849529367291
Retrieved init pos: 93849528064829
Size of output: 0
1 2048
At line 119 of file test/test_table_f.f90
Fortran runtime error: Index '1' of dimension 1 of array 'v' above upper bound of 0
The retrieved bind size and init pos look suspect. Also that array of pointers (v). I noticed a similar approach used in test_data_chunk.f90
. Can we not just define v as an int32 integer array?
Shouldn't the final check in the test_table_functions subroutine be duckdb_value_int32 vs int64.
I'll continue to explore on my side.
Made some comments in the src regarding the use of derived type vs type(c_ptr) for the table function parameters. Perhaps we should post an issue on duckdb github calling for consistency in the way parameters are defined. I would prefer these as structs rather than pointers.
I made these changes on my side converting all derived types to type(c_ptr) but the test still fails:
Starting table-functions-test ... (1/1) Step 1: Register a table function using callbacks for bind, init and function. Starting registering table function: my_function Done registering table function: my_function Step 2: Run a query on my_function which should execute the callback functions. Start my_bind Retrieved size value from param: 1 Set the bind size to 1 Done my_bind Start my_init Set the init pos to 0 Done my_init Starting execution of my_function Retrieved bind size: 93849529367291 Retrieved init pos: 93849528064829 Size of output: 0 1 2048 At line 119 of file test/test_table_f.f90 Fortran runtime error: Index '1' of dimension 1 of array 'v' above upper bound of 0
The retrieved bind size and init pos look suspect. Also that array of pointers (v). I noticed a similar approach used in
test_data_chunk.f90
. Can we not just define v as an int32 integer array?Shouldn't the final check in the test_table_functions subroutine be duckdb_value_int32 vs int64.
I'll continue to explore on my side.
thanks for checking. I have seen the comments and updated the PR accordingly. I agree with you they should be c_ptr correctly. And yes, it looks much clearer on the fortran side with the derived types.
The values printed for pos and size seem to me in the range of memory addresses, which makes me think there is something wrong somewhere. At least on my machine when I print a couple of loc
and c_loc
they are in a similar range. But without seeing what duckdb_bind_set_bind_data
actually do it's hard to say. Maybe that's where I should look. Will keep looking too.
Here my additional log:
Step 1: Register a table function using callbacks for bind, init and function.
Starting registering table function: my_function
Done registering table function: my_function
Step 2: Run a query on my_function which should execute the callback functions.
Start my_bind
Retrieved size value from param: 1
value of info: 281474063442944
pointer to my_bind_data: 281473032416944
loc of my_bind_data%size: 281473032416944
Set the bind size to 1
Done my_bind
Start my_init
Set the init pos to 0
Done my_init
Starting execution of my_function
Retrieved bind size: 281473648195848
Retrieved init pos: 281408201712438
Size of output: 0
1 2048
At line 109 of file test/test_table_f.f90
Fortran runtime error: Index '1' of dimension 1 of array 'v' above upper bound of 0
The retrieved bind size and init pos look suspect. Also that array of pointers (v). I noticed a similar approach used in
test_data_chunk.f90
. Can we not just define v as an int32 integer array?
This is a good point, I have tried a few different things on the data_chunk test and eventually this seemed to work so I was repeating here. But maybe I'm missing something, but my understanding was that duckdb_vector_get_data
returns a c_ptr
and we need to cast it to the data type we need. So if I remove the pointer attribute for v
I get:
test/test_table_f.f90:101:26:
101 | call c_f_pointer(tmp, v, [duckdb_data_chunk_get_size(output)])
| 1
Error: Argument FPTR at (1) to C_F_POINTER must be a pointer
Shouldn't the final check in the test_table_functions subroutine be duckdb_value_int32 vs int64.
I think you are right. I just copied that from the cpp test here: https://github.com/duckdb/duckdb/blob/0d84ccf478578278b2d1168675b8b93c60f78a5e/test/api/capi/capi_table_functions.cpp#L89-L103
But I never got there yet to fix it. If we get to make this work I'll go on with the rest of the test.
Still no solution but I did notice that the functions calls to my_bind, my_init and my_function pass the info
parameter by value. Thus my_bind and my_init calling set_bind_data(info, ...) and set_init_data(info, ...) are not working with info, but separate copies of it. When called in my_function, get_bind_data(info) and get_init_data(info) are again working with a copy of info
that in no way relates to info
in any of the functions called.
Can we call info
by reference?
Can we make it global perhaps?
Interesting. I've done a super quick attempt and got a memory error, but have not checked it enough to be sure. I will play with your idea a bit more.
To clarify, you are thinking on how we are handling info on the fortran side, right? Or are you thinking these interfaces should rather get a reference to info:
! DUCKDB_API void duckdb_bind_set_bind_data(duckdb_bind_info info, void *bind_data, duckdb_delete_callback_t destroy);
Made some updates. Still not passing the test but wanted to save my progress. Still not able to get data from the init and bind structs in my_function but getting there. I've been experimenting with another c library that also uses callbacks, which was the basis for most of the edits made in the latest commit:
@freevryheid can I pick your brains on this issue on the table functions.
The general idea is that you register a callback function which can be executed through a duckdb query.
I've written some of the interfaces and helper functions and tried to reproduce what's in this test: https://github.com/duckdb/duckdb/blob/v0.8.1/test/api/capi/capi_table_functions.cpp#L89-L122 in the test_table_f module.
I can't pinpoint the issue, the callback seem to work fine but when trying to retrieve the information that has been registered in the initialisation the values are not correct. I've added some prints that should make it clear what the issue is.