adjust / parquet_fdw

Parquet foreign data wrapper for PostgreSQL
PostgreSQL License

postgres crashes when there are 0 records in parquet file #31

Closed amichaia closed 3 years ago

amichaia commented 3 years ago

Hi,

I tried to write 0 records with pandas like this:

    import pandas as pd

    # Two columns, zero rows
    d = {'col1': [], 'col2': []}
    df = pd.DataFrame(data=d)
    df.to_parquet('/test_data/data.parquet1_debug', compression=None)

When reading the file back with pandas, everything works and it reports that the file is empty.

However, postgres crashes.

I looked inside the fdw code, specifically at the populate_slot() function, and saw that the slot container stays uninitialized and is later marked as valid by ExecStoreVirtualTuple(), which is called from SingleFileExecutionState::next().

I made a local fix that works for now, but I really don't know if what I did is OK (I basically return false from populate_slot() if the slot attributes were not initialized).
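To make the idea of that guard concrete, here is a minimal sketch as a toy Python model. It is not the actual C++ code from parquet_fdw and not necessarily what the fix_31 branch does. The `Slot`, `ToyReader`, and `scan` names are hypothetical; only populate_slot(), next(), and ExecStoreVirtualTuple() are names taken from the description above, and the `valid` flag merely stands in for what ExecStoreVirtualTuple() does.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Slot:
    """Toy stand-in for a tuple slot: column values plus a validity flag."""
    values: Optional[List[Any]] = None
    valid: bool = False


@dataclass
class ToyReader:
    """Hypothetical reader over in-memory rows (not the parquet_fdw reader)."""
    rows: List[List[Any]]
    pos: int = 0

    def populate_slot(self, slot: Slot) -> bool:
        # Guard: when no row is available (this covers a zero-row file),
        # leave the slot untouched and report that nothing was read.
        if self.pos >= len(self.rows):
            return False
        slot.values = list(self.rows[self.pos])
        self.pos += 1
        return True

    def next(self, slot: Slot) -> bool:
        # Clear the slot before each fetch.
        slot.values, slot.valid = None, False
        if not self.populate_slot(slot):
            # End of data: never mark an unpopulated slot as valid.
            return False
        # Stands in for ExecStoreVirtualTuple() marking the slot valid.
        slot.valid = True
        return True


def scan(rows: List[List[Any]]) -> List[List[Any]]:
    """Drain the reader, collecting every row that was marked valid."""
    reader, slot, out = ToyReader(rows), Slot(), []
    while reader.next(slot):
        out.append(slot.values)
    return out
```

In this model, scanning an empty input simply yields no rows, because next() only marks the slot valid after populate_slot() reports success.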

Can you guys please take a look? I think reproducing this is really easy.

zilder commented 3 years ago

Hi @amichaia,

thanks a lot for the report! I'll take a look and let you know when it's fixed.

zilder commented 3 years ago

Hi @amichaia,

I pushed a new branch fix_31 which should fix the bug. Can you please check whether it works for you?

amichaia commented 3 years ago

Great, thanks! I'll give it a try and update you ASAP :)

maozguttman commented 3 years ago

I'm working with Amichai. This bug is fixed. But I found a new one. Will open a new issue.