adjust / parquet_fdw

Parquet foreign data wrapper for PostgreSQL
PostgreSQL License

crash when reading a big table of 1,000,000 rows #22

Open · yjhjstz opened this issue 4 years ago

yjhjstz commented 4 years ago

I wrote a C++ test snippet:

    /* Read every row group in the file through the Arrow reader. */
    for (int i = 0; i < (int) festate->rowgroups.size(); i++) {
        elog(INFO, "reading row group %d", i);
        try {
            festate->reader
                ->RowGroup(i)
                ->ReadTable(festate->indices, &festate->table);
        } catch (const std::exception &e) {
            elog(ERROR,
                 "parquet_fdw: failed to read row group %d: %s",
                 i, e.what());
        }
    }

It also crashes when i == 32768 (exactly 2^15).

zilder commented 4 years ago

Hi @yjhjstz,

Can you provide a backtrace? How many row groups are there in your file? 32K sounds like an unreasonably large number of row groups. Would it be possible for you to share your parquet file?

yjhjstz commented 4 years ago

Sorry, I misused StreamWriter::SetMaxRowGroupSize(1000), following the example.
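
For context, here is a minimal sketch of how that misuse plays out (the schema, column, and file name are assumptions, not the reporter's actual writer code). If I read the Arrow C++ headers right, parquet::StreamWriter::SetMaxRowGroupSize() takes a limit in bytes, not rows, so passing 1000 closes a row group roughly every kilobyte, and a 1,000,000-row file ends up with tens of thousands of row groups:

    #include <cstdint>
    #include <memory>
    #include <arrow/io/file.h>
    #include <parquet/schema.h>
    #include <parquet/stream_writer.h>

    int main() {
        std::shared_ptr<arrow::io::FileOutputStream> outfile =
            arrow::io::FileOutputStream::Open("test.parquet").ValueOrDie();

        // Single INT32 column; the real repro also has a float4[] column.
        parquet::schema::NodeVector fields;
        fields.push_back(parquet::schema::PrimitiveNode::Make(
            "id", parquet::Repetition::REQUIRED, parquet::Type::INT32,
            parquet::ConvertedType::INT_32));
        auto schema = std::static_pointer_cast<parquet::schema::GroupNode>(
            parquet::schema::GroupNode::Make(
                "schema", parquet::Repetition::REQUIRED, fields));

        parquet::StreamWriter os{
            parquet::ParquetFileWriter::Open(outfile, schema)};

        // The misuse: 1000 here means 1000 *bytes* per row group,
        // not 1000 rows, so row groups are closed almost immediately.
        os.SetMaxRowGroupSize(1000);
        // A more reasonable limit would be tens of megabytes, e.g.:
        // os.SetMaxRowGroupSize(64 * 1024 * 1024);

        for (int32_t i = 0; i < 1000000; ++i)
            os << i << parquet::EndRow;
        return 0;
    }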

zilder commented 4 years ago

Could you anyway send the file that caused the crash, or the code that produces it? I'd like to reproduce the issue and either fix the bug in parquet_fdw, if there is one, or add an extra check if it's a limitation in libarrow.
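
If the limitation does turn out to be on the libarrow side, one possible extra check, sketched with the names from the reporter's snippet above (festate is assumed; the INT16_MAX bound is a guess based on the crash appearing at exactly 32768 = 2^15, which hints at a 16-bit row-group ordinal somewhere in the stack):

    /* Hypothetical guard, to run once after opening the file.  The
     * INT16_MAX bound is an assumption inferred from the crash at row
     * group 32768; replace it if libarrow documents an actual limit. */
    int num_groups = festate->reader->num_row_groups();
    if (num_groups > INT16_MAX)
        elog(ERROR,
             "parquet_fdw: file has %d row groups; at most %d are supported",
             num_groups, (int) INT16_MAX);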

yjhjstz commented 4 years ago
  1. git clone https://github.com/yjhjstz/parquet_fdw/tree/dev
  2. psql, then run the following:

    create extension parquet_fdw;
    create server parquet_srv foreign data wrapper parquet_fdw;

    CREATE FOREIGN TABLE test (id int, c1 float4[]) SERVER parquet_srv
        OPTIONS (filename '/Users/jianghuayang/work/fdw/parquet_fdw/data/test.parquet',
                 sorted 'id');

    create or replace function gen_float4_arr(int) returns float4[] as $$
    select array_agg((random()*100)::float4) from generate_series(1,$1);
    $$ language sql strict;

    insert into test select id, gen_float4_arr(64) from generate_series(1,1000000) t(id);
    select * from test;
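
To confirm that the writer misconfiguration really produces this pathological layout, the row-group count of the generated file can be inspected directly. A standalone sketch (the path is assumed to match the foreign table definition above):

    #include <iostream>
    #include <memory>
    #include <parquet/file_reader.h>

    int main() {
        // Report the row-group count of the repro file; the crash
        // reportedly appears once the row group index reaches 32768.
        std::unique_ptr<parquet::ParquetFileReader> reader =
            parquet::ParquetFileReader::OpenFile(
                "/Users/jianghuayang/work/fdw/parquet_fdw/data/test.parquet");
        std::cout << "row groups: "
                  << reader->metadata()->num_row_groups() << std::endl;
        return 0;
    }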

yjhjstz commented 4 years ago

By the way, once you can reproduce it, feel free to join in optimizing the insert routine.

sdressler commented 3 years ago

Is there any update on this issue? I ran into a SEGFAULT when running ANALYZE on a big table.

zilder commented 3 years ago

Hi @sdressler, can you send a backtrace?

sdressler commented 3 years ago

@zilder I can, but I eventually figured it out: the schema had mismatched types. I can still get you a backtrace if you want and if it helps make things more stable.

zilder commented 3 years ago

Yes, that would be helpful. Could you also provide the schemas you used in Parquet and in Postgres?

sdressler commented 3 years ago

I am going to open a new bug report.