ZJONSSON / parquetjs

fully asynchronous, pure JavaScript implementation of the Parquet file format
MIT License
34 stars 61 forks

Running into heap out of memory issues reading a 25 MB parquet file. #57

Open PulkitChadhaAdobe opened 3 years ago

PulkitChadhaAdobe commented 3 years ago

Hi All.

I intend to read parquet files up to 1 GB in size, but I'm hitting heap out of memory errors reading a 25 MB parquet file with 6 columns. I'm not able to find a method that reads the file in buffered chunks. Is there any alternative for reading large Parquet files?

Thanks, Pulkit

SeanBarry commented 3 years ago

Hi there, did you ever manage to solve this issue? I'm facing a similar issue and struggling to work out how to consume larger files.

PulkitChadhaAdobe commented 3 years ago

Hey Sean, we have not found a workaround yet. We ended up writing a Python function using pyarrow to convert parquet into JSON. I don't like the duct-tape solution, but at least it unblocked our project. I'm interested to know what you have learned while trying to fix this issue on your end.

SeanBarry commented 3 years ago

Hey, thanks for the reply. Ha, funnily enough we are doing the exact same thing. I'm spinning up a Python script using Node's 'child_process' module to ingest the parquet file. It's considerably faster!
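
For readers who want to try the same workaround, here is a minimal sketch of the Node side. It is not the code either commenter used. It assumes a hypothetical script, `parquet_to_jsonl.py`, that prints one JSON object per line to stdout (for example by iterating over record batches with pyarrow). The helper name `streamJsonLines` is also made up for illustration. Reading the child's output line by line means the Node process only holds one row at a time instead of the whole file.

```javascript
const { spawn } = require('child_process');
const readline = require('readline');

// Spawn `command` and call `onRow` for every JSON line it prints to stdout.
// Resolves with the number of rows handled; rejects on a spawn failure,
// a non-zero exit code, or a line that is not valid JSON.
async function streamJsonLines(command, args, onRow) {
  const child = spawn(command, args, { stdio: ['ignore', 'pipe', 'inherit'] });

  // Capture the outcome as a value so a spawn error that fires while we are
  // still reading never surfaces as an unhandled rejection.
  const exited = new Promise((resolve) => {
    child.once('error', (err) => resolve({ err }));
    child.once('close', (code) => resolve({ code }));
  });

  const lines = readline.createInterface({
    input: child.stdout,
    crlfDelay: Infinity,
  });

  let count = 0;
  try {
    for await (const line of lines) {
      if (line.trim() === '') continue; // tolerate blank lines
      await onRow(JSON.parse(line));
      count += 1;
    }
  } catch (err) {
    child.kill(); // do not leave the converter running after a failure
    throw err;
  }

  const { err, code } = await exited;
  if (err) throw err;
  if (code !== 0) throw new Error(`${command} exited with code ${code}`);
  return count;
}

// Hypothetical usage; the script name and file path are assumptions:
//
//   streamJsonLines('python3', ['parquet_to_jsonl.py', 'data.parquet'],
//     (row) => { /* handle one row */ })
//     .then((n) => console.log(`read ${n} rows`));
```

Newline-delimited JSON keeps the protocol between the two processes trivial, at the cost of serialising every row twice. Whether that beats reading the file directly will depend on the data.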