Open Gabriel-p opened 1 year ago
I agree that the ability to read Parquet files would be nice. It's probably worth investigating whether using something like pyarrow directly has any sort of performance gains over pandas.read_parquet, but if you're interested in a very minimal example of a Parquet data loader, you can add the snippet below (which requires pyarrow) to your glue config file, which should allow you to load at least basic Parquet files:
from glue.config import data_factory
from glue.core.data_factories.helpers import has_extension
from glue.core.data_factories.pandas import panda_process
from pandas import read_parquet


@data_factory(label="Parquet file", identifier=has_extension("parquet"))
def pandas_read_parquet(path, engine="pyarrow", **kwargs):
    # Read the file into a DataFrame, then convert it to glue data
    df = read_parquet(path, engine=engine)
    return panda_process(df)
Thank you! It worked perfectly. I just removed the engine specification, since my files open just fine with whatever pandas does by default.
Is your feature request related to a problem? Please describe it: Pandas' parquet files are not loaded.
Describe the solution you'd like: Load parquet files.