astronomer / astro-sdk

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
https://astro-sdk-python.rtfd.io/
Apache License 2.0
350 stars, 43 forks

load_file() - Complete Parquet file is loaded in Memory #1059

Open utkarsharma2 opened 2 years ago

utkarsharma2 commented 2 years ago

Describe the bug: When inferring the schema, the complete Parquet file is loaded into memory. https://github.com/astronomer/astro-sdk/blob/e4111397387e43ba7c8d2c0f5645376c466d271a/python-sdk/src/astro/files/types/parquet.py#L45

Version

Expected behavior: Research whether there is a better way to read schema information from Parquet files without loading them entirely into memory.

sunank200 commented 2 years ago

Removing this from 1.2.1 as per discussion

phanikumv commented 1 year ago

Refer to how PyArrow implemented this.