aws / aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
https://aws-sdk-pandas.readthedocs.io
Apache License 2.0
3.94k stars 701 forks source link

fix: read parquet file in chunked mode per row group #3016

Closed FredericKayser closed 4 days ago

FredericKayser commented 2 weeks ago

Feature or Bugfix

Detail

Fixes the read_parquet function in chunked mode. Currently a whole file gets loaded into the memory instead of row groups. This can lead to memory issues.

Relates

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

malachi-constant commented 2 weeks ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

malachi-constant commented 2 weeks ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

malachi-constant commented 2 weeks ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

malachi-constant commented 2 weeks ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

malachi-constant commented 2 weeks ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

malachi-constant commented 2 weeks ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

malachi-constant commented 1 week ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

malachi-constant commented 1 week ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

malachi-constant commented 4 days ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

malachi-constant commented 4 days ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository