aws / sagemaker-spark-container

The SageMaker Spark Container is a Docker image used to run data processing workloads with the Spark framework on Amazon SageMaker.
Apache License 2.0

PyArrow and PySpark pandas support #136

Open mjost5v opened 10 months ago

mjost5v commented 10 months ago

Reading Parquet files with PySpark and pandas is common. However, the Pipfile does not include pandas or pyarrow, which are needed to read Parquet files with pandas and to execute pandas UDFs (pandas_udf).
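
A quick way to confirm the gap from inside a running container is to check whether the two packages are importable. The following is only a minimal sketch using the Python standard library; the helper name `missing_packages` is hypothetical and not part of the container or of PySpark.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # pandas and pyarrow are the packages this issue reports as absent.
    missing = missing_packages(["pandas", "pyarrow"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("pandas and pyarrow are both importable")
```

If either package is reported as missing, pandas UDFs and pandas-based Parquet reads would be expected to fail with an import error in that environment.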