Mule SDK connector that provides the ability to read Parquet files into JSON or write Parquet files from Avro data.
Apache Parquet is a columnar storage format: instead of storing records row by row, it stores the values of each column together. Because values in the same column share a data type, grouping them this way enables better compression, smaller storage, and faster retrieval of individual columns.
Using the Parquet format therefore offers two main advantages: a smaller storage footprint through better compression, and faster data retrieval when only a subset of columns is needed.
Build the connector and install it into your local Maven repository:

mvn clean install

Then add this dependency to your application's pom.xml file:

<dependency>
<groupId>com.dejim</groupId>
<artifactId>parquet</artifactId>
<version>1.0.24-SNAPSHOT</version>
<classifier>mule-plugin</classifier>
</dependency>
You can report new issues at https://github.com/djuang1/parquet/issues.
This operation reads a Parquet file from an InputStream (e.g. #[payload]) and returns the data in JSON format. The data can come from Amazon S3 or any other connector that provides streaming, so there is no need to read it from the file system.
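A streaming read might be wired up roughly as follows. This is a sketch only: the parquet:* element and attribute names, the S3 operation shape, and the config names are assumptions, not taken from the connector's actual schema.

```xml
<!-- Hypothetical flow: parquet:* element/attribute names are assumptions -->
<flow name="read-parquet-from-s3">
  <!-- Fetch the Parquet file as a stream, e.g. from Amazon S3 -->
  <s3:get-object config-ref="S3_Config" bucketName="my-bucket" key="data.parquet"/>
  <!-- Pass the stream (#[payload]) to the Parquet read operation;
       the result is the file's contents rendered as JSON -->
  <parquet:read-stream config-ref="Parquet_Config" inputStream="#[payload]"/>
  <logger level="INFO" message="#[payload]"/>
</flow>
```

The same operation works with any source that yields an InputStream, not just S3.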
This operation writes Parquet data to a stream (e.g. #[payload]). Instead of writing to disk, you can send the output directly to Amazon S3 or any other connector that provides streaming capabilities.
This operation reads a Parquet file from the local file system and returns the data in JSON format.
Writing data to a Parquet file is not straightforward: the data must conform to a defined schema. This operation lets you leverage MuleSoft's Avro format support to shape the data with DataWeave before it is written to a Parquet file.
In the DataWeave transformation, set the output MIME type to application/avro.
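A write flow could look roughly like this. The parquet:* element names and the Avro schema path are assumptions for illustration; the DataWeave script shows the general shape of producing application/avro output against a schema before handing the payload to the write operation.

```xml
<!-- Hypothetical flow: parquet:* names and the schema path are assumptions -->
<flow name="write-parquet">
  <ee:transform>
    <ee:message>
      <ee:set-payload><![CDATA[%dw 2.0
output application/avro schemaUrl="classpath://schema/person.avsc"
---
payload map (row) -> {
  name: row.name,
  age: row.age as Number
}]]></ee:set-payload>
    </ee:message>
  </ee:transform>
  <!-- Write the Avro-formatted payload out as a Parquet stream -->
  <parquet:write-stream config-ref="Parquet_Config"/>
</flow>
```

The Avro schema referenced by schemaUrl is what supplies the column definitions Parquet needs.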
Author: Dejim Juang - dejimj@gmail.com
Last Update: October 22, 2022