Closed by ghost 6 years ago
Hi Prashanth,
For the time being we're not adding compile dependencies that are outside of the Cloudera stack. The reason Envelope sticks to Cloudera-provided dependencies is that Cloudera has already tested that they all work together for a common CDH version. If we added third-party dependencies then Envelope would need to be responsible for that testing, which is too much work for us right now. For example, if we added this and then upgraded the Spark version, it would be hard to know whether the third-party spark-avro version we were using was still compatible. When Cloudera adds support for spark-avro to Spark 2.x then we'll add this dependency back in.
Another option for you, if you want to use Avro, is to build your own output implementation: package it in its own jar that includes the dependency, add that jar to the Spark execution using --jars, and then provide the output class name as the 'type' of the output in the Envelope pipeline configuration.
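As a rough sketch, the output section of the pipeline configuration for such a custom output might look like the following (the class name `com.example.avro.AvroOutput`, the step name, and the path are all hypothetical, and the exact config keys your output reads are up to your implementation):

```hocon
steps {
  writeAvro {
    dependencies = [processData]
    output {
      // Fully qualified class name of your custom output,
      // shipped to the executors via --jars (hypothetical name)
      type = com.example.avro.AvroOutput
      // Any settings your output class reads from its config,
      // e.g. the target directory to write Avro files into
      path = "hdfs:///data/output/avro"
    }
  }
}
```

The jar would then be supplied at submission time alongside the Envelope jar, e.g. something like `spark-submit --jars my-avro-output.jar envelope-<version>.jar pipeline.conf` (jar names illustrative).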
Thanks for the reply! That works for me. Could you please edit the example to use Parquet? That might be helpful. Even Spark 1.6 applications used the Databricks library (https://www.cloudera.com/documentation/enterprise/5-7-x/topics/spark_avro.html#avro).
Added changes to FileSystemOutput.java to support writing files in Avro format, and updated the simple filesystem example to run as per the steps given in readme.md.