fingltd / 4mc

4mc - splittable lz4 and zstd in hadoop/spark/flink

could you provide a 4mc example for flink #36

Open wangjian2019 opened 5 years ago

wangjian2019 commented 5 years ago

Could you provide a 4mc example for Flink, showing how Flink can read 4mc data from HDFS files?

carlomedas commented 5 years ago

I don't have a Flink example ready to go, but I can give you some input. We are going to use it soon in our own data chain, and once we have a decent example we will add it to the examples folder as well.

```java
// 1) create Hadoop config and set your Hadoop host/settings
Configuration hadoopConfig = new Configuration();
hadoopConfig.set("fs.defaultFS", "hdfs://yourHdfsHost:8020");
hadoopConfig.set("io.compression.codecs", "...."); // make sure to set the codecs

// 2) get the job conf and configure 4mc for your proto message
Job jobConf = Job.getInstance(hadoopConfig);
FourMcEbProtoInputFormat.setInputFormatClass(YOURMSG.YourProtoMessage.class, jobConf);

// 3) create the input from HDFS
DataSet<Tuple2<LongWritable, ProtobufWritable>> input =
    env.readHadoopFile(new FourMcEbProtoInputFormat(), LongWritable.class,
        ProtobufWritable.class, "hdfs://path_to_your_file.4mc", jobConf);

// 4) add more inputs via union, or select multiple files at once

// 5) use the data set
```
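The union mentioned in step 4 can be sketched like this, assuming the same `env`, `jobConf`, and `input` variables from the snippet above (an untested sketch; the second path is a placeholder):

```java
// read a second 4mc file the same way as the first (path is a placeholder)
DataSet<Tuple2<LongWritable, ProtobufWritable>> more =
    env.readHadoopFile(new FourMcEbProtoInputFormat(), LongWritable.class,
        ProtobufWritable.class, "hdfs://path_to_another_file.4mc", jobConf);

// combine both inputs with Flink's DataSet.union
DataSet<Tuple2<LongWritable, ProtobufWritable>> all = input.union(more);
```

Alternatively, since the underlying Hadoop FileInputFormat accepts directory paths, pointing the path at a directory containing .4mc files should read them all at once.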