evarga opened 8 years ago
The following code can be improved by better leveraging SQL:
```java
// Calculate statistics based on the content size.
Tuple4<Long, Long, Long, Long> contentSizeStats =
    sqlContext.sql("SELECT SUM(contentSize), COUNT(*), MIN(contentSize), MAX(contentSize) FROM logs")
        .map(row -> new Tuple4<>(row.getLong(0), row.getLong(1), row.getLong(2), row.getLong(3)))
        .first();
System.out.println(String.format("Content Size Avg: %s, Min: %s, Max: %s",
    contentSizeStats._1() / contentSizeStats._2(),
    contentSizeStats._3(),
    contentSizeStats._4()));
```
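Namely, SQL already supports calculating an average via the AVG function. Using it also avoids a subtle issue: `contentSizeStats._1() / contentSizeStats._2()` divides two `Long` values, so Java performs integer division and the reported average is truncated. Therefore, the improved code snippet may look as follows: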
```java
// Calculate statistics based on the content size.
Tuple3<Double, Long, Long> contentSizeStats =
    sqlContext.sql("SELECT AVG(contentSize), MIN(contentSize), MAX(contentSize) FROM logs")
        .map(row -> new Tuple3<>(row.getDouble(0), row.getLong(1), row.getLong(2)))
        .first();
System.out.println(String.format("Content Size Avg: %s, Min: %s, Max: %s",
    contentSizeStats._1(),
    contentSizeStats._2(),
    contentSizeStats._3()));
```
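To illustrate the difference outside of Spark, here is a minimal plain-Java sketch that mimics both computations on a few made-up content sizes. The class name `ContentSizeAverage`, its helper methods, and the sample values are hypothetical and exist only for this illustration; it does not use `sqlContext` or any Spark API.

```java
import java.util.Arrays;
import java.util.LongSummaryStatistics;

/**
 * Hypothetical illustration (not part of the original code): compares the
 * SUM(contentSize) / COUNT(*) approach on two longs with an AVG-style
 * double average. The sample sizes are made up, not real log data.
 */
public class ContentSizeAverage {

    // Mirrors the original snippet: long / long is integer division,
    // so the fractional part of the average is discarded.
    static long sumOverCount(long[] sizes) {
        long sum = Arrays.stream(sizes).sum();
        long count = sizes.length;
        return sum / count;
    }

    // Mirrors AVG(contentSize): the average is kept as a double.
    // An empty input yields NaN here, whereas SQL AVG would return NULL.
    static double avg(long[] sizes) {
        return Arrays.stream(sizes).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        long[] sizes = {100L, 101L, 250L}; // hypothetical content sizes
        LongSummaryStatistics stats = Arrays.stream(sizes).summaryStatistics();
        System.out.println(String.format("SUM/COUNT avg: %s, AVG: %s, Min: %s, Max: %s",
            sumOverCount(sizes), avg(sizes), stats.getMin(), stats.getMax()));
    }
}
```

With these sample values the sum is 451 over 3 rows, so `sumOverCount` reports 150 while `avg` keeps the fractional part (about 150.33).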