KalyanHadoopRealTimeProjects-1 / project-batch1-team1

project-batch1-team1
http://www.bigdatatraininghyderabad.com

Real Time Big Data Projects Team Discussions #1

Open kalyanhadooptraining opened 7 years ago

mahesh-orienit commented 7 years ago

What is the expected output for MapReduceTask_1?

shiva123k commented 7 years ago

Partition the given data by 'country' and 'status'. The output can be in any format like text, pdf, xml, json. Example:

Country     Status
India       success
Australia   success
Iran        fail
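A minimal Hive sketch of that partitioning, assuming a raw staging table (the table and column names here are illustrative, not part of the task spec):

-- hypothetical staging table holding the raw records
CREATE TABLE tdata_raw (name string, country string, status string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- output table with one partition per (country, status) pair
CREATE TABLE tdata_by_country_status (name string)
PARTITIONED BY (country string, status string);

-- dynamic-partition insert; the partition columns must come last in the SELECT
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE tdata_by_country_status PARTITION (country, status)
SELECT name, country, status FROM tdata_raw;

Each (country, status) combination then lands in its own directory under the table's HDFS location.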

mahesh-orienit commented 7 years ago

Using Hive, I can't see the output. Any suggestions?

MapReduceTask_2:
➢ Input can be any format like text, pdf, xml, json
➢ Find the top 10 countries whose status is SUCCESS
➢ Output can be any format like text, pdf, xml, json

hive> select country, count(1) as cnt
    > from tdata
    > where status = 'success'
    > group by country
    > order by cnt desc
    > limit 10;
Query ID = orienit_20170920200909_04c44470-526d-4c59-a497-ae653ea47c85
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1505885960777_0003, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1505885960777_0003/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1505885960777_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-09-20 20:10:55,892 Stage-1 map = 0%, reduce = 0%
2017-09-20 20:11:56,514 Stage-1 map = 0%, reduce = 0%
2017-09-20 20:12:25,014 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.91 sec
2017-09-20 20:13:03,633 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.51 sec
MapReduce Total cumulative CPU time: 7 seconds 510 msec
Ended Job = job_1505885960777_0003
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1505885960777_0004, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1505885960777_0004/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1505885960777_0004
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2017-09-20 20:13:22,019 Stage-2 map = 0%, reduce = 0%
2017-09-20 20:13:41,243 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.98 sec
2017-09-20 20:14:00,358 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.06 sec
MapReduce Total cumulative CPU time: 4 seconds 60 msec
Ended Job = job_1505885960777_0004
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 7.51 sec   HDFS Read: 6175073  HDFS Write: 96  SUCCESS
Stage-Stage-2: Map: 1  Reduce: 1   Cumulative CPU: 4.06 sec   HDFS Read: 5102  HDFS Write: 0  SUCCESS
Total MapReduce CPU Time Spent: 11 seconds 570 msec
OK
Time taken: 252.168 seconds
hive>

kalyanhadooptraining commented 7 years ago

where status = 'SUCCESS' .... change the status value to capital letters.
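That is, the same query with the status literal in capitals (assuming the same tdata table as above):

select country, count(1) as cnt
from tdata
where status = 'SUCCESS'
group by country
order by cnt desc
limit 10;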

mahesh-orienit commented 7 years ago

Got the answer, thank you! BTW there are only 4 countries having the status SUCCESS:

US 16348
GB 5518
DE 4118
FR 1411

mahesh-orienit commented 7 years ago

I am getting the below error for this query, any suggestions?

CREATE TABLE hivetask1(name string, id int, course string, year int)
ROW FORMAT SERDE ‘org.apache.hive.hcatalog.data.JsonSerDe’;

[orienit@quickstart ~]$ hive -f '/home/orienit/work/Assignment_stuff/HiveTasks/HiveTask_1.hql';
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/phoenix/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/phoenix/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
Time taken: 1.929 seconds
MismatchedTokenException(26!=307)
    at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
    at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
    at org.apache.hadoop.hive.ql.parse.HiveParser.rowFormatSerde(HiveParser.java:34269)
    at org.apache.hadoop.hive.ql.parse.HiveParser.tableRowFormat(HiveParser.java:34761)
    at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:5196)
    at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2557)
    at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1589)
    at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1065)
    at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:201)
    at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:522)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1356)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1473)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1285)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1275)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:220)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:383)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:318)
    at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:416)
    at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:432)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:726)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:693)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:628)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 4:18 mismatched input 'org' expecting StringLiteral near 'SERDE' in serde format specification
WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.
[orienit@quickstart ~]$
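That ParseException ("expecting StringLiteral near 'SERDE'") usually means the SerDe class name is not wrapped in plain ASCII single quotes; the statement above uses curly quotes, so the parser reads 'org' as an identifier. A sketch of the same DDL with straight quotes (the jar path below is an assumption for a CDH install; skip the ADD JAR if the SerDe class is already on the classpath):

-- assumed CDH path to the jar that ships org.apache.hive.hcatalog.data.JsonSerDe
ADD JAR /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;

CREATE TABLE hivetask1 (name string, id int, course string, year int)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';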