Closed: aaron7 closed this issue 11 years ago
Need someone to run this on the Cloudera VM to check. Build a jar file with RunDOSJob.java as the main class and call: hadoop jar runJob.jar. You will need to put the CSV file into HDFS under input (as described in the code). To generate the CSV file for the job, use:

nfdump -r data.nfcapd -o "fmt:%ra,%ts,%te,%pr,%sa,%da,%sp,%dp,%pkt,%byt,%flg,%tos" -q -N >> netflow_anonymous.csv

then use HDFS's put operation to copy it into the file system.
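The steps above can be sketched as a shell session. The nfdump command and the file names (data.nfcapd, netflow_anonymous.csv, runJob.jar) come from the text; the exact HDFS input path is an assumption based on "under input (as described in the code)", so adjust it to whatever the job actually reads:

```shell
# Generate the anonymized CSV from the NetFlow capture
# (format string taken verbatim from the issue text)
nfdump -r data.nfcapd \
  -o "fmt:%ra,%ts,%te,%pr,%sa,%da,%sp,%dp,%pkt,%byt,%flg,%tos" \
  -q -N >> netflow_anonymous.csv

# Copy the CSV into HDFS under the job's input directory
# (directory name "input" is an assumption; see the code for the real path)
hadoop fs -mkdir -p input
hadoop fs -put netflow_anonymous.csv input/

# Run the job; the jar was built with RunDOSJob.java as the main class
hadoop jar runJob.jar
```

Note that >> appends, so delete any stale netflow_anonymous.csv first if you rerun the nfdump step.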
After 5-10 minutes of running the job on the local HDFS system (around 100% CPU on all 4 cores, about 6 GB of memory used), it gets to the final reduce and gives this error: