LinkedInAttic / white-elephant

Hadoop log aggregator and dashboard

Use of uninitialized value in statsupload.pl #4

Open alexanderfahlke opened 11 years ago

alexanderfahlke commented 11 years ago

While uploading the log files I get this strange error:

Use of uninitialized value $jobconfxml in concatenation (.) or string at ./statsupload.pl line 300.

My config:

our %CFG = (
    "hadoop_home" => "/home/hadoop/bin/hadoop",
    "hadoop_logs" => "/home/hadoop/bin/hadoop/logs",
    "days"        => 2,
    "queues"      => ['default'],
    "grid"        => 'test',
    "destination" => "hdfs://localhost:9000/user/hadoop/history",
);

My Hadoop log files (49 in total, for testing):

hadoop@dev12:~/bin/hadoop$ ls -1 /home/hadoop/bin/hadoop/logs/history/*.xml 
/home/hadoop/bin/hadoop/logs/history/localhost_1363694800974_job_201303191306_0001_conf.xml
/home/hadoop/bin/hadoop/logs/history/localhost_1363694800974_job_201303191306_0002_conf.xml
...
/home/hadoop/bin/hadoop/logs/history/localhost_1363694800974_job_201303191306_0024_conf.xml

/home/hadoop/bin/hadoop/logs/history/localhost_1363702263722_job_201303191511_0001_conf.xml
/home/hadoop/bin/hadoop/logs/history/localhost_1363702263722_job_201303191511_0002_conf.xml
...
/home/hadoop/bin/hadoop/logs/history/localhost_1363702263722_job_201303191511_0008_conf.xml

/home/hadoop/bin/hadoop/logs/history/localhost_1363721823970_job_201303192037_0001_conf.xml
/home/hadoop/bin/hadoop/logs/history/localhost_1363721823970_job_201303192037_0002_conf.xml
...
/home/hadoop/bin/hadoop/logs/history/localhost_1363721823970_job_201303192037_0018_conf.xml

The final output of the script is:

Uploaded 98 files, found 0 existing

But if I check in HDFS, I only see 3 files:

hadoop@dev12:~/bin/hadoop$ hadoop fs -ls /user/hadoop/history/test/daily/default/2013/0319
Found 3 items
-rw-r--r--   1 hadoop supergroup       7171 2013-03-24 00:01 /user/hadoop/history/test/daily/default/2013/0319/localhost_1363694800974_job.log
-rw-r--r--   1 hadoop supergroup      28748 2013-03-24 00:01 /user/hadoop/history/test/daily/default/2013/0319/localhost_1363702263722_job.log
-rw-r--r--   1 hadoop supergroup       6285 2013-03-24 00:01 /user/hadoop/history/test/daily/default/2013/0319/localhost_1363721823970_job.log

I'm using:

matthayes commented 11 years ago

I just now pushed some improvements to the upload script. Can you try it again and share the output?

The "uninitialized value" error was happening because the script didn't find the job conf XML corresponding to the log file. The script now catches this instead of failing.

The real issue was that the log file names are in a different format than the script expects. I changed it to be more flexible: it now looks for the part of the file name starting with "job". Yours start with "localhost", which confused the script. Anyway, this should work now.
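The more flexible matching described above can be sketched roughly like this (an illustrative Python sketch, not the actual Perl from statsupload.pl; the regex is an assumption based on the file names in this thread):

```python
import re

# Hypothetical sketch: instead of assuming the file name starts with
# "job", look for a "job_<id>_<seq>" portion anywhere in the name, so a
# "localhost_<timestamp>_" prefix no longer confuses the matching.
JOB_ID = re.compile(r'(job_\d+_\d+)')

def job_id_of(filename):
    """Return the job id embedded in a history file name, or None."""
    m = JOB_ID.search(filename)
    return m.group(1) if m else None

print(job_id_of("localhost_1363694800974_job_201303191306_0001_conf.xml"))
# job_201303191306_0001
print(job_id_of("unexpected_name.txt"))
# None
```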

I also added more logging so it's easier to tell what's happening. So if it still doesn't work after this it should be clearer why :)

alexanderfahlke commented 11 years ago

That did the trick! Now I've got 98 files in HDFS (49 .xml and 49 .log).

But the log output is a bit misleading because it says that the script is uploading .pig and .jar files.

...
Uploading /home/hadoop/bin/hadoop/logs/history/localhost_1363694800974_job_201303191306_0008_hadoop_PigLatin%3Ahdfsdu.pig
-> hdfs://localhost:9000/user/hadoop/history/test/daily/default/2013/0319/job_201303191306_0008.log
command: /home/hadoop/bin/hadoop/bin/hadoop dfs -put /home/hadoop/bin/hadoop/logs/history/localhost_1363694800974_job_201303191306_0008_hadoop_PigLatin%3Ahdfsdu.pig hdfs://localhost:9000/user/hadoop/history/test/daily/default/2013/0319/job_201303191306_0008.log

Uploading /home/hadoop/bin/hadoop/logs/history/localhost_1363694800974_job_201303191306_0008_conf.xml
-> hdfs://localhost:9000/user/hadoop/history/test/daily/default/2013/0319/job_201303191306_0008_conf.xml
command: /home/hadoop/bin/hadoop/bin/hadoop dfs -put /home/hadoop/bin/hadoop/logs/history/localhost_1363694800974_job_201303191306_0008_conf.xml hdfs://localhost:9000/user/hadoop/history/test/daily/default/2013/0319/job_201303191306_0008_conf.xml

Uploading /home/hadoop/bin/hadoop/logs/history/localhost_1363721823970_job_201303192037_0003_hadoop_Job6321920289122417517.jar
-> hdfs://localhost:9000/user/hadoop/history/test/daily/default/2013/0319/job_201303192037_0003.log
command: /home/hadoop/bin/hadoop/bin/hadoop dfs -put /home/hadoop/bin/hadoop/logs/history/localhost_1363721823970_job_201303192037_0003_hadoop_Job6321920289122417517.jar hdfs://localhost:9000/user/hadoop/history/test/daily/default/2013/0319/job_201303192037_0003.log

Uploading /home/hadoop/bin/hadoop/logs/history/localhost_1363721823970_job_201303192037_0003_conf.xml
-> hdfs://localhost:9000/user/hadoop/history/test/daily/default/2013/0319/job_201303192037_0003_conf.xml
command: /home/hadoop/bin/hadoop/bin/hadoop dfs -put /home/hadoop/bin/hadoop/logs/history/localhost_1363721823970_job_201303192037_0003_conf.xml hdfs://localhost:9000/user/hadoop/history/test/daily/default/2013/0319/job_201303192037_0003_conf.xml
...

But, as expected, there are no .pig or .jar files stored in HDFS:

...
-rw-r--r--   1 hadoop supergroup       7208 2013-03-24 01:54 /user/hadoop/history/test/daily/default/2013/0319/job_201303191306_0008.log
-rw-r--r--   1 hadoop supergroup     114619 2013-03-24 01:54 /user/hadoop/history/test/daily/default/2013/0319/job_201303191306_0008_conf.xml
...
-rw-r--r--   1 hadoop supergroup       4619 2013-03-24 01:54 /user/hadoop/history/test/daily/default/2013/0319/job_201303192037_0003.log
-rw-r--r--   1 hadoop supergroup     176327 2013-03-24 01:54 /user/hadoop/history/test/daily/default/2013/0319/job_201303192037_0003_conf.xml
...
matthayes commented 11 years ago

I'll try making this clearer. The script renames the log files using a more consistent naming convention. The .jar and .pig files are actually log files, so they should end in .log ;)
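The renaming convention described here can be sketched as follows (an illustrative Python sketch, not the script's actual Perl; the helper name is made up): conf files keep their `_conf.xml` suffix, while every other history file, including those whose local names end in .pig or .jar, is uploaded as `<job id>.log`.

```python
import re

JOB_ID = re.compile(r'(job_\d+_\d+)')

def destination_name(filename):
    """Map a local history file name to its HDFS destination name.

    Sketch of the convention seen in the upload log above:
    *_conf.xml files become <job id>_conf.xml, all other history
    files (even those named *.pig or *.jar) become <job id>.log.
    """
    m = JOB_ID.search(filename)
    if m is None:
        return None  # no recognizable job id in the name
    job_id = m.group(1)
    if filename.endswith("_conf.xml"):
        return job_id + "_conf.xml"
    return job_id + ".log"

print(destination_name(
    "localhost_1363694800974_job_201303191306_0008_hadoop_PigLatin%3Ahdfsdu.pig"))
# job_201303191306_0008.log
```

This matches the upload log above, where a `...PigLatin%3Ahdfsdu.pig` source file lands in HDFS as `job_201303191306_0008.log`.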

alexanderfahlke commented 11 years ago

Ah cool, I didn't know that. You learn something new every day...

So this is almost fixed (except for the confusing log file names).

PreethaDevi commented 9 years ago

Hi All,

I have made all the changes in the files for my cluster instances, but I can't find the steps for how to work with white-elephant. Can you please list the steps to run it on Hadoop?

When I tried, I got the following error:

[xxx@vp21q39ic-hpao101328 ~]$ ls
__MACOSX  white-elephant-master  white-elephant-master.zip
[xxx@clusterinstance ~]$ cd white-elephant-master
[xxx@clusterinstance white-elephant-master]$ cd hadoop/scripts/
[xxx@clusterinstance scripts]$ ls
README.md  cfg.pm  statsupload.pl
[xxx@clusterinstance scripts]$ ./statsupload.pl cfg.pm
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LC_CTYPE = "UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Can't locate Date/Calc.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at ./statsupload.pl line 11.
BEGIN failed--compilation aborted at ./statsupload.pl line 11.
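The immediate failure here is the missing Date::Calc Perl module that statsupload.pl loads at its line 11. A typical fix (assuming you have cpan or a distro package manager available; the package names below are the common ones, but verify them for your system) is:

```shell
# Install the Date::Calc Perl module that statsupload.pl requires.
# Pick whichever matches your system; package names may vary.

# Via CPAN (works wherever Perl does):
cpan Date::Calc

# On RHEL/CentOS (the @INC paths in the error suggest a Red Hat layout):
sudo yum install perl-Date-Calc

# On Debian/Ubuntu:
sudo apt-get install libdate-calc-perl

# Verify the module now loads cleanly:
perl -MDate::Calc -e 'print "Date::Calc OK\n"'
```

The locale warnings are unrelated and harmless; setting `LC_ALL=en_US.UTF-8` in your shell should silence them.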

Please help me with the detailed steps. On the page http://data.linkedin.com/opensource/white-elephant I see some deployment steps, but it's unclear when all of these should be done.

Confused :(

Thanks