Closed: SwathiMystery closed this issue 11 years ago.
I see, so it appears your job conf XML files follow a different naming convention than the script expects. I think this is where the script is failing. As a workaround, you could comment out lines 285-297, since the job XML files are not actually used at the moment. Can you let me know if this fixes it for you? I'll fix the script so it finds the conf files correctly. Thanks for your patience :)
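To make the naming mismatch concrete, here is a minimal Perl sketch of a file-name check that accepts both conventions that come up in this thread: the bare `job_<id>_conf.xml` name and the `<host>_<timestamp>_job_<id>_conf.xml` name reported later. The sub `job_id_from_conf_name` is hypothetical and not taken from statsupload.pl; it only assumes job IDs of the form `job_<digits>_<digits>`.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of statsupload.pl): return the job ID
# embedded in a job conf XML file name, or undef if the name does not
# look like a job conf file. The optional "(?:.+_)?" prefix accepts both
#   job_201303191923_0004_conf.xml
#   <host>_<timestamp>_job_201303191923_0002_conf.xml
sub job_id_from_conf_name {
    my ($name) = @_;
    return $name =~ /^(?:.+_)?(job_\d+_\d+)_conf\.xml$/ ? $1 : undef;
}

for my $name (
    'job_201303191923_0004_conf.xml',
    'ec2-XXXXXXXXX.amazonaws.com_1363721010255_job_201303191923_0002_conf.xml',
    'job_201303191923_0004_conf.xml.bak',
) {
    my $id = job_id_from_conf_name($name);
    printf "%s => %s\n", $name, defined $id ? $id : '(not a conf file)';
}
```

Anchoring the pattern at both ends keeps stray files such as backups from being picked up, while the optional prefix tolerates whatever host and timestamp the JobTracker prepends.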
Sure. I am testing the tool on a 1+6 node cluster in the cloud. I just wanted to make sure everything works and that my understanding is right before deploying to a larger cluster. I am very interested in this tool and would like to contribute going forward.
I commented out lines 285-297 in statsupload.pl, as you suggested, and I still see the same issue.
AFAIK, the generated logs follow this path pattern:
/var/log/hadoop/logs/history/done/
I was able to reproduce your problem. I missed a line that you also need to change for the workaround. Comment out the "findqueue" line like this:
# $queue = findqueue( $xml );
$queue = "default";
The script gets the queue name from the job conf XML, and since it can't find that file, it fails.
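As an alternative to hard-coding "default", the lookup could fall back to "default" only when the conf file is missing or has no queue setting. The sketch below is hypothetical and is not the actual `findqueue()` in statsupload.pl; it assumes the queue is stored in the Hadoop 1.x property `mapred.job.queue.name`, and it uses a regex instead of a real XML parser purely to avoid non-core modules.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the real findqueue(): pull the value of
# mapred.job.queue.name out of job conf XML text, or return "default"
# when the property is absent or empty.
sub queue_from_conf_text {
    my ($xml) = @_;
    if ($xml =~ m{<name>\s*mapred\.job\.queue\.name\s*</name>\s*<value>\s*([^<]*?)\s*</value>}s) {
        return $1 if length $1;
    }
    return 'default';
}

# Read the conf file if it can be opened; otherwise fall back to
# "default" instead of failing the whole run.
sub find_queue_or_default {
    my ($path) = @_;
    open my $fh, '<', $path or return 'default';
    local $/;    # slurp mode: read the whole file at once
    my $xml = <$fh>;
    close $fh;
    return queue_from_conf_text(defined $xml ? $xml : '');
}
```

With this shape, `$queue = find_queue_or_default( $xml );` would behave like the workaround when the file can't be found, but would still report a non-default queue when the conf file is present.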
Awesome! Setting it to "default" lets the script upload the files. However, when checking the HDFS data, it reports "Found 0 existing files in HDFS". Is there anything I haven't configured?
I updated the script to be more flexible with the job conf XML names. Can you try it again? In your setup it should now upload the job conf XML.
It also now logs the command used to list files in HDFS. You can use this to double-check why it isn't finding anything after the first run.
Also make sure you've set "days" in cfg.pm high enough to reach back to the dates of the log files being uploaded; otherwise it won't search those days in HDFS.
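To illustrate why "days" matters, here is a hypothetical Perl sketch of a look-back window that produces one `YYYY/MM/DD` string per day, matching the date layout seen under `history/done/` in this thread. How statsupload.pl actually interprets "days" is an assumption here: the sketch treats it as "the given date plus the `days - 1` days before it". Any log dated outside that window would simply never be searched.

```perl
use strict;
use warnings;
use POSIX qw(mktime strftime);

# Hypothetical illustration only: list YYYY/MM/DD directory names for
# the given date and the ($days - 1) days before it.
sub date_dirs {
    my ($days, $year, $month, $day) = @_;
    my @dirs;
    for my $offset (0 .. $days - 1) {
        # Noon avoids DST edge cases; mktime normalizes a day-of-month
        # that falls outside the month (e.g. 0 or negative).
        my $t = mktime(0, 0, 12, $day - $offset, $month - 1, $year - 1900);
        push @dirs, strftime('%Y/%m/%d', localtime $t);
    }
    return @dirs;
}

print join("\n", date_dirs(3, 2013, 3, 19)), "\n";
```

Under this assumption, logs from 2013/03/19 are only covered if the window ending at the current date is long enough to include that day.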
When I run the following command after making changes to cfg.pm:
./statsupload.pl --config cfg.pm
I see this:
...... ....
Searching /var/log/hadoop/logs/history for logs
./statsupload.pl: No such file or directory [/var/log/hadoop/logs/history/done/ec2-XXXXXXXXX.amazonaws.com1363721010255/2013/03/19/000000/job_201303191923_0004_conf.xml]
....
However, the log files on disk are named in this format:
$ cd /var/log/hadoop/logs/history/done/ec2-XXXXXXXXXXXX1363721010255/2013/03/19/000000
$ ls
ec2-XXXXXXXXX.amazonaws.com_1363721010255_job_201303191923_0002_conf.xml
Have I missed any configuration? Why is it not searching for ec2-XXXXXXXXX.amazonaws.com_1363721010255_job_201303191923_0002_conf.xml?
Any help is appreciated.
Thank you.
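The error output shows the script building the path as `<dir>/job_<id>_conf.xml`, while the file on disk carries a `<host>_<timestamp>_` prefix. One way around that is to scan the directory for a name ending in `<job_id>_conf.xml` instead of constructing the exact name. The helper `find_conf_for_job` below is a hypothetical sketch, not code from statsupload.pl.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: instead of building the exact name
# "<dir>/job_<id>_conf.xml", scan the directory for a file named either
# "<job_id>_conf.xml" or "<anything>_<job_id>_conf.xml".
sub find_conf_for_job {
    my ($dir, $job_id) = @_;
    opendir my $dh, $dir or return undef;
    my @names = grep {
        $_ eq "${job_id}_conf.xml" || /_\Q${job_id}\E_conf\.xml\z/
    } readdir $dh;
    closedir $dh;
    my @hits = sort @names;    # deterministic choice if several match
    return @hits ? File::Spec->catfile($dir, $hits[0]) : undef;
}
```

For example, `find_conf_for_job('/var/log/hadoop/logs/history/done/<host dir>/2013/03/19/000000', 'job_201303191923_0002')` would return the prefixed conf file from the `ls` output above. Requiring the `_` before the job ID keeps `job_1_2` from matching a file for `job_11_2`.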