cloudera / hue

Open source SQL Query Assistant service for Databases/Warehouses
https://cloudera.com
Apache License 2.0

HUE-Oozie possibly corrupting deployment JAR in Java action #151

Closed sarus closed 9 years ago

sarus commented 9 years ago

I created a simple single-step "Java" workflow in HUE via the Oozie editor that runs a map reduce job from a JAR file. The JAR file that contains the job is uploaded to HDFS in the /user/dataloader/jars directory.

I set the path to the JAR file in the workflow as:

[screenshot: JAR path field in the workflow editor]

I also set the HDFS deployment path for the workspace under advanced options to:

/user/dataloader/workspaces/es_test

I save the workflow and create a coordinator to run the workflow. I then submit the workflow and the job fails with the error:

java.lang.ClassNotFoundException: Class com.breachintelligence.cobia.es.ESAggregationLoader not found

When I check the deployment path, there is a lib directory, and it contains the JAR file I specified (mr-aggregations-0.1.4.jar). I assume HUE/Oozie copies the JAR file into this lib directory automatically. However, I noticed that the copied file is smaller than the original JAR.

I downloaded the JAR file that was copied into the lib deployment directory and it is not a valid JAR (i.e., I can't open it anymore). I've confirmed that the original JAR located on HDFS opens just fine and contains the class that is supposedly missing.
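For anyone hitting the same thing: since a JAR is just a ZIP archive, you can confirm the copy is corrupt by testing it as a ZIP after downloading both files locally. A minimal sketch (the local paths in the comments are hypothetical):

```python
import zipfile

def is_valid_jar(path):
    """Return True if the file at `path` is a readable ZIP/JAR archive."""
    try:
        with zipfile.ZipFile(path) as jar:
            # testzip() returns the first bad member name, or None if all CRCs pass.
            return jar.testzip() is None
    except zipfile.BadZipfile:  # spelled BadZipFile on Python 3; the alias works in both
        return False

# Hypothetical local copies of the two files:
# is_valid_jar("mr-aggregations-0.1.4.jar")       # original from /user/dataloader/jars
# is_valid_jar("lib/mr-aggregations-0.1.4.jar")   # truncated copy from the workspace
```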

Original JAR Size

[screenshot: original JAR file size]

JAR File after being copied to lib deployment directory

[screenshot: truncated JAR file size]

My guess is that I get the class-not-found error because the JAR in the lib deployment directory is no longer a valid JAR file, although maybe there is another explanation for why I can't open it.

Everything in the configuration for the coordinator seems to be correct as far as paths where it should be looking for the JAR:

[screenshot: coordinator configuration showing the JAR paths]

Has anyone seen this problem before? Does anyone have a recommended workaround or solution, or am I just doing something completely wrong?

Thank you!

Versions:
Hue 3.5.0
CDH 2.3.0-cdh5.0.2
sarus commented 9 years ago

Tested a bit more, and it seems like the JAR file always gets truncated to 64MB.

romainr commented 9 years ago

The file-copy code had a bug (https://issues.cloudera.org/browse/HUE-2501) that was fixed by https://github.com/cloudera/hue/commit/a7262e57980800dc072ba5a9fae4ae3538c6ad25

Could you edit the file with the one-line fix, restart Hue, and try again?

sarus commented 9 years ago

That definitely looks like the issue I'm seeing. This is running on a production cluster, so just to make sure I'm doing this correctly, here's what I plan to do:

  1. On the node running HUE, edit /opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hue/desktop/libs/hadoop/src/hadoop/fs/webhdfs.py with the one-line change you linked to.
  2. Restart HUE from the Cloudera Manager status panel.

Is that the correct file to modify?
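After the restart, one way to confirm the fix took effect is to compare the byte size of the original JAR with the copy Oozie places in the workspace lib directory. A hedged sketch using the hdfs CLI from Python (the lib/ path is inferred from the deployment path above, and the hdfs client is assumed to be on the PATH):

```python
import subprocess

def hdfs_size(path):
    """Return the size in bytes of an HDFS file via `hdfs dfs -stat %b`."""
    out = subprocess.check_output(["hdfs", "dfs", "-stat", "%b", path])
    return int(out.decode().strip())

# Paths taken from this report; the lib/ location is inferred:
# original = hdfs_size("/user/dataloader/jars/mr-aggregations-0.1.4.jar")
# deployed = hdfs_size("/user/dataloader/workspaces/es_test/lib/mr-aggregations-0.1.4.jar")
# assert original == deployed, "deployed JAR is still truncated"
```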

Thanks!

romainr commented 9 years ago

Yes it is, and back up the file first, just in case.

e.g. sudo cp /opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hue/desktop/libs/hadoop/src/hadoop/fs/webhdfs.py /opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hue/desktop/libs/hadoop/src/hadoop/fs/webhdfs.py.bak

romainr commented 9 years ago

Good to close?

sarus commented 9 years ago

@romainr Sorry about that. Yes we're good to go. Thank you for your help!