edwardcapriolo / filecrush

Remedy small files by combining them into larger ones.

Exception in thread "main" java.lang.NumberFormatException: null #5

Closed agillan closed 10 years ago

agillan commented 10 years ago

Hi,

I'm sorry if this is a dumb question, but I can't figure out how to run the file crusher on my Hadoop cluster; I keep getting a class-not-found error. This is the command I'm running:

hadoop jar filecrush-2.2.2-SNAPSHOT.jar Crush /user/zslf023/pdb/all /user/zslf023/pdb/tenkcrushed 201424071559

which then returns:

Exception in thread "main" java.lang.ClassNotFoundException: Crush
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)

Could you please let me know where I'm going wrong?

Thanks in advance! Ana

agillan commented 10 years ago

Ok, so I found that I have to give the fully qualified class name of Crush, like so:

hadoop jar filecrush-2.2.2-SNAPSHOT.jar com.m6d.filecrush.crush.Crush /user/zslf023/pdb/all /user/zslf023/pdb/tenkcrushed 20140725112332

But now I get the following error:

Exception in thread "main" java.lang.NumberFormatException: null
    at java.lang.Long.parseLong(Long.java:375)
    at java.lang.Long.parseLong(Long.java:468)
    at com.m6d.filecrush.crush.Crush.createJobConfAndParseArgs(Crush.java:491)
    at com.m6d.filecrush.crush.Crush.run(Crush.java:595)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at com.m6d.filecrush.crush.Crush.main(Crush.java:1313)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

It seems something is wrong with this line:

dfsBlockSize = Long.parseLong(job.get("dfs.block.size"));
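
The trace is consistent with the property lookup returning null, since `Long.parseLong` throws `NumberFormatException` when given a null string. Here is a minimal sketch of that failure mode. It uses a plain `java.util.Properties` as a stand-in for the Hadoop job configuration, and the class and method names are made up for illustration, not taken from filecrush:

```java
import java.util.Properties;

final class NullBlockSizeDemo {
    // Returns true when parsing the value stored under 'key' fails.
    // Properties.getProperty returns null for a missing key, which is
    // assumed here to mirror what job.get("dfs.block.size") returned.
    static boolean parseFails(Properties conf, String key) {
        try {
            Long.parseLong(conf.getProperty(key));
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Only the newer property name is set in this stand-in config.
        conf.setProperty("dfs.blocksize", "134217728");
        System.out.println("old key fails: " + parseFails(conf, "dfs.block.size"));
        System.out.println("new key fails: " + parseFails(conf, "dfs.blocksize"));
    }
}
```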

edwardcapriolo commented 10 years ago

Do you know if this variable has changed? We originally targeted Hadoop 0.20.2; maybe things are getting deprecated and moved around.

agillan commented 10 years ago

Looks like they changed it to dfs.blocksize in Hadoop 2.0.4: http://hadoop.apache.org/docs/r2.0.4-alpha/hadoop-project-dist/hadoop-common/DeprecatedProperties.html

If I change that line in the Crush code to job.get("dfs.blocksize"), it should work, right?

edwardcapriolo commented 10 years ago

Yes we should make a patch that attempts to use both variables.
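
A patch along those lines might look roughly like the sketch below. This is not the code that was merged into filecrush. `BlockSizeLookup`, `resolve`, and the caller-supplied `defaultBytes` fallback are hypothetical names, and the lookup is passed in as a plain function (for a Hadoop job configuration that could be `job::get`) so the example runs without Hadoop on the classpath:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class BlockSizeLookup {
    // The two property names mentioned in this thread, newer name first.
    static final List<String> KEYS = Arrays.asList("dfs.blocksize", "dfs.block.size");

    // 'lookup' returns the raw value for a key, or null when it is unset.
    // Falls back to 'defaultBytes' (an assumption of this sketch) when
    // neither key is present, instead of handing null to Long.parseLong.
    static long resolve(Function<String, String> lookup, long defaultBytes) {
        for (String key : KEYS) {
            String raw = lookup.apply(key);
            if (raw != null && !raw.trim().isEmpty()) {
                return Long.parseLong(raw.trim());
            }
        }
        return defaultBytes;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("dfs.block.size", "67108864"); // only the old name is set
        System.out.println(resolve(conf::get, 134217728L));
    }
}
```

Trying the newer name first means a cluster that sets both still gets the value it actually uses, while older configurations keep working through the second key.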

edwardcapriolo commented 10 years ago

I'll do that now.

agillan commented 10 years ago

Thanks!

agillan commented 10 years ago

Could I also please ask you a really quick favour? I might well use this code as part of my master's thesis work, and it would really help me out if you could register your repository at this website: https://guides.github.com/activities/citable-code/ so that I can reference it properly in my bibliography. Would that be ok? Thanks again.

edwardcapriolo commented 10 years ago

Cool. Yes. I will fill that out. Send me a link to the paper when it is completed.

I am running tests now. One thing to note: what I am doing is patching in this bug fix, but what we should really do is upgrade filecrush to test against a newer Hadoop. Since we are not currently testing against Hadoop 2.4, we are not showing that this fix actually works. That is a larger effort for another ticket, but if you would like to take that on it would be great.

edwardcapriolo commented 10 years ago

This fix should be merged into https://github.com/edwardcapriolo/filecrush/pull/6

edwardcapriolo commented 10 years ago

https://zenodo.org/badge/doi/10.5281/zenodo.11038.png

I have a badge, but my page is not md so I cannot display it. Also, trunk has a fix for your blocksize bug. Try it out and close the issue if that fixes it.

agillan commented 10 years ago

Thanks for that! I just tried to package it and I got a build failure:

Tests run: 98, Failures: 1, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11:54 min
[INFO] Finished at: 2014-07-25T16:31:16+01:00
[INFO] Final Memory: 12M/43M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.4:test (default-test) on project filecrush: There are test failures.

This is the test report output:

<testcase time="0.079" classname="com.m6d.filecrush.crush.CrushTest" name="bucketing">
    <failure message="
Expected: &lt;{/var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/2-1=0, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/2/2.4/2.4.2-0=1, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/1/1.1-2=4, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/2/2.2-1=3, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/1/1.1-0=2, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/1/1.1-1=3, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/1/1.2-0=4}&gt;
     got: &lt;{/var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/2-1=0, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/2/2.4/2.4.2-0=1, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/1/1.1-2=3, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/2/2.2-1=3, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/1/1.1-0=2, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/1/1.1-1=4, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/1/1.2-0=4}&gt;
" type="java.lang.AssertionError">java.lang.AssertionError: 
Expected: &lt;{/var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/2-1=0, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/2/2.4/2.4.2-0=1, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/1/1.1-2=4, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/2/2.2-1=3, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/1/1.1-0=2, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/1/1.1-1=3, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/1/1.2-0=4}&gt;
     got: &lt;{/var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/2-1=0, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/2/2.4/2.4.2-0=1, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/1/1.1-2=3, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/2/2.2-1=3, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/1/1.1-0=2, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/1/1.1-1=4, /var/folders/j4/m8f6sqzd3fv1k134pbqkf7j80000gn/T/junit9031605743255637183/in/1/1.2-0=4}&gt;

    at org.junit.Assert.assertThat(Assert.java:778)
    at org.junit.Assert.assertThat(Assert.java:736)
    at com.m6d.filecrush.crush.CrushTest.bucketing(CrushTest.java:725)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
    at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:43)
    at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
    at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
    at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
</failure>
  </testcase>

As for taking on the testing for later Hadoop versions, I'll be doing an informal run with my files now, but I'm not sure I could take on any formal testing at the moment. I'm sorry!

edwardcapriolo commented 10 years ago

I had a weird error with that too. Honestly I think that test is somehow JVM sensitive, but I have not had time to dig in. I am using java version "1.7.0_45", which caused me to have to change that exact test. I made a note of it in my diff. For now I would build with tests skipped (for example mvn package -Dmaven.test.skip=true), because I think that is just non-deterministic testing and not a bug.
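
The cause of the `bucketing` failure was not tracked down in this thread. One common source of this kind of JVM sensitivity is a result that depends on the order in which items happen to be visited, and the sketch below illustrates only that general idea. `StableBucketing` and its round-robin assignment are hypothetical stand-ins, not filecrush's bucketing logic:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class StableBucketing {
    // Hands out bucket ids round-robin in visiting order. With sortFirst,
    // the visiting order is fixed by sorting the names, so the result no
    // longer depends on the order the caller supplied them in.
    static Map<String, Integer> assign(Collection<String> files, int buckets, boolean sortFirst) {
        List<String> order = new ArrayList<>(files);
        if (sortFirst) {
            Collections.sort(order);
        }
        Map<String, Integer> result = new TreeMap<>();
        int next = 0;
        for (String file : order) {
            result.put(file, next);
            next = (next + 1) % buckets;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("1.1-0", "1.1-1", "1.1-2");
        List<String> b = Arrays.asList("1.1-2", "1.1-0", "1.1-1");
        // Same files, different visiting order.
        System.out.println(assign(a, 2, false).equals(assign(b, 2, false)));
        System.out.println(assign(a, 2, true).equals(assign(b, 2, true)));
    }
}
```

If something like this were the cause, an assertion on exact bucket numbers would pass or fail depending on listing or hash iteration order, which can differ between JVMs.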

agillan commented 10 years ago

Ok, great, I did that and it packages and runs on my Hadoop 2.0.6 cluster!

alexmc6 commented 9 years ago

I just built filecrush and had the same (single) test failure.

]$ java -version
java version "1.7.0"
Java(TM) SE Runtime Environment (build pxa6470sr7-20140410_01(SR7))
IBM J9 VM (build 2.6, JRE 1.7.0 Linux amd64-64 Compressed References 20140409_195732 (JIT enabled, AOT enabled)
J9VM - R26_Java726_SR7_20140409_1418_B195732
JIT - r11.b06_20140409_61252
GC - R26_Java726_SR7_20140409_1418_B195732_CMPRSS
J9CL - 20140409_195732)
JCL - 20140409_01 based on Oracle 7u55-b13
DEV [mclinta@vrdevamc001 surefire-reports]$ arch
x86_64
DEV [mclinta@vrdevamc001 surefire-reports]$ uname -a
Linux vrdevamc001.iggroup.local 2.6.32-431.20.3.el6.x86_64 #1 SMP Fri Jun 6 18:30:54 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux