epicycle opened this issue 11 years ago
I tried disabling compression on the job output in the base.properties and regenerating all of the data but that didn't help. Any help anyone can offer from here would be appreciated.
Hmm I don't recognize this error. Do you have a patch or a repo where I can try testing out your changes to reproduce it?
After digging into the error online, it turns out the schema was too constrained for our usage. Our usernames are longer, our file names are longer, etc. I changed all of the varchars in the usage_database.rb file to larger sizes and voilà, it works! For now I made all of the varchars 50, and for the filename I used CHAR VARYING(5000) instead of varchar.
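For reference, here is a rough sketch of how the widened column types could be applied to an existing database instead of regenerating it. The table and column names below are illustrative assumptions, not the actual usage_database.rb schema, and the ALTER syntax should be checked against the HSQLDB version in use:

```ruby
# Hypothetical column widths; the real White Elephant schema may use
# different table and column names.
WIDENED_COLUMNS = {
  "username"  => "VARCHAR(50)",        # was a smaller varchar
  "cluster"   => "VARCHAR(50)",
  "file_name" => "CHAR VARYING(5000)", # long HDFS paths need far more room
}

# Build one ALTER TABLE statement per column to widen in place.
def widen_statements(table, columns)
  columns.map do |name, type|
    "ALTER TABLE #{table} ALTER COLUMN #{name} SET DATA TYPE #{type}"
  end
end

widen_statements("usage", WIDENED_COLUMNS).each { |sql| puts sql }
```

Since the loader can regenerate everything from the avro files anyway, dropping and recreating the tables with the wider types (as described above) is the simpler route; the ALTER statements are only useful if you want to keep already-loaded data.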
Ah wonderful :)
Those types should be changed .. I got the same error and it took time to find it :(
I've modified the hadoop mapper / reducer to work with CDH 4.1.2. This was mostly modifying ivy and upgrading Avro to 1.7.3 with the hadoop2 classifier but I also had to change MyAvroMultipleOutputs to use TaskAttemptContextImpl instead of TaskAttemptContext.
The log upload, mappers, and reducers seemed to work fine. I'm now stuck on the server side with an odd error in HyperSQL and Ruby.
Found 4 files to process
/staging/white-elephant/apache-tomcat-7.0.42/webapps/WhiteElephant/WEB-INF/app/usage_loader.rb:185 warning: ambiguous Java methods found, using submit(java.util.concurrent.Callable)
Failed loading file hdfs://cluster-company/data/hadoop/stats/usage-per-hour/cluster-company/2013/0828/part-r-00000.avro: data exception: string data, right truncation
org.hsqldb.jdbc.JDBCPreparedStatement.executeBatch(Unknown Source)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:597)
org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:440)
org.jruby.javasupport.JavaMethod.invokeDirect(JavaMethod.java:304)
org.jruby.java.invokers.InstanceMethodInvoker.call(InstanceMethodInvoker.java:52)
org.jruby.internal.runtime.methods.AliasMethod.call(AliasMethod.java:56)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:306)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:136)
org.jruby.ast.CallNoArgNode.interpret(CallNoArgNode.java:64)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.ast.IfNode.interpret(IfNode.java:116)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.ast.RescueNode.executeBody(RescueNode.java:224)
org.jruby.ast.RescueNode.interpret(RescueNode.java:119)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:75)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:112)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:154)
UsageFileLoadTask_1496665911.call(UsageFileLoadTask_1496665911.gen:13)
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
java.util.concurrent.FutureTask.run(FutureTask.java:138)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)
Cleaning up data for file hdfs://cluster-company/data/hadoop/stats/usage-per-hour/cluster-company/2013/0828/part-r-00000.avro with ID 3
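The "string data, right truncation" error means HSQLDB received a string longer than the declared VARCHAR width of the target column. One way to pin down which field is the culprit is to check decoded record values against the column limits before inserting. A minimal sketch, using illustrative field names and widths rather than White Elephant's actual schema:

```ruby
# Hypothetical column width limits; the real schema's limits differ.
LIMITS = { "username" => 32, "file_name" => 256 }

# Return [field, length] pairs for every value that would be truncated.
def oversized_fields(record, limits)
  limits.each_with_object([]) do |(field, max), bad|
    value = record[field].to_s
    bad << [field, value.length] if value.length > max
  end
end

record = {
  "username"  => "a.very.long.service.account.name.from.our.cluster",
  "file_name" => "/short/path.avro",
}
oversized_fields(record, LIMITS).each do |field, len|
  puts "#{field} is #{len} chars, exceeds limit of #{LIMITS[field]}"
end
```

Running something like this over the decoded avro records would confirm whether the failure is a schema-width problem rather than a decompression or file-corruption problem.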
The avro files seem to be fine, but it's hard to tell as this is my first time using White Elephant. The parsed log avro files are certainly a lot larger, coming in around 80 MB each, whereas the hourly files are 2-12 KB each.
Has anyone else run into this problem? Any ideas where to go from here? Could this be an LZO decompression issue?
Thanks for the help.