apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

UT TestSummary failed with "java.lang.RuntimeException: Usage: B = FOREACH (GROUP A ALL) GENERATE Summary(A); Can not get schema from null" when Pig >=0.15 #1853

Closed asfimport closed 8 years ago

asfimport commented 9 years ago

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias B
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1694)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:636)
    at parquet.pig.summary.TestSummary.testMaxIsZero(TestSummary.java:154)
    ...
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: java.lang.RuntimeException: Usage: B = FOREACH (GROUP A ALL) GENERATE Summary(A); Can not get schema from null
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:307)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1375)
    at org.apache.pig.PigServer.execute(PigServer.java:1364)
    at org.apache.pig.PigServer.access$500(PigServer.java:113)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1689)
    ... 32 more
Caused by: java.lang.RuntimeException: Usage: B = FOREACH (GROUP A ALL) GENERATE Summary(A); Can not get schema from null
    at parquet.pig.summary.Summary.setInputSchema(Summary.java:266)
    at org.apache.pig.newplan.logical.expression.ExpToPhyTranslationVisitor.visit(ExpToPhyTranslationVisitor.java:530)
    at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:132)
    at org.apache.pig.newplan.ReverseDependencyOrderWalkerWOSeenChk.walk(ReverseDependencyOrderWalkerWOSeenChk.java:69)
    at org.apache.pig.newplan.logical.relational.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:808)
    at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:87)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:258)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:293)
    ... 37 more
Caused by: java.lang.NullPointerException
    at parquet.pig.summary.Summary.setInputSchema(Summary.java:261)
    ... 46 more

This is related to a change on the Pig side, in pig/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java, introduced by PIG-3294.

Reporter: Xiang Li
Assignee: Thomas Friedrich / @tfriedr

Note: This issue was originally created as PARQUET-334. Please see the migration documentation for further details.

asfimport commented 9 years ago

Xiang Li: I updated parquet-pig/src/main/java/parquet/pig/summary/Summary.java to yield a clearer stack trace:

java.lang.NullPointerException
    at parquet.pig.summary.Summary.setInputSchema(Summary.java:261)
    at org.apache.pig.newplan.logical.expression.ExpToPhyTranslationVisitor.visit(ExpToPhyTranslationVisitor.java:512)
    at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:113)
    at org.apache.pig.newplan.ReverseDependencyOrderWalkerWOSeenChk.walk(ReverseDependencyOrderWalkerWOSeenChk.java:69)
    at org.apache.pig.newplan.logical.relational.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:807)
    at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:87)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:260)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:295)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1375)
    at org.apache.pig.PigServer.execute(PigServer.java:1364)
    at org.apache.pig.PigServer.access$500(PigServer.java:113)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1689)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:636)
    at parquet.pig.summary.TestSummary.testPigScript(TestSummary.java:139)

asfimport commented 9 years ago

Xiang Li: In the Pig code, src/org/apache/pig/EvalFunc.java declares a private member, inputSchemaInternal, that holds the schema. A setter and a getter are also provided:

316     private Schema inputSchemaInternal=null;

328     /**
329      * This method is for internal use. It is called by Pig core in both front-end
330      * and back-end to setup the right input schema for EvalFunc
331      */
332     public void setInputSchema(Schema input){
333         this.inputSchemaInternal=input;
334     }
335 
336     /**
337      * This method is intended to be called by the user in {@link EvalFunc} to get the input
338      * schema of the EvalFunc
339      */
340     public Schema getInputSchema(){
341         return this.inputSchemaInternal;
342     }

However, Summary overrides only part of this. In parquet-mr/parquet-pig/src/main/java/parquet/pig/summary/Summary.java, the class stores the schema in a new member called inputSchema (instead of inputSchemaInternal) and overrides setInputSchema(), but it does not override getInputSchema():

51  public class Summary extends EvalFunc<String> implements Algebraic {

54     private Schema inputSchema;

257   @Override
258   public void setInputSchema(Schema input) {
259     try {
260       // relation.bag.tuple
261       this.inputSchema=input.getField(0).schema.getField(0).schema;
262       saveSchemaToUDFContext();
263     } catch (FrontendException e) {
264       throw new RuntimeException("Usage: B = FOREACH (GROUP A ALL) GENERATE Summary(A); Can not get schema from " + input, e);
265     } catch (RuntimeException e) {
266       throw new RuntimeException("Usage: B = FOREACH (GROUP A ALL) GENERATE Summary(A); Can not get schema from "+input, e);
267     }
268   }
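
The mismatch described above can be reproduced in isolation. The following is a minimal sketch, assuming simplified stand-in classes: BaseFunc and ShadowingFunc are hypothetical names, not the real Pig EvalFunc or Parquet Summary classes, and a String stands in for Pig's Schema type. It only illustrates what happens when a subclass overrides the setter, stores the value in its own field, and never calls the parent setter.

```java
// Hypothetical stand-ins for illustration only; NOT the real Pig or Parquet classes.
public class ShadowedSchemaDemo {

    /** Stand-in for EvalFunc: owns the schema field plus its setter and getter. */
    public static class BaseFunc {
        private String inputSchemaInternal = null;

        public void setInputSchema(String input) {
            this.inputSchemaInternal = input;
        }

        public String getInputSchema() {
            return this.inputSchemaInternal;
        }
    }

    /** Stand-in for Summary: overrides only the setter and keeps a private copy. */
    public static class ShadowingFunc extends BaseFunc {
        private String inputSchema;

        @Override
        public void setInputSchema(String input) {
            // No call to super.setInputSchema(input), so the parent's field is never written.
            this.inputSchema = input;
        }

        public String privateCopy() {
            return this.inputSchema;
        }
    }

    public static void main(String[] args) {
        ShadowingFunc f = new ShadowingFunc();
        f.setInputSchema("bag{tuple(a:int)}");
        System.out.println("private copy:     " + f.privateCopy());
        System.out.println("getInputSchema(): " + f.getInputSchema()); // prints null
    }
}
```

In this sketch the two views of the schema diverge: the private copy holds the value, while the inherited getter still returns null.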

asfimport commented 9 years ago

Daniel Dai / @daijyc: The input schema is maintained by Pig inside EvalFunc, so there is no need to maintain it on the Parquet side. Patch attached.

asfimport commented 9 years ago

Xiang Li: Thanks, Daniel, for taking care of this! +1 for the patch; it is more reasonable to fix this on the Parquet side. The unit tests passed on Parquet 1.8.0.

Hi Julien, could you please give a review?

asfimport commented 8 years ago

Thomas Friedrich / @tfriedr: I updated Daniel's patch: I removed the private inputSchema variable and now call the getInputSchema method of the parent class instead. Otherwise inputSchema was always null. @julienledem, can you please review my pull request with the patch?
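
The approach described in this comment can be sketched roughly as follows. This is not the code from the pull request: BaseFunc and FixedFunc are hypothetical stand-ins, a String replaces Pig's Schema type, and both the removal of the setter override and the null check in describe() are assumptions made for the illustration. The point is only that the subclass keeps no schema field of its own and reads the schema through the parent's getter.

```java
// Hypothetical stand-ins for illustration only; NOT the code from the pull request.
public class ParentSchemaDemo {

    /** Stand-in for EvalFunc: the single owner of the schema state. */
    public static class BaseFunc {
        private String inputSchemaInternal = null;

        public void setInputSchema(String input) {
            this.inputSchemaInternal = input;
        }

        public String getInputSchema() {
            return this.inputSchemaInternal;
        }
    }

    /** Stand-in for the patched Summary: no private schema field, no setter override. */
    public static class FixedFunc extends BaseFunc {
        public String describe() {
            // Read the schema from the parent instead of a private copy.
            String schema = getInputSchema();
            if (schema == null) {
                // Assumption: this sketch reports a missing schema instead of dereferencing it.
                return "no input schema available";
            }
            return "summarizing " + schema;
        }
    }

    public static void main(String[] args) {
        FixedFunc f = new FixedFunc();
        System.out.println(f.describe());
        f.setInputSchema("bag{tuple(a:int)}");
        System.out.println(f.describe());
    }
}
```

With a single owner for the schema, the setter called by the framework and the getter used by the function can no longer disagree.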

asfimport commented 8 years ago

Julien Le Dem / @julienledem: Issue resolved by pull request 292 https://github.com/apache/parquet-mr/pull/292

asfimport commented 8 years ago

Thomas Friedrich / @tfriedr: Thanks, @julienledem. Shouldn't the fix version be a parquet-mr release, not parquet-format?