InseeFr / Trevas

Transformation engine and validator for statistics.
MIT License

Key projection #336

Open NicoLaval opened 1 month ago

NicoLaval commented 1 month ago

@noahboerger you reported that when a dataset is projected onto part of its key, the operation fails in Trevas because of a duplicate key column.

Could you provide an example please?

noahboerger commented 1 month ago

This issue is related to the membership operator (#). The reference manual explains the behaviour in the following part (p. 56):

The membership operator returns a Data Set having the same Identifier Components of ds and a single Measure. If comp is a Measure in ds, then comp is maintained in the result while all other Measures are dropped. If comp is an Identifier or an Attribute Component in ds, then all the existing Measures of ds are dropped in the result and a new Measure is added. A default conventional name is assigned to the new Measure depending on its type: for example num_var if the Measure is numeric, string_var if it is string and so on (the default name can be renamed through the rename operator if needed).

When ds1 is

| id_1 | id_2 | val |
|---|---|---|
| IDENTIFIER | IDENTIFIER | MEASURE |
| INTEGER | INTEGER | INTEGER |
| 1 | 2 | 3 |
| 4 | 5 | 6 |

The result of ds2 := ds1#id_2; should, in my view, be:

| id_1 | id_2 | num_var |
|---|---|---|
| IDENTIFIER | IDENTIFIER | MEASURE |
| INTEGER | INTEGER | INTEGER |
| 1 | 2 | 2 |
| 4 | 5 | 5 |
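To make the expected behaviour concrete, here is a minimal Java sketch of the membership rule as quoted from the manual. It is not Trevas code: `MembershipSketch`, `Dataset`, `Role`, `row` and `membership` are hypothetical names, the dataset is modelled with plain collections, and the name of the new Measure is passed in by the caller (`num_var` here, as in the example above) rather than derived from the type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, self-contained model of ds#comp; not the Trevas implementation.
final class MembershipSketch {

    enum Role { IDENTIFIER, MEASURE, ATTRIBUTE }

    // A dataset is a column-name -> role map plus a list of rows.
    static final class Dataset {
        final Map<String, Role> roles;
        final List<Map<String, Object>> rows;

        Dataset(Map<String, Role> roles, List<Map<String, Object>> rows) {
            this.roles = roles;
            this.rows = rows;
        }
    }

    // Helper for the example data: one row of ds1.
    static Map<String, Object> row(long id1, long id2, long val) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("id_1", id1);
        r.put("id_2", id2);
        r.put("val", val);
        return r;
    }

    // Keeps all identifiers and exactly one measure.
    // If comp is a measure it is kept under its own name; otherwise its
    // values are copied into a new measure called newMeasureName.
    static Dataset membership(Dataset ds, String comp, String newMeasureName) {
        Role compRole = ds.roles.get(comp);
        if (compRole == null) {
            throw new IllegalArgumentException("unknown component " + comp);
        }
        String measure = compRole == Role.MEASURE ? comp : newMeasureName;

        Map<String, Role> roles = new LinkedHashMap<>();
        for (Map.Entry<String, Role> e : ds.roles.entrySet()) {
            if (e.getValue() == Role.IDENTIFIER) {
                roles.put(e.getKey(), Role.IDENTIFIER);
            }
        }
        roles.put(measure, Role.MEASURE);

        List<Map<String, Object>> rows = new ArrayList<>();
        for (Map<String, Object> in : ds.rows) {
            Map<String, Object> out = new LinkedHashMap<>();
            for (String name : roles.keySet()) {
                // The single measure always takes its values from comp.
                out.put(name, name.equals(measure) ? in.get(comp) : in.get(name));
            }
            rows.add(out);
        }
        return new Dataset(roles, rows);
    }

    public static void main(String[] args) {
        Map<String, Role> roles = new LinkedHashMap<>();
        roles.put("id_1", Role.IDENTIFIER);
        roles.put("id_2", Role.IDENTIFIER);
        roles.put("val", Role.MEASURE);
        Dataset ds1 = new Dataset(roles, List.of(row(1, 2, 3), row(4, 5, 6)));

        Dataset ds2 = membership(ds1, "id_2", "num_var");
        System.out.println(ds2.roles);
        ds2.rows.forEach(System.out::println);
    }
}
```

The point of the sketch is that the identifier `id_2` stays in the key and its values are copied into a separately named Measure, so no column name appears twice in the result.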

Another example of this is the BdI test case "general/membership_1".

Trevas currently raises the following error:

Occurred error

### Exception

```
java.lang.IllegalArgumentException: duplicate column [Component{id_2, type=class java.lang.Long, role=IDENTIFIER}]
    at fr.insee.vtl.model.Structured$DataStructure.<init>(Structured.java:275)
    at fr.insee.vtl.spark.SparkDataset.fromSparkSchema(SparkDataset.java:158)
    at fr.insee.vtl.spark.SparkDataset.<init>(SparkDataset.java:54)
    at fr.insee.vtl.spark.SparkProcessingEngine.executeProject(SparkProcessingEngine.java:298)
    at fr.insee.vtl.engine.visitors.expression.ExpressionVisitor.visitMembershipExpr(ExpressionVisitor.java:140)
    at fr.insee.vtl.engine.visitors.expression.ExpressionVisitor.visitMembershipExpr(ExpressionVisitor.java:41)
    at fr.insee.vtl.parser.VtlParser$MembershipExprContext.accept(VtlParser.java:501)
    at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
    at fr.insee.vtl.engine.visitors.AssignmentVisitor.visitAssignment(AssignmentVisitor.java:51)
    at fr.insee.vtl.engine.visitors.AssignmentVisitor.visitTemporaryAssignment(AssignmentVisitor.java:59)
    at fr.insee.vtl.parser.VtlParser$TemporaryAssignmentContext.accept(VtlParser.java:372)
    at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
    at fr.insee.vtl.engine.VtlScriptEngine.evalStream(VtlScriptEngine.java:263)
    at fr.insee.vtl.engine.VtlScriptEngine.eval(VtlScriptEngine.java:282)
    at java.scripting/javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:262)
    at fr.insee.trevas.jupyter.VtlKernel.eval(VtlKernel.java:305)
    at io.github.spencerpark.jupyter.kernel.BaseKernel.handleExecuteRequest(BaseKernel.java:334)
    at io.github.spencerpark.jupyter.channels.ShellChannel.lambda$bind$0(ShellChannel.java:64)
    at io.github.spencerpark.jupyter.channels.Loop.lambda$new$0(Loop.java:21)
    at io.github.spencerpark.jupyter.channels.Loop.run(Loop.java:78)
```
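One plausible reading of the trace, offered as an assumption rather than something checked against the Trevas sources, is that the projection column list is built as "all identifiers plus the requested component", so `id_2` is listed twice when it is itself an identifier. The Java sketch below only illustrates that mechanism; `DuplicateColumnSketch` and its methods are hypothetical names, and `checkNoDuplicates` merely stands in for whatever validation raises the error in the structure constructor.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the suspected failure mode; not Trevas code.
final class DuplicateColumnSketch {

    // Stand-in for a structure constructor that rejects repeated column names.
    static void checkNoDuplicates(List<String> columns) {
        Set<String> seen = new HashSet<>();
        for (String c : columns) {
            if (!seen.add(c)) {
                throw new IllegalArgumentException("duplicate column [" + c + "]");
            }
        }
    }

    // Naive projection: every identifier followed by the requested component.
    static List<String> naiveProjection(List<String> identifiers, String comp) {
        List<String> cols = new ArrayList<>(identifiers);
        cols.add(comp);
        return cols;
    }

    // Alternative: when comp is an identifier, expose its copy under a
    // separate measure name so the column list stays unique.
    static List<String> renamedProjection(List<String> identifiers, String comp,
                                          String measureName) {
        List<String> cols = new ArrayList<>(identifiers);
        cols.add(identifiers.contains(comp) ? measureName : comp);
        return cols;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("id_1", "id_2");

        try {
            checkNoDuplicates(naiveProjection(ids, "id_2"));
        } catch (IllegalArgumentException e) {
            System.out.println("naive projection fails: " + e.getMessage());
        }

        List<String> fixed = renamedProjection(ids, "id_2", "num_var");
        checkNoDuplicates(fixed);
        System.out.println("renamed projection passes: " + fixed);
    }
}
```

If this reading holds, copying the identifier into a new Measure column before projecting would avoid the clash while keeping `id_2` in the key, which matches the expected result given earlier in this comment.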