InseeFr / Trevas

Transformation engine and validator for statistics.
MIT License
10 stars 5 forks source link

substring #327

Open NicoLaval opened 4 months ago

NicoLaval commented 4 months ago

@noahboerger you reported substr is not supported.

I thought it was, for scalars, components and datasets.

Could you report a failed example please?

NicoLaval commented 4 months ago

Substring syntax (In Trevas param2 is the start_index, and param3 the length. The user test seems to assume, that param2 is the start_index and param3 the end_index (inclusive))

@noahboerger could you confirm substr doesn't cause exception throw but a wrong result because of the previous remark?

noahboerger commented 4 months ago

The first note on substr was just that it is currently not supported in the shape DS_r := substr( DS_1, 1, 2).

The different syntax also seems to be the correct one according to the reference manual (p. 76 f.). So i would propose to close the issue as it is only an issue for our used testcases.

NicoLaval commented 4 months ago

This syntax has to work, and is implemented in Trevas I think. The condition on the handled dataset is: dataset with only string measures.

@noahboerger, can you provide a failed example with the stack trace please?

noahboerger commented 4 months ago

@NicoLaval you are right, with a dataset with only string measures it is working also on my side. The dataset i have tested it with had the wrong format

Id_1 Id_2 num_1 Text_1
IDENTIFIER IDENTIFIER MEASURE MEASURE
STRING STRING DOUBLE STRING

The concrete test case i have executed is the one from BdI "11. conditional/if-then-else_2", which is trying to execute it with a dataset that is also containing a measure with an additional double type. This testcase has failed with the following error:

Occured error ### Exception ``` fr.insee.vtl.engine.exceptions.FunctionNotFoundException: function 'substr(Double, Long, Long)' not found at fr.insee.vtl.engine.visitors.expression.functions.GenericFunctionsVisitor.invokeFunction(GenericFunctionsVisitor.java:146) at fr.insee.vtl.engine.visitors.expression.functions.StringFunctionsVisitor.visitSubstrAtom(StringFunctionsVisitor.java:183) at fr.insee.vtl.engine.visitors.expression.functions.StringFunctionsVisitor.visitSubstrAtom(StringFunctionsVisitor.java:20) at fr.insee.vtl.parser.VtlParser$SubstrAtomContext.accept(VtlParser.java:3074) at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18) at fr.insee.vtl.engine.visitors.expression.ExpressionVisitor.visitStringFunctions(ExpressionVisitor.java:254) at fr.insee.vtl.engine.visitors.expression.ExpressionVisitor.visitStringFunctions(ExpressionVisitor.java:41) at fr.insee.vtl.parser.VtlParser$StringFunctionsContext.accept(VtlParser.java:1098) at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:46) at fr.insee.vtl.parser.VtlBaseVisitor.visitFunctionsExpression(VtlBaseVisitor.java:90) at fr.insee.vtl.engine.visitors.expression.ExpressionVisitor.visitFunctionsExpression(ExpressionVisitor.java:491) at fr.insee.vtl.engine.visitors.expression.ExpressionVisitor.visitFunctionsExpression(ExpressionVisitor.java:41) at fr.insee.vtl.parser.VtlParser$FunctionsExpressionContext.accept(VtlParser.java:629) at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18) at fr.insee.vtl.engine.visitors.AssignmentVisitor.visitAssignment(AssignmentVisitor.java:51) at fr.insee.vtl.engine.visitors.AssignmentVisitor.visitTemporaryAssignment(AssignmentVisitor.java:59) at fr.insee.vtl.parser.VtlParser$TemporaryAssignmentContext.accept(VtlParser.java:372) at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18) at fr.insee.vtl.engine.VtlScriptEngine.evalStream(VtlScriptEngine.java:263) at fr.insee.vtl.engine.VtlScriptEngine.eval(VtlScriptEngine.java:282) at java.scripting/javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:262) at fr.insee.trevas.jupyter.VtlKernel.eval(VtlKernel.java:305) at io.github.spencerpark.jupyter.kernel.BaseKernel.handleExecuteRequest(BaseKernel.java:334) at io.github.spencerpark.jupyter.channels.ShellChannel.lambda$bind$0(ShellChannel.java:64) at io.github.spencerpark.jupyter.channels.Loop.lambda$new$0(Loop.java:21) at io.github.spencerpark.jupyter.channels.Loop.run(Loop.java:78) ```

The test from BdI assumes that the additional non string measure is kept without being manipulated. But I think your implementation is right, as the table of VTL-ML-Operators in the reference manual (p. 17), requires the input to be in the following form:

 op ::
dataset { measure<string> _+ }
| component<string>
| string
pattern1, pattern2 ::
component<string>
| string

so i think it the correct behaviour to fail here.

I did not dive so deep into this test case when i executed it the first time, but then it is a wrong test case. I am not sure if the trevas error message could be a bit more clear, as the engine still tries to execute the replace-operator even with a mismatching input dataset.