MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

Compatibility between expression language #52806

Closed RobMcZagBDS closed 4 years ago

RobMcZagBDS commented 4 years ago

Hi, first I would like to say that, in general, parameter passing between the two different expression languages is complicated to understand, and the lack of examples (especially beyond the trivial myParam = 'myValue') makes it difficult and frustrating to work with this product.

It would be nice to have more, and more complex, examples so that one can better pick up how the mechanism works.

Trivial problem I am facing: I would love to pass the pipeline start time as the default for a parameter in a data flow, so that I can override it when running the pipeline should a re-run be needed.

It looks like I can pass a timestamp "@pipeline().TriggerTime" from the pipeline expression (even though the param is not a String), but then the pipeline errors out with "Column 'Sandbox' not found. The stream is either not connected or column is unavailable" (pic available).

So I fail to see a simple way of doing this, even though it should be pretty simple... any input?



ChiragMishra-MSFT commented 4 years ago

@RobMcZagBDS Thanks for the feedback. I am really sorry for the inconvenience. I am assigning the issue to the content owner to evaluate and update as required.

@kromerm Can you please take a look at this?

kromerm commented 4 years ago

Try passing it in as a string data flow parameter.

RobMcZagBDS commented 4 years ago

@kromerm whom are you suggesting that to? Me or @ChiragMishra-MSFT?

Over the last 3 days I had to figure out that the only way to pass things between a pipeline and a data flow is using strings. Everything else just brings problems, so it should not even be possible. :)

Another source of confusion to me was the quoting. Inside the data flow, every time you write a string literal you have to surround it with single quotes... while in pipeline expressions you do not, so if you add quotes, those become part of the string. :( VERY CONFUSING

Below is the error you get from the UI. Good luck figuring out what the problem with the timestamp string is.


```
Error { "errorCode": "InvalidTemplate", "message": "In function 'formatDateTime', the value provided for date time string ''2019-12-01T08:00:00.000Z'' was not valid. The datetime string must match ISO 8601 format", "failureType": "UserError", "target": "Shifts", "details": "" }
```


Even more so when the error prints out '[your-string]' and your string is a single-quoted string: then you get ''myString value'', which looks exactly like the value you wanted, [myString value], in double quotes. So you start wondering why what you expect to be a good value is not working. :(

HINT: the timestamp string is '2019-12-01T08:00:00.000Z' instead of 2019-12-01T08:00:00.000Z.
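
To illustrate the quoting difference, here is a sketch only; the format strings are my own and may need adjusting for your case:

```
Pipeline expression: no quotes around the resulting literal
    @formatDateTime(pipeline().TriggerTime, 'yyyy-MM-ddTHH:mm:ss')
    -> 2019-12-01T08:00:00

Data flow expression: string literals must be single-quoted
    toTimestamp('2019-12-01T08:00:00', 'yyyy-MM-dd\'T\'HH:mm:ss')
```

The pipeline side uses .NET-style format strings, while the data flow side uses its own format syntax, which is part of why the same timestamp needs different handling in each language.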

wildky commented 4 years ago

@RobMcZagBDS your issue and frustration around quoting (and lack of documentation) sounds very similar to mine. I haven't cracked the case on how to pass a string parameter from a pipeline to data flow using a pipeline expression. The documentation is confusing on this too.

In my case, the parameters being sent from pipeline to data flow look correct in the json input to the data flow, but the data flow errors out complaining about type mismatch when I try to use it in a data flow expression:

```
{"message":"at Source 'someSource'Parameter 'someParameter'(Line 272/Col 17): Incompatible data types between declared type and actual parameter value. ...failureType":"UserError","target":"some data flow trigger activity ","errorCode":"DFExecutorUserError"}
```

Bonus: this error is too vague to be of much help.

RobMcZagBDS commented 4 years ago

@wildky it feels better when you are not alone ;)

My current understanding is that in the pipeline you do not need any quoting for literal strings, i.e. just type out the string you want in the box. :)

If you have both the pipeline parameter and the data flow parameter of type String... I would expect the parameter passing to succeed. From the message you posted I would suspect mismatched types, but as you said... many times the error messages are less than useful.

Even with string types on both sides, I can see issues coming if you have special characters (like single or double quotes) inside your string, should these not get properly escaped during the parameter passing and get interpreted as delimiters in the data flow.

wildky commented 4 years ago

@RobMcZagBDS We figured it out with a little bit more testing of various cases. It is oh so simple in the end, as you describe, but it was a surprisingly confusing journey to get there.

@kromerm One of the largest sources of confusion for my team was the lack of documentation about the "Expression" check box when passing a parameter from pipeline to data flow via a mapping data flow activity. We checked it because our parameter value was an expression, but this just made the syntax requirements for the value more difficult.

I think it would be helpful to update the documentation to describe when to use the expression check box, and also provide more examples to better understand how to pass a pipeline parameter to a data flow.

rishabh1896 commented 4 years ago

@RobMcZagBDS How did you solve the issue? I'm facing the same issue sending a datetime from a file to a data flow as '2020-05-05T16:00:00.000Z', but it throws a "file not found at path" error, whereas if I choose the same date from the data flow it works just fine. Basically I'm trying to pass @pipeline.TriggerTime. Can you please help me with it?

RobMcZagBDS commented 4 years ago

@rishabh1896 In my case I had to stop quoting the timestamp literal in the parameter default. Try just having 2020-05-05T16:00:00.000Z without any quoting.

rishabh1896 commented 4 years ago

@RobMcZagBDS I get "Parameter stream has parsing errors" when passing @pipeline.TriggerTime. Full error:

```
Parameter stream has parsing errors
Line 2 Position 25: missing '=' at ':'
Line 2 Position 26: mismatched input '36' expecting {'parameters', 'functions', 'as', 'input', 'output', 'constant', 'expression', 'integer', 'short', 'long', 'double', 'float', 'decimal', 'boolean', 'timestamp', 'date', 'byte', 'binary', 'integral', 'number', 'fractional', 'any', IDENTIFIER, ANY_IDENTIFIER}
Line 3 Position 0: extraneous input 'startTime' expecting '='
"failureType":"UserError","target":"AzureToSqlDataFlow","errorCode":"DFExecutorUserError" (occurred 3 times)
```

RobMcZagBDS commented 4 years ago

@rishabh1896 sorry, it looks like some error I had at some point, but I do not see any obvious cause to point you to a solution.

A few questions to clarify your situation: Is the parameter a pipeline or a data flow parameter? Is it being passed as a pipeline expression or as a data flow one? With or without the expression checkbox? If you pass just 1 parameter with the

rishabh1896 commented 4 years ago

@RobMcZagBDS attached a screenshot (Screenshot 2020-04-29 at 4.25.38 PM).

RobMcZagBDS commented 4 years ago

@rishabh1896 one thing I am sure of is that the timestamp format differs between pipeline and data flow. In the pipeline: 2020-05-05T16:00:00.000Z. In the data flow: 2020-05-05 16:00:00. So you cannot just pass a timestamp and use its string representation directly. Inside the data flow you have to take the string from the pipeline and convert it with data flow functions.

I am not yet sure how the expression flag works. My best guess is that it triggers a second evaluation inside the data flow. Have you tried removing it so that the parameters are passed as text?

kromerm commented 4 years ago

The activity document has not been updated for the "Expression" checkbox. @djpmsft The data flow activity docs need to reflect the Expression option that was added to parameter passing to handle string interpolation automatically.

Previously, you had to use string interpolation manually to use pipeline expressions. Now we do that for you, which is why you can now pass more than just string data types to data flows.
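
As a sketch of what that manual interpolation looked like, based on the @{...} syntax used elsewhere in this thread: the pipeline expression is embedded in a single-quoted data flow string literal, so the pipeline substitutes its value before the data flow ever sees it:

```
'@{pipeline().TriggerTime}'
```

At passing time this becomes, for example, '2019-12-01T08:00:00.1234567Z', i.e. a plain quoted string literal on the data flow side.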

Regarding the time data type... you are correct in that you have to convert the timestamp once inside the data flow. Data flows execute in a managed Spark environment, while pipelines execute in a Logic Apps .NET context, so there will be data type conversions in that value marshaling.

What I do is set the data flow param type to string and then use a Derived Column to convert to the date/time format I like.
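
A hedged sketch of that approach (the parameter and column names here are hypothetical): pass the trigger time into a string data flow parameter, say $triggerTimeString, then convert it in a Derived Column:

```
Derived Column (data flow expression language):
    eventTime = toTimestamp(left($triggerTimeString, 23), 'yyyy-MM-dd\'T\'HH:mm:ss.SSS')
```

The left(..., 23) trims the value to millisecond precision, since the format string handles at most milliseconds.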

@djpmsft Let's add that example of timestamp passing to the doc as well.

RobMcZagBDS commented 4 years ago

Thank you Mark @kromerm for the clarification.

From your words and some other issue discussion (that I cannot pinpoint now) I understand that, when the receiving parameter is not of type string, some conversion is quietly attempted, and in case of failure you get a null (maybe 0 for ints). Is this correct? Can this be explicitly documented?

To me this implies that declaring a data flow parameter of a type other than String is like requesting an automatic conversion. If this were well understood, it would save a lot of frustration and also unneeded derived columns when the conversion is trivial. It also implies that the result of the pipeline evaluation MUST be a string that can be converted to the correct destination type with some parsing rule. A good case to be explicit about these conversion rules.

From your answer I now think that no double evaluation takes place, and a pipeline expression that evaluates to the string toInteger("123") would generate an error if assigned to an integer parameter. Your pipeline expression should just produce the number part.

What would be the correct content of the string resulting from a pipeline expression that goes into an int: (a) 123, (b) "123", or (c) '123'? Whether quoting should be part of the result or not should be clarified.

I still do not fully grasp the need for that checkmark, though it could be because I do not know .NET and Logic Apps. Is it not possible to automatically check whether the result of the evaluation is a string, or not properly quoted, and do what is needed to ensure it is passed correctly to the Spark (Scala / Java) side?

And if, as you clearly explained, the "expression" checkmark is just a way to get automatic quotation... why not rename it AutoQuote or QuoteResult?

djpmsft commented 4 years ago

Hey all,

Apologies for the confusing doc. We made some updates to clarify. In case it wasn't clear: when referenced, pipeline parameters are evaluated and then their value is used in the data flow expression language.

I have also added an example of the behavior of passing in a pipeline parameter: https://docs.microsoft.com/en-us/azure/data-factory/parameters-data-flow#pipeline-parameter-example.

For information on what the 'Expression' checkbox does for string parameters, see https://docs.microsoft.com/en-us/azure/data-factory/parameters-data-flow#string-literals-vs-expressions

Let me know your thoughts! If it still isn't clear we would love your feedback.

djpmsft commented 4 years ago

As the doc update has been done, I am going to go ahead and close this issue. If you have further questions feel free to comment on this thread.

Thanks, Dan

please-close

RobMcZagBDS commented 4 years ago

@djpmsft Daniel, it is now way clearer, but I am still not 100% clear on when expressions are evaluated (at passing time vs. when referenced inside the data flow). I would also make explicit that expressions to be evaluated must be in the DF language (99% sure).

I still believe that the most confounding case is also one of the most common: passing the pipeline start time and using it as a timestamp inside a DF.

If I now understand correctly, it would be possible to pass the following pipeline expression into a data flow variable of type timestamp by checking the expression flag: toTimestamp(@pipeline().TriggerTime, "YYYY-MM-DDThh:mm:ss")

Would you add an example of how to pass the TriggerTime into a DF timestamp variable? Thank you, roberto

PS Note I just wrote the format string, so it might be wrong and even use the wrong type of quotes. That is another thing that might be better explained (maybe as a specific topic cross referenced by the two languages).

djpmsft commented 4 years ago

The pipeline expressions are evaluated at passing time. The data flow expressions are evaluated at run time. Expressions cannot cross over languages. I can add a timestamp specific example as well.

toTimestamp(@pipeline().TriggerTime, "YYYY-MM-DDThh:mm:ss") is not valid as toTimestamp is a data flow expression and trigger time is a pipeline expression.

You should pass in @pipeline().TriggerTime as a string, or use string interpolation to pass in a pipeline expression that evaluates to the string you posted.

To make this work pass in this string to a data flow parameter of type timestamp: "toTimestamp(@{pipeline().TriggerTime}, 'YYYY-MM-DDThh:mm:ss')"

Let me know if this works

djpmsft commented 4 years ago

Apologies for the confusion, I have tested and confirmed the below works:

```
toTimestamp(left('@{pipeline().TriggerTime}', 23), 'yyyy-MM-dd\'T\'HH:mm:ss.SSS')
```

The 'left' is required due to a limitation handling nanoseconds.

djpmsft commented 4 years ago

Sorry for the spam, but I updated the doc to include this.

RobMcZagBDS commented 4 years ago

@djpmsft Daniel, personally I feel that any clarification is golden, as it is these "little things" (having to use the {}, removing the nanos, the need or lack of need for double quotes around the expression) that frustrate people trying to do something a bit more complex.

If you are not going to have a page explicitly dedicated to expression syntax, examples are the next best thing.

And in the end it looks to me like, by passing a pipeline expression flagged as an expression, you can get a double evaluation: at passing time as a pipeline expression producing a string, and at run time as a data flow expression returning the expected value. This can be powerful, but must be properly documented so it does not become frustrating.
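
A sketch of that double evaluation, using the expression confirmed earlier in the thread (the trigger time value here is illustrative):

```
Passing time (pipeline): @{pipeline().TriggerTime} is substituted,
yielding a string that is itself a data flow expression:

    toTimestamp(left('2019-12-01T08:00:00.1234567Z', 23), 'yyyy-MM-dd\'T\'HH:mm:ss.SSS')

Run time (data flow): that expression is evaluated, returning the
timestamp value 2019-12-01 08:00:00.123
```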

I think that the doc is much more helpful now than it was before :) Thank you for the effort.