carvel-dev / ytt

YAML templating tool that works on YAML structure instead of text
https://carvel.dev/ytt
Apache License 2.0
1.66k stars 136 forks source link

Command to validate a data values file #278

Open cari-lynn opened 3 years ago

cari-lynn commented 3 years ago

Describe the problem/challenge you have I have a schema file, and data values file, and some ytt templates. I want to run a command to validate that my data values adhere to my schema.

Describe the solution you'd like A command that gives feedback on whether or not my data values adhere to my schema file. If my data values are not compatible with my schema, the command fails and give me hints as to what is incorrect in my data values file. If the data values are compatible with my schema, then the command succeeds.

Anything else you would like to add: ytt is often used piped into other tools, so some information such as warnings may be lost when running the main ytt command ytt -f ....

This belongs as a separate command ytt validate so that the output of the command can provide additional details beyond what would be seen with ytt -f ... The additional details that validate could provide could be warnings when my data values file is compatible with my schema, but the data values are duplicated, or otherwise not recommended. For example:

schema.yml

#@schema/definition name="data/values"
foo: "apple"

datavalues.yml

#@data/values
foo: "apple"

I see a warning like: Warning: data value 'foo' was set to the same value '"apple"' more than once. You may remove this data value

StevenLocke commented 3 years ago

If my data values are not compatible with my schema, the command fails and give me hints as to what is incorrect in my data values file. If the data values are compatible with my schema, then the command succeeds.

I think this is a hard requirement of the templating command with a schema. If supplied values are not acceptable, we should fail and (hopefully nicely) explain where the issue lies. If they are compatible, then the command succeeds. How would the validate command modify that contract? Is it to ensure that warnings don't get ignored? Maybe the template command should be firmer with errors and fail early.

...but the data values are duplicated...

I'm not sure I'd want to encourage removing duplicates from the data values, though open to hearing thoughts from others. If the schema ever changes, keeping the dupe would ensure that the value doesn't change from what you put in, though I can also see the case where you'd want the updated value from the schema. Hmmm, I want to discuss this more.

...or otherwise not recommended

What other conditions could not be recommended? I'm trying to envision a way to denote "valid but weird" as opposed to "invalid" values in the schema. Maybe there's a way to "suggest" a value, but allow it to be templated? Is that different from a default?

Edit: From a conversation with @joaopapereira maybe a Config Author would want to provide a default of a single instance for a deployment, but highly encourages the Consumer to set it to be more appropriate for their use case. That's a meaningful "not recommended" value for me.

cari-lynn commented 3 years ago

I think this is a hard requirement of the templating command with a schema

I think this purposed feature wouldn't modify the contract for the templating command. As a user I just want a way to get more validation that my data values are working as I intended. Alternatively, rather than a separate command, I suppose this could modify the template command with a flag like --fail-on-warnings.

Another use case may be to provide a warning for unused data values. ytt allows data values to be provided, but not used. This may be an indication that data value was meant to be used, but the value accidentally was hardcoded in a template.

danielhelfand commented 3 years ago

One interesting thing that I think could come out of this is some kind of ytt lint approach. Something that warns users about better practices with ytt. I think the use case above of the schema/data values duplicate is something that could be caught.

Another use case I could see is removing the append annotation from each array item if a user is still appending each item after #269 is implemented.

It could be a way to suggest an opinionated way of working with ytt that could help improve users' understanding.

pivotaljohn commented 3 years ago

What do you think of the idea of including messaging like this when the user employs one of the --...inspect flags?

danielhelfand commented 3 years ago

What do you think of the idea of including messaging like this when the user employs one of the --...inspect flags?

Currently the flag (--inspect-data-values) just lists data values. Originally reading this issue, my first thought would be to look there for if this functionality already exists.

It would make the flag more closely align with its current name in my opinion. Maybe help separate cases of data values versus other use cases.

StevenLocke commented 3 years ago

My understanding of the original issue is to validate a set of data_values against a schema and provide helpful hints for improving formatting. That would feel pretty strange to me in the --data-values-inspect output which I see more as a "what does ytt have access to," something like debug information, and less like "user info".

I love the --fail-on-warnings flag suggestion. That feels like a nice approach to me and would encourage us to add warnings for things we want users to change, like "key: abc (data_values.yml:5) is unused in your template" "Duplicate key in the schema (schema.yml:10) and data values (dv.yml:100)" "Array items no longer need to be annotated with #@overlay/append (dv.yml:25)" [soonTM]

Plus, that gives users who aren't going to be looking for how to improve their ytt skills the messages without needing them to search for another command, and the warnings can be safely ignored if they don't care.

We (@DennisDenuto and I) are going to accept this issue, now it's just a consensus of implementation problem.

cppforlife commented 3 years ago

That would feel pretty strange to me in the --data-values-inspect output which I see more as a "what does ytt have access to," something like debug information, and less like "user info".

--data-values-inspect was added to show what data values are immediately before they get used in templates. the only way to get access to "final" data values is to go through validation of them against schema, and merging them with data values split over multiple documents/files. this happens to address this issue request of "validate that my data values adhere to my schema."

I love the --fail-on-warnings flag suggestion. That feels like a nice approach to me and would encourage us to add warnings for things we want users to change, like

im not sure warning is a right term for Array items no longer need to be annotated. it would be good to have an issue/document that lists various scenarios we may want to alert on. i think collecting them (over few months?) and seeing all together will provide more clarity on classification and whether they should be bunched up or not.