jorgemoralespou opened 2 years ago
related vendir side of things for first option: https://github.com/vmware-tanzu/carvel-vendir/issues/25. it would be very useful to collect some concrete examples (e.g. what resources would we be fetching, under what conditions, how would they be used, ...).
A semi-related thing would be injecting, as YAML data files using data file-marks, a list of the installed set of packages, so a package can determine what is installed and fail if something required isn't. A list of available CRDs could also be helpful in that respect, although the presence of a CRD doesn't necessarily mean the operator is still installed: CRDs are often not deleted when operators are deleted, since doing so can cause ordering issues during cleanup, so they get left behind.
Note, Jorge said supplying them as data values, but it is probably more appropriate as data files using data file-mark. Thus kapp-controller could inject various things about state of cluster, including installed packages, type of ingress, Tanzu capabilities etc. Using data file-mark these wouldn't interfere with normal processing as a package would need to opt in to trying to read them using starlark code.
BTW, why isn't there a ytt comment directive one can put in an arbitrary YAML file to apply the data file-mark? The only way seems to be command line options, which can be a bit inconvenient. In some use cases it would be more convenient to just have it exist in the same directory as other files, just like one can have data values and schema files with their respective comment directives. Another useful option would be that an extension of `.data.yml` or `.data.yaml` would automatically mean the file gets the data file-mark and is thus ignored unless in starlark code you use `struct.encode(yaml.decode(data.read(...)))`.
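As a sketch of what that opt-in could look like today (the file name `cluster-state.data.yaml` and its keys are made up for illustration):

```yaml
#! config.yml -- a template that chooses to read the data-marked file
#@ load("@ytt:data", "data")
#@ load("@ytt:yaml", "yaml")

#! data.read() returns the raw file contents; yaml.decode() parses them
#@ state = yaml.decode(data.read("cluster-state.data.yaml"))

#@ if "contour" in state.get("installedPackages", []):
ingress_kind: HTTPProxy
#@ else:
ingress_kind: Ingress
#@ end
```

With current ytt this has to be wired up on the command line, e.g. `ytt -f config.yml -f cluster-state.data.yaml --file-mark 'cluster-state.data.yaml:type=data'`, which is exactly the inconvenience described above.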
We brought this capability up a few ages ago with Cluster API. The API Load Balancer DNS name isn't known until after cluster creation, and it lives at `k get awscluster CLUSTER_NAME -o jsonpath={.spec.controlPlaneEndpoint.host}`
I would want to pass this value into another App CR so that I could output a DNS Endpoint for External DNS.
```yaml
- apiVersion: v1
  kind: AWSCluster
  namespace: foo
  fieldSelector: "{.spec.controlPlaneEndpoint.host}"
  labelSelector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: bar
  path: svc.yml
  prefix: my_cluster_endpoint
```
I'm not sure what the value(s) would get passed into the resulting template step -- would I have a file that looked like this?
```yaml
#@data/values
---
my_cluster_endpoint:
  spec:
    controlPlaneEndpoint:
      host: controlplane.example.com
```
> A list of available CRDs could also be helpful in that respect, although presence of the CRD doesn't necessarily mean the operator is still installed as CRDs are often not deleted when operators are deleted, so left behind.
to determine available apis, it's probably better to list available k8s apis (not concretely crds). it's not direct k8s resource listing i think, but still could be coming into templates via same fetching api.
> Note, Jorge said supplying them as data values, but it is probably more appropriate as data files using data file-mark. Thus kapp-controller could inject various things about state of cluster, including installed packages, type of ingress, Tanzu capabilities etc. Using data file-mark these wouldn't interfere with normal processing as a package would need to opt in to trying to read them using starlark code.
that's where my head is at as well. i havent thought too hard about other tools integration though. e.g. can helm templates receive this info as well? we could in theory make this configurable how data is passed in. there is something to be said about how other data might be coming in as well (e.g. passing in app cr namespace/name info via configured data values)
> BTW, why isn't there a ytt comment directive one can put in an arbitrary YAML file to apply the data file-mark.
mostly because typical use case is using file-mark on third party file so cant/shouldnt modify them. we havent seen many use cases for using file mark for your own files. (also we dont have file-scoped annotations, max scope is single yaml document).
aside from allowing package cr to configure this, i do like @jorgemoralespou's suggestion on making it possible via values refs (pretty interesting idea...):
```yaml
values:
- secretRef:
    name: package-values
- resourceRef:
    name: contour
    kind: Deployment
    namespace: projectcontour
```
though this may have to be somehow mapped to data values... to not leak out internal Package CR configuration to the user. e.g.
```yaml
values:
- secretRef:
    name: package-values
- key: contour.resource
  resourceRef:
    name: contour
    kind: Deployment
    namespace: projectcontour
```
definitely have to think about how this ties together with file based interfaces...
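For illustration, the keyed variant might surface to templates as a data values document shaped like this (hypothetical; the nesting mirrors the `contour.resource` key, and the resource content is abbreviated):

```yaml
#@data/values
---
contour:
  resource:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: contour
      namespace: projectcontour
    status:
      availableReplicas: 2
```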
The most important task is already happening, and it's that it got you thinking :-D
I also like the fact that a resourceRef can be mapped to a data values key. I think somehow it was in my mind, but obviously there's a need to reference the obtained resource under a data values key.
So is the idea that the file would be passed into something like ytt as a data file, and you would need to access it like:
```yaml
- apiVersion: v1
  kind: AWSCluster
  namespace: foo
  labelSelector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: bar
  path: stuff-goes-here.yml
```

```yaml
#! ...
#@ cluster = yaml.decode(data.read("stuff-goes-here.yml"))
#@ endpoint = cluster["spec"]["controlPlaneEndpoint"]["host"]
#! ...
```
Let's throw another example here: when working with a non-DNS-enabled environment (maybe testing or another reason), you need your ingress controller to create a LoadBalancer service (with an externalIP) that you can then use via nip.io or sslip.io or any other similar service to create Ingress resources for your applications. An example of how this could work would be:
```yaml
apiVersion: packaging.carvel.dev/v1alpha1
kind: PackageInstall
metadata:
  name: my-application
spec:
  serviceAccountName: my-application-sa
  packageRef:
    refName: my-application.example.com
    versionSelection:
      constraints: ">v1.0.0"
      prereleases: {}
  values:
  - secretRef:
      name: my-application-values
  - resourceRef:
      key: domain.ip
      name: envoy
      kind: Service
      namespace: projectcontour
      selector: status.loadBalancer.ingress.ip
```
And then have in your overlays a definition similar to this:
```yaml
#@ domain = data.values.domain_name
#@ if data.values.domain.ip:
#@   domain = "{}.sslip.io".format(data.values.domain.ip)
#@ end
```
NOTE: Probably there's a better version to that ytt condition :-D
By the way, this reminds me somehow of the downward API
Whether to project the whole resource as a data values file, or just a specific value via a selector is fine with me.
@GrahamDumpleton noted:
> Using data file-mark these wouldn't interfere with normal processing as a package would need to opt in to trying to read them using starlark code.
@cppforlife piled on:
> that's where my head is at as well. i havent thought too hard about other tools integration though. e.g. can helm templates receive this info as well? we could in theory make this configurable how data is passed in. there is something to be said about how other data might be coming in as well (e.g. passing in app cr namespace/name info via configured data values)
When first starting down this thread, I was thinking that such data could be plopped under a well-defined key within the data values tree (e.g. `data.values.cluster_state`). It would reduce any motion around using that data: it's parsed as YAML, already. As Schema features mature, there can be guarantees about presence or defaults.

What are the benefits of pushing this data through the `data` filetype channel, instead?
> What are the benefits of pushing this data through the `data` filetype channel, instead?
I don't think there are any, I was just having some trouble following along at first :blush:
I have concerns over being able to map in arbitrary resources, because unless it is evaluated against the service account's RBAC, it could mean someone gets access to the contents of secrets they shouldn't.

As to using the data file-mark, I saw that as preferable to a data value, as the latter means a data values schema has to be specifically adjusted to expect it, else processing would fail. So you couldn't arbitrarily always inject base configuration values about the cluster or installed packages etc. Right now, as per https://github.com/vmware-tanzu/carvel-ytt/issues/515, you can't readily say to expect unknown keys properly.
More concrete examples I do use:
These are various examples of things we have to tweak manually but would happily automate via this feature. Hope it helps.
> I have concerns over being able to map in arbitrary resources, because unless it is evaluated against the service account's RBAC, it could mean someone gets access to the contents of secrets they shouldn't.
yeah it would be using App CR's spec.serviceAccount or spec.cluster.kubeconfigSecretRef to access anything from the cluster.
> Right now as per vmware-tanzu/carvel-ytt#515 you can't readily say to expect unknown keys properly.
i think that only applies if you want to mix known and unknown keys but should work fine for dedicated section that's marked as any=True.
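e.g. such a dedicated section in a data values schema could look like this (the `cluster_state` key name is hypothetical):

```yaml
#@data/values-schema
---
#@schema/type any=True
cluster_state: {}
```

With `any=True`, ytt skips type checking under that key, so kapp-controller could inject arbitrary keys there without the schema having to enumerate them.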
> to determine available apis, it's probably better to list available k8s apis (not concretely crds). it's not direct k8s resource listing i think, but still could be coming into templates via same fetching api.
Having a list of APIs gives even less guarantees than a list of CRDs. A company could use a single API group based on their company domain which spans resources created by more than one operator. So like with CRDs it doesn't actually tell you a specific operator may be installed at that point.
> Having a list of APIs gives even less guarantees than a list of CRDs. A company could use a single API group based on their company domain which spans resources created by more than one operator.
i am referring to APIs available from k8s endpoints (eg `kubectl api-resources`), not API Services.
A huge one that I missed:
If I ruminate on this, what I really desire is: the capability to give an expression that can be evaluated prior to applying. The expression is typically evaluated against data that resides in existing cluster resources, and its output is then used as input to the current package.

The problem will really be about how to evaluate that expression in a way that makes sense and is bounded.
> If I ruminate on this, what I really desire is: the capability to give an expression that can be evaluated prior to applying. The expression is typically evaluated against data that resides in existing cluster resources, and its output is then used as input to the current package.
Is it an oversimplification to characterize this as "simply" querying the cluster as if it were merely a document db?
I ask because I think that characterization bounds the operation so that:
Are there other data sources and/or types involved?
FWIW, we have been researching how this could be done of late and believe a lot can be learned from what kyverno does. In particular kyverno has the concept of defining contexts, which are additional sets of data values that are available for use in processing kyverno rules. One source of data values for a context is an apiCall.
Eg., they have:
```yaml
rules:
- name: example-api-call
  context:
  - name: podCount
    apiCall:
      urlPath: "/api/v1/namespaces/{{request.namespace}}/pods"
      jmesPath: "items | length(@)"
```
Here you can see how they allow a urlPath to be specified for a call made against the cluster with an optional jmesPath expression being used to generate a processed result from it. In other words, not restricted to just the raw result.
In the context of ytt, what could be done is that for each context you create a file named `{{context-name}}.data.yaml` and set a file-mark for that file name like `--file-mark '{{context-name}}.data.yaml:type=data'`. Thus the files aren't processed by default, but the main YAML files could then use `data.read()` to load the data for the specific context and process it.
As to the jmesPath expression, since starlark is rich enough as is, you need not bother with that and can always save the raw data for processing by the ytt templates.
As to the variable interpolation in the urlPath, kapp-controller should make a range of variables available corresponding to the namespace the package is tracked in, but any input data values file should also be able to be referenced, thus allowing one to use anything from it to deduce the name of a namespace or of a specific resource in the namespace or at cluster scope.
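To sketch the ytt side using the `podCount` context from the kyverno example above (and computing the count in starlark rather than via jmesPath):

```yaml
#@ load("@ytt:data", "data")
#@ load("@ytt:yaml", "yaml")

#! raw pod-list API response, saved by the controller as podCount.data.yaml
#@ pods = yaml.decode(data.read("podCount.data.yaml"))
pod_count: #@ len(pods.get("items", []))
```

The file name follows the `{{context-name}}.data.yaml` convention proposed above; everything else is standard ytt.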
One thing to note about kyverno: you do have to give it additional roles to read the resources it may accumulate via an apiCall; it can't just access the whole cluster by default. This would be an issue with kapp-controller as well, as allowing full access would make it trivial to steal secrets. Thus RBAC has to be taken into consideration: how do you review the additional RBAC a package needs because of this, and enable it only for that package after verifying you want to? But then this is where the ability to specify a service account to use for the deployment comes in.
I would just start simple, and maybe, if needed, iterate from there. Right now, being just able to load the whole document/resource specified by a selector (gvk+name+ns) from the cluster would improve kapp-controller greatly.
Basically what @pivotaljohn describes in his comment
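For illustration, such a selector (gvk + name + namespace) could be as small as this hypothetical stanza (field names are made up; `AWSCluster` is per the earlier Cluster API example):

```yaml
values:
- resourceRef:
    # hypothetical shape: exact field names TBD
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: bar
    namespace: foo
```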
another use case, this may or may not fall under this issue, but it seems at least tangential: a `JWTAuthenticator` resource with fields we want to set to 1) the address of a `Service` in the package and 2) the CA bundle from a `Certificate` in the package (the `Service` sits in front of an OIDC provider `Deployment`, the `Certificate` is the serving cert of the OIDC provider, and the `JWTAuthenticator` configures another component to use the OIDC provider to validate JWTs).

to me, this use case is slightly different than the ones mentioned above because it wants to consume `status` on resources from the same package for template values. i feel like i am asking for a second templating step after deploying some of the resources in the package, which seems to break the existing fetch/template/deploy pipeline abstraction. hence i'm wondering if there is a better way to resolve my use case
Describe the problem/challenge you have

I need, as part of my package's processing, to query information that lives in the cluster so that I can make some decisions. An example would be that my package would create HTTPProxy resources if Contour is available in the cluster, otherwise it would use Ingress. Currently, the user needs to query for that information and provide it as a data value to the instantiation of a PackageInstall (via the data values secret).
Describe the solution you'd like

We have been thinking of 2 possible ways of achieving this:

Adding a fetch capability that will query resources from the cluster and make them available as data values:

There's some dependency work in progress, but this would get specific details about the existing resources so that information can be used as data values.

The other option would be to extend the PackageInstall:

These are just two possible implementation options to address the feature, but it's really needed to be able to query existing resources in the cluster for data values.
cc/ @grahamdumpleton
Vote on this request
This is an invitation to the community to vote on issues, to help us prioritize our backlog. Use the "smiley face" up to the right of this comment to vote.
👍 "I would like to see this addressed as soon as possible" 👎 "There are other more important things to focus on right now"
We are also happy to receive and review Pull Requests if you want to help working on this issue.