Closed muvaf closed 2 years ago
With https://github.com/crossplane/crossplane-runtime/pull/283 , generic reconciler doesn't call Create
even if we say the resource does not exist for given grace period. I've reduced the grace period to 0 but then it re-queues immediately after every Create
call until we report that resource exists since it assumes that you'd never call Create
again. This causes excessive re-queueing until the resource is ready, which could take more than a minute.
Until we lift the requirement that the controller needs to call Apply
again to get the results, we can't update to the new controller-runtime. Observe
should be able to say that the resource exists until the creation is completed, similar to RDS DB Instance and other resources whose creation is a long-running operation.
I agree that we should simplify the flow considering well-known environment we know our controllers running:
So, I don't see much benefit in trying to keep things synced with filesystem locks.
Since it'll affect our planning for the next sprint, I created https://github.com/crossplane-contrib/terrajet/issues/74 after getting the thumbs-up from @turkenh
Closing this, let's add comments to https://github.com/crossplane-contrib/terrajet/issues/74 for further discussion.
What problem are you facing?
During the initial implementation, in order to move fast and see our limits we deferred some of the discussion about the implementation and how we should do the bookkeeping was one of them. Currently, we manage the state and bookkeeping using filesystem. However, we also interact with Terraform's lock and state files, which makes the code complex to understand for readers and we don't really need to keep that information out of memory since the crash of the process means a restart of the
Pod
anyway. Additionally, there are some opportunities to merge some of the functionality in the code to get a simpler flow that is easier to read and change.How could Terrajet help solve your problem?
I suggest that we use a global map that does the bookkeeping of the running CLI executions. Roughly, I think we need the following improvements in the initial implementation. Please feel free to add more things that we had deferred to after PoC phase. cc @ulucinar @turkenh
map[string]Workspace
whose key ismetadata.uuid
.Workspace
, should update the map before it starts and after it completes, similar to today.Describe
method on thatWorkspace
object that will tell you the current status, i.e.creating
,destroying
orno-op
.apply
pipeline to be run repeatedly to check the status. But no other provider does that except some of Azure APIs, which is not user friendly at all in a controller, so we run into possibility of violating generic reconciler assumptions.Create
repeatedly because we need to make the sameApply
call to finalize it but since https://github.com/crossplane/crossplane-runtime/pull/283 , this won't be allowed because generic reconciler won't callCreate
again in the grace period.map[string]Workspace
can be treated just like a cloud provider SDK client. We don't need to know Terraform CLI, pipeline/operation details or propagating status as error etc. We can usemap[string]interface{}
to pass down spec configuration and receiveStateV4
and information about whether it'screating
ordestroying
.ExternalClient
implementation directly call a single interface. Because if you look at it today, it's merely a shim on top of conversion layer. Since we use similar names, it makes it really hard to follow the layers.ApplyResult
part, reporting current status should be the job ofDescribe
) which is very similar to the set of functions we use in native providerExternalClient
implementations.Note that some of the technical debt is accrued on purpose because we needed to be able to work in parallel during PoC and discover things from scratch. Before declaring terrajet as prod-ready and solidify the implementation with more tests, I think it's time to revisit the parts that we completed during that phase and refactor into a more cohesive structure because thanks to that initial effort we know what works, how Terraform behaves and the rough performance implications of the decisions.