Closed juditnovak closed 2 months ago
I'm doing my best to reproduce the issue for deeper understanding of the original cause, but I'm struggling to get the failure again.
main
doesn't trigger it anymore@Mehdi-Bendriss your suggestions (overall a new version of the original proposal) have been all committed.
I'm confused why you've proposed this code: I do not undestand underlying workflows (further than the instuctions I've been following from earlier comment https://github.com/canonical/opensearch-operator/pull/412#discussion_r1729248248). I'm interested to learn about them (so next time my contribution could be more useful than it seems to be here :-( )
Your proposal introduces a circular import, as ClusterTopology
itself internally uses OpenSearchDistributon
(the class we are modifying).
Do you have another proposal? Would you like to give highlights of underlying logic, so I could have one?
Summarizing offline discussion and consequences:
@Mehdi-Bendriss suggested to use keep the circular import. Implemented in commit https://github.com/canonical/opensearch-operator/pull/412/commits/f282ef4aeec756f2cfa4107df7423e89d83f0b97 Integration tests show no issues, on a superficial check the disabled config unittest may. Corresponding unittests disabled, see error in comment.
I believe that we should avoid the circular imports and keep clear integrity of the ClusterTopology
object relying on the OpenSearchDistribution
as suggested by the original design.
By moving the hard-coded list of default roles into an application-wide constant, this is simple to do. Note that this solution had NO interference with any tests (which should be a signal of a healthy internal refactor).
This proposed solution is added in commit https://github.com/canonical/opensearch-operator/pull/412/commits/d227aa9c48e540edc6b86a492cd15ae9962833e6. (In case circular import is preferred, we just need to roll back this commit.)
Question (@Mehdi-Bendriss, @reneradoi , @phvalguima ):
The current solution is ignoring the Deployment Description stored in the databag if the the application was NOT started with WITH_PROVIDED_ROLES
. I understand that this is how the application is normally configured.
Question: is it possible that throughout runtime of the application the Deployment Description roles (stored in the databag) may change?
If this scenario is impossible, I'm changing the corresponding new unittest. Thank you.
Summarizing offline discussion: Deployment Description is holding the "desired state", not the current state. The current state is available in the static config. (This also explains why it's safer to back to Deployment Description defaults rather than actual values, in case static config may not be available.)
All this wisdom is "re-cycled" into the code, in the shape of helpful comments around the workflow :-)
Issue
https://github.com/canonical/opensearch-operator/issues/222
Context (as highlighted in the issue comment):
"When a unit is online, unit-specific service information is taken from the Opensearch API (
/_nodes
).However if the unit is offline, we fall back to static configuration. This is the part where we had the issue, in case
node.roles
was missing from the configuration."Solution
We shouldn't directly refer to a field that may be missing from the config.
NOTE: Since the
opensearch_distro
had no unittests, this PR is also adding a new unttest module dedicated toopensearch_distro
. For the time being only the concernedcurrent()
method is being tested (both for happy and broken path -- the latter was the target of the fix.) The test module should be extended in the future for additional test coverage for theopensearch_distro
module.NOTE2: The PR is including a global fix for Opensearch Unit Tests, as a global mock was applied (in a hidden corner) imacting ALL tests.