freeipa / freeipa-healthcheck

Check the health of a freeIPA installation
GNU General Public License v3.0

ipahealthcheck.ds.backends failure #163

Closed rcritten closed 3 years ago

rcritten commented 4 years ago

Cloned from https://pagure.io/freeipa/issue/8568

The ipa-healthcheck reported an error which caused PKI CI to fail: https://github.com/dogtagpki/pki/pull/3372/checks?check_run_id=1364905815 (click View raw logs)

2020-11-06T17:34:43.2456512Z   {
2020-11-06T17:34:43.2456822Z     "source": "ipahealthcheck.ds.backends",
2020-11-06T17:34:43.2457213Z     "check": "BackendsCheck",
2020-11-06T17:34:43.2457481Z     "result": "CRITICAL",
2020-11-06T17:34:43.2457941Z     "uuid": "34e473e8-4cc3-430e-b20c-e1b49d9a26a9",
2020-11-06T17:34:43.2458236Z     "when": "20201106173439Z",
2020-11-06T17:34:43.2458438Z     "duration": "0.016853",
2020-11-06T17:34:43.2458627Z     "kw": {
2020-11-06T17:34:43.2458934Z       "exception": "No object exists given the filter criteria cn=changelog"
2020-11-06T17:34:43.2459350Z     }
2020-11-06T17:34:43.2459514Z   },

rcritten commented 4 years ago

@edewata it's failing on Fedora 33. What version of python3-lib389 and freeipa-healthcheck are installed?

This error originates within python3-lib389.

cipherboy commented 4 years ago

2020-11-06T17:26:37.1339160Z Installing:
2020-11-06T17:26:37.1341071Z  freeipa-healthcheck               noarch  0.6-4.fc33                                fedora                                                                93 k
2020-11-06T17:26:37.1342332Z  freeipa-server                    x86_64  4.9.0.dev202011040826+git-0.fc33          copr:copr.fedorainfracloud.org:group_freeipa:freeipa-master-nightly  355 k
2020-11-06T17:26:37.1344375Z  freeipa-server-dns                noarch  4.9.0.dev202011040826+git-0.fc33          copr:copr.fedorainfracloud.org:group_freeipa:freeipa-master-nightly   39 k
2020-11-06T17:26:37.1345736Z  freeipa-server-trust-ad           x86_64  4.9.0.dev202011040826+git-0.fc33          copr:copr.fedorainfracloud.org:group_freeipa:freeipa-master-nightly  141 k
2020-11-06T17:26:37.1347201Z  python3-ipatests                  noarch  4.9.0.dev202011040826+git-0.fc33          copr:copr.fedorainfracloud.org:group_freeipa:freeipa-master-nightly  1.3 M

---

2020-11-06T17:26:37.1654873Z  python3-lib389                    noarch  1.4.4.7-1.fc33                            updates                                                              843 k

rcritten commented 4 years ago

It works for me with python3-lib389-1.4.4.5-1.fc33 but fails afterward.

@mreynolds389 looks like something in lib389.

rcritten commented 4 years ago

More specifically, it passes in 1.4.4.5 and fails in 1.4.4.7. I didn't test 1.4.4.6.

rcritten commented 4 years ago

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/ipahealthcheck/ds/plugin.py", line 97, in doCheck
    results += result
  File "/usr/lib/python3.9/site-packages/lib389/_mapped_object_lint.py", line 118, in lint
    yield from f()
  File "/usr/lib/python3.9/site-packages/lib389/backend.py", line 516, in _lint_cl_trimming
    replica = replicas.get(suffix)
  File "/usr/lib/python3.9/site-packages/lib389/replica.py", line 1765, in get
    replica = super(Replicas, self).get(selector, dn)
  File "/usr/lib/python3.9/site-packages/lib389/_mapped_object.py", line 1103, in get
    raise ldap.NO_SUCH_OBJECT("No object exists given the filter criteria %s" % selector)
ldap.NO_SUCH_OBJECT: No object exists given the filter criteria o=ipaca

It looks like it's doing this for each backend (o=ipaca, dc=example,dc=test and cn=changelog), with only the last one actually being reported up within healthcheck.

edewata commented 4 years ago

Could the ipa-healthcheck output be shortened so it's easier to see what's failing?

rcritten commented 4 years ago

Once 0.7 hits stable (soon) it will, because that release defaults to --failures-only. Until then you can pass --failures-only explicitly.

Mark has an idea what the problem may be.
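
For anyone stuck on a build without that default, the same trimming can be approximated by post-processing the JSON report. The following is a minimal sketch, not part of ipa-healthcheck: it assumes the report is a JSON list of entries shaped like the one quoted at the top of this issue, and that passing checks carry the result value "SUCCESS" (an assumption, since only a CRITICAL entry is quoted here). The function names and the second sample entry are hypothetical.

```python
import json


def failures_only(report_text):
    """Return the entries whose result is anything other than SUCCESS.

    Assumption: the report is a JSON list of dicts with a "result" key,
    as in the entry quoted in this issue.
    """
    results = json.loads(report_text)
    return [entry for entry in results if entry.get("result") != "SUCCESS"]


def summarize(entry):
    """Render one failing entry as a single line: source.check: level: detail."""
    detail = entry.get("kw", {}).get("exception", "")
    return f'{entry["source"]}.{entry["check"]}: {entry["result"]}: {detail}'


# The first entry is copied from this issue; the second is a hypothetical
# passing entry added only so the filter has something to drop.
SAMPLE_REPORT = json.dumps([
    {
        "source": "ipahealthcheck.ds.backends",
        "check": "BackendsCheck",
        "result": "CRITICAL",
        "kw": {
            "exception": "No object exists given the filter criteria cn=changelog"
        },
    },
    {
        "source": "example.source",
        "check": "ExampleCheck",
        "result": "SUCCESS",
        "kw": {},
    },
])

if __name__ == "__main__":
    for failing in failures_only(SAMPLE_REPORT):
        print(summarize(failing))
```

Treating everything except SUCCESS as a failure keeps the sketch independent of the exact set of severity levels, which is not spelled out in this thread.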

rcritten commented 3 years ago

@edewata have you tried python3-lib389-1.4.4.8 to see if it addresses the failure? It works for me.

edewata commented 3 years ago

I have built 389-ds-base-1.4.4.8 in @pki/master COPR repo: https://copr.fedorainfracloud.org/coprs/g/pki/master/build/1771569/

Then restarted the CI tests: https://github.com/dogtagpki/pki/runs/1408488025

It's still failing:

2020-11-16T20:14:10.6903613Z  python3-lib389                    noarch  1.4.4.8-1.fc33                   copr:copr.fedorainfracloud.org:group_pki:master                      842 k
...
2020-11-16T20:22:24.8848135Z   {
2020-11-16T20:22:24.8848584Z     "source": "ipahealthcheck.ds.backends",
2020-11-16T20:22:24.8849126Z     "check": "BackendsCheck",
2020-11-16T20:22:24.8849486Z     "result": "CRITICAL",
2020-11-16T20:22:24.8850114Z     "uuid": "ca7f1862-18ee-44e1-8f48-4a5e70a2c03b",
2020-11-16T20:22:24.8850534Z     "when": "20201116202219Z",
2020-11-16T20:22:24.8850807Z     "duration": "0.021128",
2020-11-16T20:22:24.8851060Z     "kw": {
2020-11-16T20:22:24.8851504Z       "exception": "No object exists given the filter criteria cn=changelog"
2020-11-16T20:22:24.8851937Z     }
2020-11-16T20:22:24.8852144Z   },

rcritten commented 3 years ago

Ok, @mreynolds389 would you like me to create a 389 ticket to track this?

mreynolds389 commented 3 years ago

> Ok, @mreynolds389 would you like me to create a 389 ticket to track this?

This does not look like a DS bug at this time. This error would suggest the mapping tree entry for cn=changelog was not created, or the cn=changelog backend was not initialized. The error is saying there is no backend, so this is most likely a config issue. Is there a system I can look at where this is failing so I can confirm what is really going on?

edewata commented 3 years ago

@rcritten Any idea about this? The test is defined in IPA, not in PKI: https://github.com/freeipa/freeipa/blob/master/ipatests/test_integration/test_ipahealthcheck.py#L992

PKI does not create cn=changelog. PKI does create cn=changelog5, but that operation no longer works with the latest DS, and the failure is ignored, so it should not have anything to do with the ipa-healthcheck failure. See the following code: https://github.com/dogtagpki/pki/blob/master/base/server/src/com/netscape/cms/servlet/csadmin/LDAPConfigurator.java#L729-L768

rcritten commented 3 years ago

I'm not sure how the IPA integration test impacts this. You aren't running it AFAICT, you're running ipa-healthcheck directly, at least based on your PR.

This particular failure is from late November. Is it still happening? I wasn't able to reproduce it on a fresh F32 install.

edewata commented 3 years ago

We used to call ipa-healthcheck directly in PKI CI right after running ipa-run-tests, but we have temporarily disabled it because it kept failing with the error message reported above: https://github.com/dogtagpki/pki/blob/master/ci/ipa-test.sh#L68-L70

If IPA does not see this problem, maybe we should just remove the above lines permanently from PKI CI, and let ipa-healthcheck be tested in IPA instead of PKI. @cipherboy, what do you think?

rcritten commented 3 years ago

I'd at the very least suggest running it before the IPA tests. There is no assurance in the tests that the resulting server is in any sort of sane state (though that would be ideal).

flo-renaud commented 3 years ago

The issue is still happening but only when @389ds/389-ds-base-nightly copr repo is enabled. See for instance PR #625: test_replica_promotion_TestHiddenReplicaPromotion.

The DS access logs show that ipa-healthcheck is performing a search equivalent to ldapsearch -b "cn=mapping tree,cn=config" "(&(&(objectClass=nsds5Replica))(|(nsDS5ReplicaRoot=cn=changelog)))" that returns 0 results. IMO the issue is on the DS side.
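
To repeat that search outside of ipa-healthcheck, something like the following could be used. It is a rough sketch under stated assumptions: it relies on python-ldap being installed, the function names are hypothetical, and the URI, bind DN and password are whatever applies to the system under test. The filter builder does no escaping, so it should only be fed trusted suffix strings.

```python
SEARCH_BASE = "cn=mapping tree,cn=config"


def replica_filter(suffix):
    """Build the filter seen in the DS access log for the given suffix."""
    return f"(&(&(objectClass=nsds5Replica))(|(nsDS5ReplicaRoot={suffix})))"


def find_replica_entries(uri, bind_dn, password, suffix):
    """Run the same subtree search and return the matching entries.

    python-ldap is imported lazily so the filter builder above can be
    used (and tested) on machines where it is not installed.
    """
    import ldap

    conn = ldap.initialize(uri)
    conn.simple_bind_s(bind_dn, password)
    try:
        return conn.search_s(SEARCH_BASE, ldap.SCOPE_SUBTREE, replica_filter(suffix))
    finally:
        conn.unbind_s()


if __name__ == "__main__":
    # Printing the filter needs no server; compare it with the access log.
    print(replica_filter("cn=changelog"))
```

An empty list from find_replica_entries for cn=changelog would match the 0-result search described above.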

edewata commented 3 years ago

Just for the record, ipa-healthcheck is still failing with the same error even when it's run before the IPA tests.

mreynolds389 commented 3 years ago

This particular error ("No object exists given the filter criteria cn=changelog") was fixed in https://github.com/389ds/389-ds-base/issues/4159 back in October and is in 1.4.4.6 and up. If I revert this fix then I can reproduce the exact same problem you are seeing. So I cannot explain why it is still failing for Endi, unless he is accidentally testing the wrong build or the wrong version of lib389.

Unfortunately I found another bug in the DS healthcheck where "list" objects, like Backends, are not processed correctly. That is unrelated to this ticket; this particular issue/error has been fixed in the latest version of lib389 (1.4.4.6 and up). If you are reproducing this on a system with 1.4.4.6 or higher, please provide me the details so I can log into the system and confirm what is really going on. Thanks!
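
A quick way to rule out the "wrong version" possibility is to compare the installed python3-lib389 version string against 1.4.4.6. The sketch below is illustrative only: the helper names are hypothetical, and it does a naive numeric comparison of the dotted version (dropping the release part after the dash), not a full RPM version comparison. It is only meaningful within the 1.4.4.x series discussed in this comment.

```python
# Version in which the fix is said to have landed, per the comment above.
FIXED_IN = (1, 4, 4, 6)


def parse_version(version_release):
    """Turn a string such as '1.4.4.9-1.fc33' into the tuple (1, 4, 4, 9)."""
    version = version_release.split("-", 1)[0]
    return tuple(int(part) for part in version.split("."))


def has_fix(version_release, fixed_in=FIXED_IN):
    """True if the dotted version is at or above the fixed-in version."""
    return parse_version(version_release) >= fixed_in


if __name__ == "__main__":
    # Version strings quoted elsewhere in this thread.
    for candidate in ("1.4.4.5-1.fc33", "1.4.4.7-1.fc33", "1.4.4.9-1.fc33"):
        print(candidate, has_fix(candidate))
```

A True result only says the build should contain that fix; it does not by itself show that the check passes.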

edewata commented 3 years ago

PKI CI uses the latest DS packages from Fedora updates. Here's what was used today:

 389-ds-base                       x86_64  1.4.4.9-1.fc33                                     updates
 389-ds-base-libs                  x86_64  1.4.4.9-1.fc33                                     updates
 python3-lib389                    noarch  1.4.4.9-1.fc33                                     updates 

I'd suggest adding an IPA CI test to test ipa-healthcheck against the latest DS packages.

rcritten commented 3 years ago

On stock F33 I can reproduce this error (plus a couple about file ownership related to systemd-resolved). I also have python3-lib389-1.4.4.9-1.fc33.noarch.

mreynolds389 commented 3 years ago

This is now fixed upstream in 389-ds-base. Thanks @flo-renaud for providing me a system that was showing the problem!

flo-renaud commented 3 years ago

@rcritten the issue can be closed; it is no longer reproduced in the nightlies using 389-ds-base-1.4.3.21-1.fc32.x86_64 / 389-ds-base-1.4.4.13-2.fc33.x86_64.