389ds / 389-ds-base

The enterprise-class Open Source LDAP server for Linux
https://www.port389.org/
Other
210 stars 90 forks source link

Test memberof_plugin/fixup_test.py::test_fixup_task_limit fails intermittently #6214

Open vashirov opened 3 months ago

vashirov commented 3 months ago

Issue Description

=================================== FAILURES ===================================
____________________________ test_fixup_task_limit _____________________________

topo = <lib389.topologies.TopologyMain object at 0x7f69f367caa0>

    def test_fixup_task_limit(topo):
        """Test only one fixup task is allowed at one time

        :id: 2bb49a10-fca9-4d89-9a7a-34c2ba4baadc
        :setup: Standalone Instance
        :steps:
            1. Add some users and groups
            2. Enable memberOf Plugin
            3. Add fixup task
            4. Add second task
            5. Add a third task after first task completes
        :expectedresults:
            1. Success
            2. Success
            3. Success
            4. Second task should fail
            5. Success
        """

        # Create group with members
        groups = Groups(topo.standalone, DEFAULT_SUFFIX)
        group = groups.create(properties={'cn': 'test'})

        users = UserAccounts(topo.standalone, DEFAULT_SUFFIX)
        for idx in range(400):
            user = users.create(properties={
                'uid': 'testuser%s' % idx,
                'cn' : 'testuser%s' % idx,
                'sn' : 'user%s' % idx,
                'uidNumber' : '%s' % (1000 + idx),
                'gidNumber' : '%s' % (1000 + idx),
                'homeDirectory' : '/home/testuser%s' % idx
            })
            group.add('member', user.dn)

        # Configure memberOf plugin
        memberof = MemberOfPlugin(topo.standalone)
        memberof.enable()
        topo.standalone.restart()

        # Add first task
        task = memberof.fixup(DEFAULT_SUFFIX)

        # Add second task which should fail
>       with pytest.raises(ldap.UNWILLING_TO_PERFORM):
E       Failed: DID NOT RAISE <class 'ldap.UNWILLING_TO_PERFORM'>

dirsrvtests/tests/suites/memberof_plugin/fixup_test.py:67: Failed
vashirov commented 3 months ago

Failed with BDB too https://github.com/389ds/389-ds-base/actions/runs/9445172278/job/26013342764#step:7:608

vashirov commented 3 months ago

It's a race condition, the first fixup task finishes before the second starts, so no exception is raised:

[10/Jun/2024:09:09:25.834259712 +0000] conn=1 op=2 ADD dn="cn=memberOf_fixup_2024-06-10T09:09:25.833971,cn=memberOf task,cn=tasks,cn=config"
[10/Jun/2024:09:09:25.837952578 +0000] conn=1 op=2 RESULT err=0 tag=105 nentries=0 wtime=0.000118196 optime=0.003697134 etime=0.003813366
[10/Jun/2024:09:09:26.084716617 +0000] conn=1 op=3 ADD dn="cn=memberOf_fixup_2024-06-10T09:09:25.838057,cn=memberOf task,cn=tasks,cn=config"
[10/Jun/2024:09:09:26.089105261 +0000] conn=1 op=3 RESULT err=0 tag=105 nentries=0 wtime=0.000217772 optime=0.004398722 etime=0.004614611
[10/Jun/2024:09:09:25.838214450 +0000] - INFO - memberof-plugin - memberof_fixup_task_thread - Memberof task starts (filter: "(|(objectclass=inetuser)(objectclass=inetadmin)(objectclass=nsmemberof))") ...
[10/Jun/2024:09:09:26.045646592 +0000] - INFO - memberof-plugin - memberof_fixup_task_thread - Memberof task finished (processed 408 entries in 0 seconds)
[10/Jun/2024:09:09:26.089343225 +0000] - INFO - memberof-plugin - memberof_fixup_task_thread - Memberof task starts (filter: "(|(objectclass=inetuser)(objectclass=inetadmin)(objectclass=nsmemberof))") ...

I think we can increase the number of entries for the fixup task to process, otherwise it's too fast, especially on LMDB :)

tbordaz commented 3 months ago

I agree with your proposal. Do you how much entries are needed to get a fixup running for 5sec ?

vashirov commented 3 months ago

For 4000 entries I got these results:

[10/Jun/2024:06:13:01.826358721 -0400] conn=1 op=2 ADD dn="cn=memberOf_fixup_2024-06-10T06:13:01.826070,cn=memberOf task,cn=tasks,cn=config"
[10/Jun/2024:06:13:01.831296015 -0400] conn=1 op=2 RESULT err=0 tag=105 nentries=0 wtime=0.000146513 optime=0.004941602 etime=0.005087147
[10/Jun/2024:06:13:01.831699508 -0400] conn=1 op=3 ADD dn="cn=memberOf_fixup_2024-06-10T06:13:01.831436,cn=memberOf task,cn=tasks,cn=config"
[10/Jun/2024:06:13:01.839102539 -0400] conn=1 op=3 RESULT err=53 tag=105 nentries=0 wtime=0.000098994 optime=0.007406013 etime=0.007503500
[10/Jun/2024:06:13:01.831020151 -0400] - INFO - memberof-plugin - memberof_fixup_task_thread - Memberof task starts (filter: "(|(objectclass=inetuser)(objectclass=inetadmin)(objectclass=nsmemberof))") ...
[10/Jun/2024:06:13:01.837270270 -0400] - ERR - memberof-plugin - memberof_task_add - there is already an identical fixup task running: base: dc=example,dc=com filter: (|(objectclass=inetuser)(objectclass=inetadmin)(objectclass=nsmemberof))
[10/Jun/2024:06:13:01.846494452 -0400] - INFO - memberof-plugin - memberof_fixup_task_thread - Memberof task starts (filter: "(|(objectclass=inetuser)(objectclass=inetadmin)(objectclass=nsmemberof))") ...
[10/Jun/2024:06:13:02.588897161 -0400] - INFO - memberof-plugin - memberof_fixup_task_thread - Memberof task finished (processed 3996 entries in 1 seconds)
[10/Jun/2024:06:13:03.397260456 -0400] - INFO - memberof-plugin - memberof_fixup_task_thread - Memberof task finished (processed 7993 entries in 2 seconds)
[10/Jun/2024:06:13:03.977796100 -0400] - INFO - memberof-plugin - memberof_fixup_task_thread - Memberof task starts (filter: "(|(objectclass=inetuser)(objectclass=inetadmin)(objectclass=nsmemberof))") ...

So for 5 seconds it would be around 20000+ entries. The downside is that in increases the test time significantly because of adding and modifying more entries.

vashirov commented 3 months ago

With 4000 entries it takes 140s to complete the test, with 20000 - 1605s, mostly because of the unbuffered access logs in the test env, logging each add/mod. And it fails in the end, because the third task is not allowed, since the first one is still running.

tbordaz commented 3 months ago

Another option would be to generate (ldifgen) a N groups and M users with all users member of all groups and import the ldif (memberof disabled), then run the memberof fixup. Note the LMDB import may be a bit slow.

vashirov commented 3 months ago

While checking the logs, I found some discrepancies, that I think might a bug in how fixup task calculates the stats:

Test with 6000 entries:

[10/Jun/2024:07:30:17.619788391 -0400] - INFO - memberof-plugin - memberof_fixup_task_thread - Memberof task finished (processed 6001 entries in 40 seconds)
[10/Jun/2024:07:30:17.791544224 -0400] - INFO - memberof-plugin - memberof_fixup_task_thread - Memberof task finished (processed 4 entries in 0 seconds)
[10/Jun/2024:07:30:17.794205057 -0400] - INFO - memberof-plugin - memberof_fixup_task_thread - Memberof task finished (processed 4 entries in 0 seconds)

Test with 5000 entries:

[10/Jun/2024:07:33:49.171642325 -0400] - INFO - memberof-plugin - memberof_fixup_task_thread - Memberof task finished (processed 5005 entries in 1 seconds)
[10/Jun/2024:07:33:50.020074840 -0400] - INFO - memberof-plugin - memberof_fixup_task_thread - Memberof task finished (processed 10001 entries in 2 seconds)
[10/Jun/2024:07:33:50.334245942 -0400] - INFO - memberof-plugin - memberof_fixup_task_thread - Memberof task finished (processed 123 entries in 0 seconds)

In the last snippet time spent is very low, but then number of entries in the next line is almost double.