389ds / 389-ds-base

The enterprise-class Open Source LDAP server for Linux
https://www.port389.org/

Improve memberof with a cache of group parents #2090

Closed by 389-ds-bot 4 years ago

389-ds-bot commented 4 years ago

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/49031


When a group is updated, members of that group that are impacted by the update are fixed up. The fixup recomputes the memberof values of each impacted node.

The fixup is expensive because of internal searches. The vast majority of the internal searches are lookups of the parents of groups (http://www.port389.org/docs/389ds/design/memberof-scalability.html#fixup).

This ticket is to implement a cache of group parents, so that once the parents of a given node have been identified, they are cached. The next time the algorithm needs to retrieve the parents of that same node, it picks them up from the cache instead of doing an internal search.
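The caching idea can be sketched in a few lines of Python (a hypothetical illustration of the approach only; the real implementation is in C, inside the memberof plugin):

```python
# Group membership forms a graph; computing memberof for a node means
# walking up to all of its parent groups, transitively.  Once the
# ancestors of a node have been computed they are cached, so the next
# fixup that reaches the same node skips the expensive lookups.

def build_ancestor_cache(parents_of):
    """parents_of maps a node DN to the list of its direct parent groups."""
    cache = {}

    def ancestors(node, seen=None):
        if node in cache:
            return cache[node]           # cache hit: no internal search
        seen = set() if seen is None else seen
        if node in seen:                 # survive loops in the membership
            return set()
        seen.add(node)
        result = set()
        for parent in parents_of.get(node, ()):
            result.add(parent)
            result |= ancestors(parent, seen)
        cache[node] = result             # remember for the next fixup
        return result

    return ancestors
```

Note that this sketch never bounds or invalidates the cache; the real plugin must also invalidate cached ancestors when group entries change.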

Comment from tbordaz (@tbordaz) at 2016-11-21 21:02:55

attachment 0001-Ticket-49031-Improve-memberof-with-a-cache-of-group-.patch

Comment from tbordaz (@tbordaz) at 2016-11-21 21:04:57

The attached patch is part of memberof design http://www.port389.org/docs/389ds/design/memberof-scalability.html

More specifically it is described in http://www.port389.org/docs/389ds/design/memberof-scalability.html#caching-of-groups-ancestors

Comment from tbordaz (@tbordaz) at 2016-12-23 23:14:45

attachment 0001-Ticket-Ticket-49031-Improve-memberof-with-a-cache-of.patch

Comment from tbordaz (@tbordaz) at 2016-12-23 23:15:37

This second patch is a rebase on master. It also improves the patch so that it detects (and survives) loops in the membership graph.

Comment from tbordaz (@tbordaz) at 2017-01-03 15:33:01

attachment 0002-Ticket-49031-Improve-memberof-with-a-cache-of-ancest.patch

Comment from tbordaz (@tbordaz) at 2017-01-03 15:44:12

The patch fixes trailing spaces and adds the function name to some slapi_log_err calls.

Throughput was tested (http://www.port389.org/docs/389ds/design/memberof-scalability.html#throughput-1)

With 1.3.6+48861+49031, duration of ADD of a single hbac group is 3min

Comment from firstyear (@Firstyear) at 2017-01-09 07:12:16

Hi,

I don't think we should add an extra configuration for this. It allows people to make the mistake of disabling it, involves the creation of a fixup for ipa upgrade, and generally other hassles. I don't see too much penalty in just enabling this by default, so we shouldn't need a configuration for this. As well, it means we have to test memberOf twice (once without, once with), so better to just test once and "get it right".

As well, if cache_leafs is not set, your config defaults to "undefined", which isn't really awesome. Again, let's scrap the config and just turn it on.

You have the timing of the cache lookups gated by HAVE_CLOCK_GETTIME, which is fine, but it should probably be both DEBUG and HAVE_CLOCK_GETTIME, because we don't really need to know the times in production, only during debug (unless you have a reason to ship timing info, in which case I remain open to convincing).

A hashtable size of 1000 will start to degrade seriously once you exceed 4-5x that many entries; remember the size is the number of slots, not the maximum number of entries. So if you have a large group tree, this will degrade beyond roughly 5000 groups. It's probably not a big issue, just something to keep in mind.
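The slots-versus-entries point can be illustrated with a toy chained hash table (a sketch only; the actual cache uses the server's own hash table implementation):

```python
class FixedSlotsTable:
    """Toy chained hash table with a fixed slot count, illustrating that
    the size parameter is the number of slots, not a maximum entry
    count: inserts never fail, the chains just get longer and every
    lookup has to walk a longer chain on average."""

    def __init__(self, n_slots=1000):
        self.slots = [[] for _ in range(n_slots)]

    def put(self, key, value):
        # A real table would also check for duplicate keys.
        self.slots[hash(key) % len(self.slots)].append((key, value))

    def longest_chain(self):
        return max(len(chain) for chain in self.slots)
```

With 1000 slots and 5000 cached groups, the average chain holds about 5 entries, so lookups walk several nodes instead of one.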

Without MEMBEROF_CACHE_DEBUG your code passes the memberOf test and ASan (I have not run LeakSanitizer due to a gcc6 issue).

When I ran with MEMBEROF_CACHE_DEBUG set to 1, I got this stack-buffer-overflow caught:

==14666== ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fafbba067e8 at pc 0x7fafe486a090 bp 0x7fafbba06540 sp 0x7fafbba06530
READ of size 8 at 0x7fafbba067e8 thread T34
==14666== AddressSanitizer CHECK failed: ../../../../libsanitizer/sanitizer_common/sanitizer_symbolizer_linux.cc:147 "((module_name_len)) != (((uptr)-1))" (0xffffffffffffffff, 0xffffffffffffffff)
    0 0x7fafe4be6dca in _ZdaPvRKSt9nothrow_t ??:?
    1 0x7fafe4bee033 in _ZN11__sanitizer11CheckFailedEPKciS1_yy ??:?
    2 0x7fafe4bf2c4f in _ZN11__sanitizer10StackTrace15UncompressStackEPS0_Pjm ??:?
    3 0x7fafe1733a5b in __dl_iterate_phdr :?
    4 0x7fafe4bf2f89 in _ZN11__sanitizer10StackTrace15UncompressStackEPS0_Pjm ??:?
    5 0x7fafe4bf237d in _ZN11__sanitizer10StackTrace15UncompressStackEPS0_Pjm ??:?
    6 0x7fafe4bf18c6 in __sanitizer_report_error_summary ??:?
    7 0x7fafe4bece06 in __asan_report_error ??:?
    8 0x7fafe4be7242 in __asan_report_load8 ??:?
    9 0x7fafe486a08f in slapi_value_get_string /home/william/development/389ds/ds/ldap/servers/slapd/value.c:384:0
    10 0x7fafd9234e1e in cache_ancestors /home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:2240:0
    11 0x7fafd9235be6 in memberof_get_groups_r /home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:2390:0
    12 0x7fafd9236245 in memberof_get_groups_callback /home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:2486:0
    13 0x7fafe47d8fc5 in internal_srch_entry_callback /home/william/development/389ds/ds/ldap/servers/slapd/plugin_internal_op.c:102:0
    14 0x7fafe480332d in send_ldap_search_entry_ext /home/william/development/389ds/ds/ldap/servers/slapd/result.c:1519:0
    15 0x7fafe4801995 in send_ldap_search_entry /home/william/development/389ds/ds/ldap/servers/slapd/result.c:1056:0
    16 0x7fafe47a4c8a in iterate /home/william/development/389ds/ds/ldap/servers/slapd/opshared.c:1477:0
    17 0x7fafe47a57d3 in send_results_ext /home/william/development/389ds/ds/ldap/servers/slapd/opshared.c:1714:0
    18 0x7fafe47a3036 in op_shared_search /home/william/development/389ds/ds/ldap/servers/slapd/opshared.c:868:0
    19 0x7fafe47db577 in search_internal_callback_pb /home/william/development/389ds/ds/ldap/servers/slapd/plugin_internal_op.c:781:0
    20 0x7fafe47da529 in slapi_search_internal_callback_pb /home/william/development/389ds/ds/ldap/servers/slapd/plugin_internal_op.c:562:0
    21 0x7fafd9230419 in memberof_call_foreach_dn /home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:914:0
    22 0x7fafd9235b61 in memberof_get_groups_r /home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:2385:0
    23 0x7fafd9234916 in memberof_get_groups /home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:2175:0
    24 0x7fafd92394a7 in memberof_fix_memberof_callback /home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:3367:0
    25 0x7fafd923341b in memberof_modop_one_replace_r /home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:1828:0
    26 0x7fafd92328a6 in memberof_modop_one_r /home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:1572:0
    27 0x7fafd9232853 in memberof_modop_one /home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:1559:0
    28 0x7fafd9233c35 in memberof_mod_smod_list /home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:1936:0
    29 0x7fafd9233cff in memberof_add_smod_list /home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:1959:0
    30 0x7fafd9231b7d in memberof_postop_modify /home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:1274:0
    31 0x7fafe47cf344 in plugin_call_func /home/william/development/389ds/ds/ldap/servers/slapd/plugin.c:2071:0
    32 0x7fafe47cf03a in plugin_call_list /home/william/development/389ds/ds/ldap/servers/slapd/plugin.c:2013:0
    33 0x7fafe47c8897 in plugin_call_plugins /home/william/development/389ds/ds/ldap/servers/slapd/plugin.c:452:0
    34 0x7fafd9d741b1 in ldbm_back_modify /home/william/development/389ds/ds/ldap/servers/slapd/back-ldbm/ldbm_modify.c:835:0
    35 0x7fafe4792c3f in op_shared_modify /home/william/development/389ds/ds/ldap/servers/slapd/modify.c:1055:0
    36 0x7fafe478fe34 in do_modify /home/william/development/389ds/ds/ldap/servers/slapd/modify.c:388:0
    37 0x41e8e2 in connection_dispatch_operation /home/william/development/389ds/ds/ldap/servers/slapd/connection.c:633:0
    38 0x423b8c in connection_threadmain /home/william/development/389ds/ds/ldap/servers/slapd/connection.c:1772:0
    39 0x7fafe1e0b96a in PR_Select ??:?
    40 0x7fafe4beda97 in __asan_describe_address ??:?
    41 0x7fafe1bcedc4 in start_thread pthread_create.c:?

I hope this helps you,

Comment from tbordaz (@tbordaz) at 2017-01-09 22:00:41

attachment 0003-Ticket-49031-Improve-memberof-with-a-cache-of-ancest.patch

Comment from tbordaz (@tbordaz) at 2017-01-09 22:11:19

Hi,

Thank you sooo much for your continuous help and review on that plugin. I agree to drop the config option. Caching leafs does not show any benefit, so offering such an option is a risk: we can imagine administrators caching as much as possible in the hope of improving performance, but so far, apart from consuming memory, it did not make a difference.

I attached a new patch removing that option and also making the HAVE_CLOCK_GETTIME timing DEBUG-only. Such fine-grained accounting of the cost of the cache has no place in production.

Finally, good catch on the crash. To make it work with looped nested groups, I had to change the interface of cache_ancestors to use a slapi_value instead of a slapi_value *. I forgot to adapt some function calls using it, so it was crashing in debug mode but also in some error-handling conditions.

Comment from firstyear (@Firstyear) at 2017-01-11 06:19:56

This looks much better. Really great work, and I enjoyed working with you on this. It now passes ASan and the memberOf test suite.

Ack from me,

Comment from firstyear (@Firstyear) at 2017-01-11 06:25:04

You might consider cleaning up these warnings, though. The #ifdef warning repeats a number of times. It should either be nested #ifdefs, because #ifdef doesn't take logical operators, or #if defined(DEBUG) && defined(HAVE_CLOCK_GETTIME) rather than #ifdef DEBUG && HAVE_CLOCK_GETTIME.

  CC       ldap/servers/plugins/memberof/libmemberof_plugin_la-memberof.lo
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c: In function ‘memberof_call_foreach_dn’:
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:815:3: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 5 has type ‘struct memberof_cached_value *’ [-Wformat=]
   slapi_log_err(SLAPI_LOG_PLUGIN, MEMBEROF_PLUGIN_SUBSYSTEM, "memberof_call_foreach_dn: Ancestors of %s already cached (%x)\n", ndn, ht_grp);
   ^
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c: In function ‘cache_ancestors’:
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:2304:2: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
  if (double_check = ancestors_cache_lookup((const void*) key)) {
  ^
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c: In function ‘memberof_unlock’:
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:2992:3: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 4 has type ‘int’ [-Wformat=]
   slapi_log_err(SLAPI_LOG_FATAL, MEMBEROF_PLUGIN_SUBSYSTEM, "cache statistics duration: lookup %ld, add %ld, remove %ld, enum %ld\n",
   ^
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:2992:3: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 5 has type ‘int’ [-Wformat=]
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:2992:3: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 6 has type ‘int’ [-Wformat=]
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:2992:3: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 7 has type ‘int’ [-Wformat=]
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c: In function ‘ancestors_cache_lookup’:
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:3242:14: warning: extra tokens at end of #ifdef directive [enabled by default]
 #ifdef DEBUG && HAVE_CLOCK_GETTIME
              ^
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:3249:14: warning: extra tokens at end of #ifdef directive [enabled by default]
 #ifdef DEBUG && HAVE_CLOCK_GETTIME
              ^
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:3259:14: warning: extra tokens at end of #ifdef directive [enabled by default]
 #ifdef DEBUG && HAVE_CLOCK_GETTIME
              ^
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c: In function ‘ancestors_cache_remove’:
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:3276:14: warning: extra tokens at end of #ifdef directive [enabled by default]
 #ifdef DEBUG && HAVE_CLOCK_GETTIME
              ^
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:3283:14: warning: extra tokens at end of #ifdef directive [enabled by default]
 #ifdef DEBUG && HAVE_CLOCK_GETTIME
              ^
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:3293:14: warning: extra tokens at end of #ifdef directive [enabled by default]
 #ifdef DEBUG && HAVE_CLOCK_GETTIME
              ^
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c: In function ‘ancestors_cache_add’:
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:3307:14: warning: extra tokens at end of #ifdef directive [enabled by default]
 #ifdef DEBUG && HAVE_CLOCK_GETTIME
              ^
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:3313:14: warning: extra tokens at end of #ifdef directive [enabled by default]
 #ifdef DEBUG && HAVE_CLOCK_GETTIME
              ^
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:3323:14: warning: extra tokens at end of #ifdef directive [enabled by default]
 #ifdef DEBUG && HAVE_CLOCK_GETTIME
              ^
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c: In function ‘ancestor_hashtable_empty’:
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:3658:14: warning: extra tokens at end of #ifdef directive [enabled by default]
 #ifdef DEBUG && HAVE_CLOCK_GETTIME
              ^
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:3665:14: warning: extra tokens at end of #ifdef directive [enabled by default]
 #ifdef DEBUG && HAVE_CLOCK_GETTIME
              ^
/home/william/development/389ds/ds/ldap/servers/plugins/memberof/memberof.c:3674:14: warning: extra tokens at end of #ifdef directive [enabled by default]
 #ifdef DEBUG && HAVE_CLOCK_GETTIME
              ^
Comment from tbordaz (@tbordaz) at 2017-01-11 21:24:37

Thanks William for the review. I fixed the compiler warnings.

git push origin master
Counting objects: 7, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 5.80 KiB | 0 bytes/s, done.
Total 7 (delta 4), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   ac44337..dfcef18  master -> master

commit dfcef185ff10eb817a6b4620bd9b5f0d20e8b053
Author: Thierry Bordaz tbordaz@redhat.com
Date: Mon Jan 9 16:48:50 2017 +0100

Comment from tbordaz (@tbordaz) at 2017-02-11 23:05:33

Metadata Update from @tbordaz:

Comment from pj101 at 2017-05-19 10:28:16

Tests with 389ds 1.3.6.5: when completely deleting the uniqueMember attribute of a large group (~5000 entries), or when adding the members back, I get a lot of "memberof_fix_memberof_callback: Weird" error messages from this patch (2 from emptying the group and 47 when adding the uniqueMember attributes for everyone back):

[19/May/2017:10:17:40.600682480 +0200] - ERR - memberof-plugin - memberof_fix_memberof_callback: Weird, uid=user_login,ou=personnel,ou=utilisateurs,dc=id,dc=polytechnique,dc=edu is not in the cache

The final state of the entries is consistent. Not sure if this error message indicates any serious problem.

Comment from tbordaz (@tbordaz) at 2017-05-19 11:41:40

Hi @pj101,

Thank you so much for your testing and feedback !!

The ancestors cache caches group ancestors. When adding the ancestors for a given entry, it is quite difficult to know whether the entry itself is a group, so ancestors are cached systematically and removed later in the algorithm if the entry turns out to be a leaf. This message means the entry was a leaf and, while trying to remove its ancestors from the cache, the entry was not found there.

It does not create a specific issue, but it looks weird at this point in the code. The only reason I can imagine is that the cache was disabled in the middle of the recursive tree walk, which can happen if a circle was detected in the memberships.

Do you know if your groups can contain circles? Did you identify a specific case where it happens, so that I can try to reproduce it?

IMHO you should open a new ticket, because either it is a normal message that should not be displayed when circles are detected, or it is not normal and needs to be understood.

Comment from tbordaz (@tbordaz) at 2017-05-19 11:41:44

Metadata Update from @tbordaz:

Comment from pj101 at 2017-05-19 14:23:42

@tbordaz Ok i've created a new ticket, hopefully with steps to reproduce: #2324

Comment from sramling at 2017-06-13 16:03:17

Metadata Update from @sramling:

Comment from sramling at 2017-06-13 16:06:26

Automated test cases to test memberOf plugin performance with Online import, Bulk add, modify and delete of nested groups.

0001-Add-performance-tests-for-memberof-plugin.patch

Comment from sramling at 2017-06-14 13:42:21

The numbers were too big and the tests didn't complete even after 24 hrs, so I reduced the maximum to 200000 users for the tests. I will increase the numbers once I receive feedback/input from the dev team.

0001-Add-performance-tests-for-memberof-plugin.patch

Comment from tbordaz (@tbordaz) at 2017-06-14 13:51:47

@sramling what is the layout of the 24 h run? Is it spending time importing users, on fixup, or on verification? What is the proportion of each?

Regarding the test case, I think it would be interesting to check a hard limit on the overall duration, for example 3 h, so that if a future release reaches that hard limit the test fails, flagging a performance regression.
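The proposed hard limit could be expressed as a simple guard around the timed operation (a sketch with an assumed 3 h ceiling; `assert_completes_within` is a hypothetical helper, not part of the patch):

```python
import time

HARD_LIMIT_SECS = 3 * 60 * 60  # assumed 3 h ceiling from the discussion


def assert_completes_within(limit_secs, func, *args, **kwargs):
    """Run func and fail the test if it exceeds the hard duration limit,
    turning a load test into a crude performance-regression guard."""
    start = time.monotonic()
    result = func(*args, **kwargs)
    elapsed = time.monotonic() - start
    assert elapsed < limit_secs, (
        "possible performance regression: took %.1fs (limit %ds)"
        % (elapsed, limit_secs))
    return result
```

A test would then wrap its import or fixup step, e.g. `assert_completes_within(HARD_LIMIT_SECS, run_fixup_task, ...)` (where `run_fixup_task` stands in for whatever the suite actually calls).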

Comment from sramling at 2017-06-14 14:02:40

@tbordaz it takes about 20 hrs (still not completed) to run the import + fixup memberOf task for the data set [500000-1000-100-20-10]. It spends more time finishing the fixup memberOf task than the import task.

Regarding the test duration, yes, it makes sense to set the hard limit to 3 hrs and assess performance for each build or commit.

Comment from spichugi (@droideck) at 2017-06-14 18:36:02

Hi Sankar, to improve readability and make debugging easier, we can do this:

In the function:

def _nested_import_add_ldif(topo, nof_users, nof_groups, grps_user, ngrps_user, nof_depth, is_import):
    """Create LDIF files for given nof users, groups and nested group levels"""

Instead of this:

log.info('Create LDIF files for given nof users, groups and nested group levels')
replace_values = {'NOF_USERS': nof_users, 'NOF_GROUPS': nof_groups, 'GRP_USERS': grps_user,
                  'NGRP_USERS': ngrps_user, 'NOF_DEPTH': nof_depth, 'SUFFIX': SUFFIX, 'DOMAIN': DOMAIN}
newscript = '/tmp/create_data.py'
data_temp = '/tmp/temp_data.ldif'
dir_path = os.path.dirname(os.path.realpath(__file__))
actscript = os.path.join(dir_path, 'create_data.py')
log.info('Create base entry for import')
data_ldif = _create_base_ldif(topo, 'no')

log.info('Replace users, groups, nested users, groups and nested levels')
with open(actscript) as script_in, open(newscript, 'w') as script_out:
    for line in script_in:
        for src, target in replace_values.iteritems():
            line = line.replace(src, str(target))
        script_out.write(line)
cmd = 'python {} > {}'.format(newscript, data_temp)
log.info('Creating LDIF file for importing bulk entries')
os.system(cmd)

fh2 = open(data_temp, 'r')
fh1 = open(data_ldif, 'a')
while True:
    data2 = fh2.read(100000)
    if data2:
        fh1.write(data2)
    else:
        break
fh2.close()
fh1.close()

It would be better to use pure Python. But you need to modify RHDSDataLDIF so it does not print the lines but writes them to the file you specify:

data = RHDSDataLDIF() # put your number of users here as parameters
data.do_magic()

Also, just to remind. Instead of this:

fh2 = open(data_temp, 'r')
fh1 = open(data_ldif, 'a')
while True:
    data2 = fh2.read(100000)
    if data2:
        fh1.write(data2) # If some exception will happen in your code, the files won't be closed
    else:
        break
fh2.close()
fh1.close()

Use 'with open():' structure.
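The file-copy loop above, rewritten with 'with open()' so both files are closed even when an exception occurs (a sketch using a hypothetical helper name):

```python
def append_file(src_path, dst_path, chunk=100000):
    """Append the contents of src_path onto dst_path in 100 KB chunks.
    The 'with' statement guarantees both files are closed, even if an
    exception is raised mid-copy (unlike manual fh.close() calls)."""
    with open(src_path, 'r') as src, open(dst_path, 'a') as dst:
        while True:
            data = src.read(chunk)
            if not data:
                break
            dst.write(data)
```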

Use boolean instead of this:

if is_import == 'import':
if import_base == 'yes':

because you really have only two options.

Instead of PORT_STANDALONE and other such variables, use the 'instance.port' value. With this, you can be sure you've got the right one.

Do not use 'os.system'. Use the 'subprocess' library instead; it is a safer choice.
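The `os.system(cmd)` call in the snippet above could look like this with subprocess (a sketch; the function name and paths are illustrative):

```python
import subprocess
import sys


def generate_ldif(script_path, out_path, interpreter=sys.executable):
    """Equivalent of os.system('python script > out'), without a shell:
    stdout is redirected straight to the output file, and a nonzero
    exit status raises CalledProcessError instead of being ignored."""
    with open(out_path, 'w') as out:
        subprocess.check_call([interpreter, script_path], stdout=out)
```

Passing an argument list instead of a shell string also avoids any quoting or injection problems with the paths.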

Do not use 'assert False' in an 'except:' block if you are not printing the exception message; whoever debugs needs to know what went wrong. Use 'raise'.
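A sketch of the suggested pattern (an illustrative plugin-enable loop; the helper and instance names are hypothetical):

```python
import logging

log = logging.getLogger(__name__)


def enable_plugins(instance, plugin_names):
    """Log the failure for context, then re-raise: a bare `raise`
    preserves the original exception and traceback for whoever has to
    debug, unlike `assert False`, which hides what actually failed."""
    for name in plugin_names:
        try:
            instance.plugins.enable(name=name)
        except Exception:
            log.error('Failed to enable plugin %s', name)
            raise  # keep the original exception, do not mask it
```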

Also, I ran it on a clean machine and got this error:

dirsrvtests/tests/perf/memberof_test.py::test_nestgrps_import[200000-1000-100-20-10] 
INFO:lib389.utils:Create nested ldif file with users-200000, groups-1000, nested-10
INFO:lib389.utils:Import LDIF file and measure the time taken
INFO:lib389.utils:Create LDIF files for given nof users, groups and nested group levels
INFO:lib389.utils:Create base entry for import
INFO:lib389.utils:Add base entry for online import
INFO:lib389.utils:Return LDIF file
INFO:lib389.utils:Replace users, groups, nested users, groups and nested levels
INFO:lib389.utils:Creating LDIF file for importing bulk entries
INFO:lib389.utils:Run importLDIF task to add entries to Server
INFO:lib389:Import task import_06142017_080359 for file /perf_test.ldif completed successfully
INFO:lib389.utils:Check if number of entries created matches the expected entries
INFO:lib389.utils:Expected entries: 201000, Actual entries: 100500
FAILED
Comment from firstyear (@Firstyear) at 2017-06-19 02:45:15

Just want to add that these performance tests aren't actually asserting anything about the performance; they are just a load test. It would be good to have them configured in a way that allows skipping them during CI, because otherwise they waste time without a human observer.

Comment from tbordaz (@tbordaz) at 2017-06-19 09:55:09

@Firstyear if we have a test that can detect a performance regression, it is not a waste of time. Such long tests should not pollute our day-to-day CI tests, and I agree it would be great to be able to skip them. Do families of CI tests exist: normal tests, long-duration, performance?

Comment from mreynolds (@mreynolds389) at 2017-06-19 16:59:18

@tbordaz yes, we have a /stress directory under /dirsrvtests/tests/. We can add other directories as needed.

Comment from firstyear (@Firstyear) at 2017-06-20 03:04:53

@tbordaz The test literally does not check the timing of the feature at all or make any assertions. It just stresses the server a bunch and then says "done". There is no threshold to beat.

As a result, to run this test properly you need a human and two instances of DS, one before and one after.

That's why I think putting this in the stress directory like Mark said is a good idea, because I don't want to run this test every time I run the CI on my laptop.

Comment from vashirov (@vashirov) at 2017-06-21 12:51:12

@Firstyear Think of this test as a benchmark: it doesn't tell you whether a result is good or bad on its own. Moreover, the test is already in a separate directory:

 17  create mode 100644 dirsrvtests/tests/perf/memberof_test.py

which you can skip, as with the other stress tests.

Also, these kinds of tests are not intended to be executed as a single-shot test. They will run from CI, which keeps a history of test executions with their timings, so it will give you a nice retrospective view of the performance trend across builds/commits.

I hope this clears things a bit.

Comment from sramling at 2017-07-06 15:33:56

Incorporated all the review comments from Simon. 0001-Add-performance-tests-for-memberof-plugin.patch

Comment from sramling at 2017-07-06 15:35:37

@droideck , request you to review the patch, thanks!

Comment from sramling at 2017-07-06 15:36:20

@tbordaz , request you to review the patch, thanks!

Comment from sramling at 2017-07-06 19:34:34

0001-Add-performance-tests-for-memberof-plugin.patch

Comment from firstyear (@Firstyear) at 2017-07-07 02:29:35

Looks okay to me :)

Comment from tbordaz (@tbordaz) at 2017-07-07 14:28:49

The test cases look really great. A few comments:

Comment from sramling at 2017-07-07 15:27:19

@tbordaz create_data.py is very similar to the one on GitHub, I agree. But I customized the script so it can be used without needing to install/copy the IPA server packages/schema.

Fin calls plugin enable, not disable. Wow, wondering how you noticed that :-) Great! Will fix it now.

I am keeping the sync_memberof_attrs function for ldap_add as well, since I want to validate the total memberOf attributes to assert the test. As you pointed out, it doesn't make a lot of sense, but keeping it will avoid any further failures.

Comment from sramling at 2017-07-07 16:42:49

Adding a new patch with a fix for fin to disable the plugins, and to make sure the tests complete in 3 to 4 hrs.

0001-Add-performance-tests-for-memberof-plugin.patch

Comment from tbordaz (@tbordaz) at 2017-07-11 11:01:06

Thanks for your feedback. The patch looks good to me; indeed it is a nice job! You have my ack.

Comment from spichugi (@droideck) at 2017-07-11 15:39:42

Thanks for the patch. It seems to work. There are only a few points, but they are very important and we can make it better.

We need to optimise the imports. We are slowly changing them in other tests, and it's better not to make the same mistakes in new test cases.

It is better to specify the exact imports you need (mostly, it is just a couple of functions).

from create_data import *
from lib389.tasks import *
from lib389.utils import *
from lib389._constants import *  # we already fixed these imports in other tests
from lib389.dseldif import DSEldif # make it like this, without *

In memberof_setup you don't raise any exception if there is an error; you just log it. Is that enough in your case? If the plugins fail, does the rest of the test case make sense? Anyway, ldap.LDAPError is too general: if a specific exception is expected, you need to define it explicitly.

359 +    try:
360 +        topo.standalone.plugins.enable(name=PLUGIN_MANAGED_ENTRY)
361 +        topo.standalone.plugins.enable(name=PLUGIN_AUTOMEMBER)
362 +    except ldap.LDAPError as e:
363 +        log.error('Failed to enable {}, {} plugins'.format(PLUGIN_MANAGED_ENTRY, PLUGIN_AUTOMEMBER))

You don't need fd.close(), because 'with open()' closes it for you.

423 +    with open(ldif_file, "w") as fd:
424 +        fd.write(base_ldif)
425 +        fd.close()

Please use subprocess instead of 'os.popen'. If you have any questions about how to use it, ping me.

Comment from sramling at 2017-07-11 15:45:22

> Thanks for the patch. It seems working. Only a few points that very important and we can make it better. We need to optimise the imports. We are changing them in other tests slowly and it's better not make mistakes in new test cases.

I totally agree. I didn't have these import statements from the start, but they failed in the last patch with the 1minutetip tests, hence I added them. I am not sure what went wrong :-(

> It will be better to specify the exact imports you need (mostly, it is just couple functions).

Will fix these imports.

> In the memberof_setup you don't raise any exception if there is any error and you just log the error. Is it enough in your case? If plugins will fail, does the rest of test case make sense? Anyway, ldap.LDAPError is too general. If some exception is expected, you need define it explicitly.

Sure, I will change it.

> You don't need to fd.close(), because 'with open()' will close it for you.

Sure, will remove it.

> Please, use subprocess instead of 'os.popen'. If you have any questions about how to use it, ping me.

Comment from sramling at 2017-07-14 05:34:33

@droideck , adding a new patch with most of the comments resolved. Thanks for your valuable feedback.

0001-Add-performance-tests-for-memberof-plugin.patch

Comment from spichugi (@droideck) at 2017-07-14 08:21:04

Thank you for the fixes.

Though, you haven't incorporated the part about the constants, and now the test suite fails before it starts (on an updated lib389). Please import the constants explicitly (and it would be nice for utils/tasks too), like this:

from lib389._constants import SUFFIX, DN_SCHEMA, DN_DM, DEFAULT_SUFFIX, PASSWORD

Also, 'NameError: global name 'subprocess' is not defined'.

Why have you changed the line with the parametrization numbers in all tests (for instance test_nestgrps_import)?

@pytest.mark.parametrize("nof_users, nof_groups, grps_user, ngrps_user, nof_depth",
    [(20000, 100, 20, 10, 5), (50000, 100, 20, 10, 10), (100000, 200, 20, 10, 10)])

to

@pytest.mark.parametrize("nof_users, nof_groups, grps_user, ngrps_user, nof_depth",
    [(20000, 200, 20, 10, 5), (50000, 500, 50, 10, 10), (100000, 1000, 100, 20, 20)])

The numbers were already reviewed by developers, as I understand.

Comment from sramling at 2017-07-14 14:18:02

The numbers for the parametrization weren't fixed yet. The only requirement I heard from @tbordaz is that the execution should complete in 3 to 4 hrs, not beyond, so that we can compare performance when there is a new build of 389-ds-base. Uploading a new patch.

0001-Add-performance-tests-for-memberof-plugin.patch

Comment from spichugi (@droideck) at 2017-07-14 17:36:14

Metadata Update from @droideck:

Comment from spichugi (@droideck) at 2017-07-14 17:39:06

Thank you! Good patch!

680a1a5..b882685  master -> master

commit b8826854c583a091b372bf75f02efc01de411f1e
Author: Sankar Ramalingam sramling@redhat.com
Date: Fri Jul 14 17:43:38 2017 +0530

Comment from tbordaz (@tbordaz) at 2017-07-17 09:58:06

@sramling throughput of updates is difficult to tune in automated tests because it fluctuates with the VM that is granted and the host that runs it. That is why I proposed a hard limit on the response time. This is an ongoing discussion we have had with @elkris and @Firstyear: how to monitor performance increases/decreases between versions. We had no good idea how to do it, so any tips would be more than welcome.