Junction Management performances ideas

sygilber commented 3 years ago

Just sharing here some thoughts about junction code performances.

a) Testing if a junction exists

Using curl against a wrp with only a single junction defined, testing if the junction exists using this URI code takes systematically about 700 ms:

curl -k --user admin@local:**** -H "Content-Type:application/json" -H "Accept: application/json" -X GET https://someappliance/wga/reverseproxy/default/junctions

And using curl against the same wrp, testing if the junction exists using this URI code takes systematically about 500 ms (irrespective if it exists or not):

curl -k --user admin@local:**** -H "Content-Type:application/json" -H "Accept: application/json" -X GET https://someappliance/wga/reverseproxy/default/junctions?junctions_id=/isam

200 ms of difference can be quite a lot when trying to detect presence of 50 junctions defined on a particular WRP, and even more if it is clustered.

So I am planning some refactoring/performance testing of the method _check() (inside junctions.py) to use the second form (junctions?junctions_id) of the restapi query than the first. My only concern was if testing for a non-existing junction using the second restapi format when the junction does not exist would fill the wrp server logs with errors messages (such as DPWWM1346E) but it does not.

sygilber commented 3 years ago

Some results are in:

With 100 iteration with current unmodified code:

ibm.isam.set_junctions : Set an array of Junctions -------------------------------------------------------------------------- 222.55s

And with over 100 iteration with modified code (_check()):

ibm.isam.set_junctions : Set an array of Junctions -------------------------------------------------------------------------- 200.88s

So that is already a 10% improvment right there.

ps: I did forced the code to think the junction had to be created on each iteration.

But I will need to tweek the code to handle the case where the junction does not exist and a 500 rc is returned along with this {"message":"DPWWM1346E Cannot show junction\nDPWWA1201E Junction not found"} which raises an error. We would want that to be muted in this particular case.

Warkdev commented 3 years ago

Hey, nice you're attacking this.

One way to reduce the impact of deploying several junctions at once is to lower the amount of API requests. I've found that the biggest gain is doing the following :

retrieve list of configured data (with details if possible which is not the case for junctions)
force add new junctions (the ones not in the retrieved list)
force delete old junctions
update junctions (check idempotency first)

All in all, this is reducing the amount of API calls by, at worst, 49 calls for a batch of 50 existing junctions and much more for a batch of 50 new junctions.

sygilber commented 3 years ago

Your right, There is no way right now to obtain in a single API call the full and detailled list of all junctions; only one of the other. Currently, the full list is often re-obtained over and over.

But should I understand that you have a private python code supporting a the simulated batch update method your described ? Anyhow, that could be the next step, i.e. to provide new function in junction.py to do otherwise (batch mode), thus caching and computing at once all required changes.

But for now the approach I am seeking is to make little iterative changes.

sygilber commented 3 years ago

Just sharing my experimental code here for now from junctions.py's _check() method yielding 10% improvement:

def _check(isamAppliance, reverseproxy_id, junctionname):
    ret_obj = isamAppliance.invoke_get("Retrieving the parameters for a single standard or virtual junction",
                                       "{0}/{1}/junctions?junctions_id={2}".format(uri, reverseproxy_id,
                                                                                   junctionname),
                                       requires_modules=requires_modules,
                                       requires_version=requires_version,
                                       ignore_error=True)
    # Assume that if 'message' is present, an error occured thus the junction does not exist.
    if 'message' in ret_obj['data']:
        return False

    return True

Away from the office for a while and this needs more testing before I submit it via a PR.

sygilber commented 3 years ago

Already spoted another possible optimization refered to b):

        if _check(isamAppliance, reverseproxy_id, junction_point) is True:
            logger.debug("Junction exists. Compare junction details.")
            ret_obj = get(isamAppliance, reverseproxy_id=reverseproxy_id, junctionname=junction_point)

In the set() method, if the _check() method returned the junction data (and not just the boolean, the call to get() above would thus not be necessary, This is going in the direction you mentioned which is to reduce API call.

Warkdev commented 3 years ago

Your right, There is no way right now to obtain in a single API call the full and detailled list of all junctions; only one of the other. Currently, the full list is often re-obtained over and over.

But should I understand that you have a private python code supporting a the simulated batch update method your described ? Anyhow, that could be the next step, i.e. to provide new function in junction.py to do otherwise (batch mode), thus caching and computing at once all required changes.

But for now the approach I am seeking is to make little iterative changes.

Yes, this is what we have (but tailored for our deployment approach which is slightly different than the one proposed here). We managed, for the same deployment of a wrp with 200 junctions (incl objectspace, policies,...) to reduce the time from 3h30 to 10 minutes using this technique.

Warkdev commented 3 years ago

Just sharing my experimental code here for now from junctions.py's _check() method yielding 10% improvement:

def _check(isamAppliance, reverseproxy_id, junctionname):
    ret_obj = isamAppliance.invoke_get("Retrieving the parameters for a single standard or virtual junction",
                                       "{0}/{1}/junctions?junctions_id={2}".format(uri, reverseproxy_id,
                                                                                   junctionname),
                                       requires_modules=requires_modules,
                                       requires_version=requires_version,
                                       ignore_error=True)
    # Assume that if 'message' is present, an error occured thus the junction does not exist.
    if 'message' in ret_obj['data']:
        return False

    return True

Away from the office for a while and this needs more testing before I submit it via a PR.

Yes, you'd need to try and catch the error though. I think in case of a http 500, the isam appliance class will throw an exception.

sygilber commented 3 years ago

Some more results:

In cases of 100% idempotency, with original _check() (100 junctions):

ibm.isam.set_junctions : Set an array of Junctions -------------------------------------------------------------------------- 160.26s

In cases of 100% idempotency, with revised _check() (100 junctions):

ibm.isam.set_junctions : Set an array of Junctions --------------------------------------------------------------------------- 80.08s

So indeed, a reduction of api call makes a huge difference.

Will continue testing new code for a while and will consider a PR once we are confident it is rock solid in a few weeks.

Warkdev commented 3 years ago

Each api calls is 500ms. Once you know this, you know how to save time.

Also, think about a way to provide the feedback of what changed or not to ansible. 😉

vcassidy commented 3 years ago

I tackled this problem a while ago for a gui I was writing rather than for Ansible or python api. It is a little ugly but the most effective way to get the details behind all junctions for a WebSEAL instance was to revert to pdadmin commands and parse the response.

Get the list of junctions https://{appliance_hostname}/wga/reverseproxy/{reverseproxy_id}/junctions
Parse the response and create a pdadmin s t webseal-name show /junctionname for each junction
Send all the junction show commands for a WebSEAL instance in a single POST to pdadmin interface https://{appliance_hostname}/isam/pdadmin/
parse the response

the parsing is ugly but the response format from commandline hasn't changed in many releases. Sending 1 requests to pdadmin for all the junctions will cut the total time taken to a fraction of sending a request per junction.

Hope this is useful

Warkdev commented 3 years ago

This is definitely the most effective way today.

Warkdev commented 3 years ago

I tackled this problem a while ago for a gui I was writing rather than for Ansible or python api. It is a little ugly but the most effective way to get the details behind all junctions for a WebSEAL instance was to revert to pdadmin commands and parse the response.

Get the list of junctions https://{appliance_hostname}/wga/reverseproxy/{reverseproxy_id}/junctions

Parse the response and create a pdadmin s t webseal-name show /junctionname for each junction

Send all the junction show commands for a WebSEAL instance in a single POST to pdadmin interface https://{appliance_hostname}/isam/pdadmin/

parse the response

the parsing is ugly but the response format from commandline hasn't changed in many releases. Sending 1 requests to pdadmin for all the junctions will cut the total time taken to a fraction of sending a request per junction.

Hope this is useful

Hello, would you mind sharing that part of your code? I did write something very similar. And that's actually way faster and consistant than the api. Which is a pity.

Ps: isva 10.0.2.0 added 2 news server settings.

tombosmansibm commented 2 years ago

Even better would be if there was a REST endpoint to get all junctions ... https://ibmsecurity.ideas.ibm.com/ideas/ISAM-I-1061

IBM-Security / ibmsecurity

Junction Management performances ideas #320