Open sygilber opened 3 years ago
Some results are in:
With 100 iteration with current unmodified code:
ibm.isam.set_junctions : Set an array of Junctions -------------------------------------------------------------------------- 222.55s
And with over 100 iteration with modified code (_check()):
ibm.isam.set_junctions : Set an array of Junctions -------------------------------------------------------------------------- 200.88s
So that is already a 10% improvment right there.
ps: I did forced the code to think the junction had to be created on each iteration.
But I will need to tweek the code to handle the case where the junction does not exist and a 500 rc is returned along with this {"message":"DPWWM1346E Cannot show junction\nDPWWA1201E Junction not found"} which raises an error. We would want that to be muted in this particular case.
Hey, nice you're attacking this.
One way to reduce the impact of deploying several junctions at once is to lower the amount of API requests. I've found that the biggest gain is doing the following :
All in all, this is reducing the amount of API calls by, at worst, 49 calls for a batch of 50 existing junctions and much more for a batch of 50 new junctions.
Your right, There is no way right now to obtain in a single API call the full and detailled list of all junctions; only one of the other. Currently, the full list is often re-obtained over and over.
But should I understand that you have a private python code supporting a the simulated batch update method your described ? Anyhow, that could be the next step, i.e. to provide new function in junction.py to do otherwise (batch mode), thus caching and computing at once all required changes.
But for now the approach I am seeking is to make little iterative changes.
Just sharing my experimental code here for now from junctions.py's _check() method yielding 10% improvement:
def _check(isamAppliance, reverseproxy_id, junctionname):
ret_obj = isamAppliance.invoke_get("Retrieving the parameters for a single standard or virtual junction",
"{0}/{1}/junctions?junctions_id={2}".format(uri, reverseproxy_id,
junctionname),
requires_modules=requires_modules,
requires_version=requires_version,
ignore_error=True)
# Assume that if 'message' is present, an error occured thus the junction does not exist.
if 'message' in ret_obj['data']:
return False
return True
Away from the office for a while and this needs more testing before I submit it via a PR.
Already spoted another possible optimization refered to b):
if _check(isamAppliance, reverseproxy_id, junction_point) is True:
logger.debug("Junction exists. Compare junction details.")
ret_obj = get(isamAppliance, reverseproxy_id=reverseproxy_id, junctionname=junction_point)
In the set() method, if the _check() method returned the junction data (and not just the boolean, the call to get() above would thus not be necessary, This is going in the direction you mentioned which is to reduce API call.
Your right, There is no way right now to obtain in a single API call the full and detailled list of all junctions; only one of the other. Currently, the full list is often re-obtained over and over.
But should I understand that you have a private python code supporting a the simulated batch update method your described ? Anyhow, that could be the next step, i.e. to provide new function in junction.py to do otherwise (batch mode), thus caching and computing at once all required changes.
But for now the approach I am seeking is to make little iterative changes.
Yes, this is what we have (but tailored for our deployment approach which is slightly different than the one proposed here). We managed, for the same deployment of a wrp with 200 junctions (incl objectspace, policies,...) to reduce the time from 3h30 to 10 minutes using this technique.
Just sharing my experimental code here for now from junctions.py's _check() method yielding 10% improvement:
def _check(isamAppliance, reverseproxy_id, junctionname): ret_obj = isamAppliance.invoke_get("Retrieving the parameters for a single standard or virtual junction", "{0}/{1}/junctions?junctions_id={2}".format(uri, reverseproxy_id, junctionname), requires_modules=requires_modules, requires_version=requires_version, ignore_error=True) # Assume that if 'message' is present, an error occured thus the junction does not exist. if 'message' in ret_obj['data']: return False return True
Away from the office for a while and this needs more testing before I submit it via a PR.
Yes, you'd need to try and catch the error though. I think in case of a http 500, the isam appliance class will throw an exception.
Some more results:
In cases of 100% idempotency, with original _check() (100 junctions):
ibm.isam.set_junctions : Set an array of Junctions -------------------------------------------------------------------------- 160.26s
In cases of 100% idempotency, with revised _check() (100 junctions):
ibm.isam.set_junctions : Set an array of Junctions --------------------------------------------------------------------------- 80.08s
So indeed, a reduction of api call makes a huge difference.
Will continue testing new code for a while and will consider a PR once we are confident it is rock solid in a few weeks.
Each api calls is 500ms. Once you know this, you know how to save time.
Also, think about a way to provide the feedback of what changed or not to ansible. 😉
I tackled this problem a while ago for a gui I was writing rather than for Ansible or python api. It is a little ugly but the most effective way to get the details behind all junctions for a WebSEAL instance was to revert to pdadmin commands and parse the response.
the parsing is ugly but the response format from commandline hasn't changed in many releases. Sending 1 requests to pdadmin for all the junctions will cut the total time taken to a fraction of sending a request per junction.
Hope this is useful
This is definitely the most effective way today.
I tackled this problem a while ago for a gui I was writing rather than for Ansible or python api. It is a little ugly but the most effective way to get the details behind all junctions for a WebSEAL instance was to revert to pdadmin commands and parse the response.
- Get the list of junctions https://{appliance_hostname}/wga/reverseproxy/{reverseproxy_id}/junctions
- Parse the response and create a pdadmin s t webseal-name show /junctionname for each junction
- Send all the junction show commands for a WebSEAL instance in a single POST to pdadmin interface https://{appliance_hostname}/isam/pdadmin/
- parse the response
the parsing is ugly but the response format from commandline hasn't changed in many releases. Sending 1 requests to pdadmin for all the junctions will cut the total time taken to a fraction of sending a request per junction.
Hope this is useful
Hello, would you mind sharing that part of your code? I did write something very similar. And that's actually way faster and consistant than the api. Which is a pity.
Ps: isva 10.0.2.0 added 2 news server settings.
Even better would be if there was a REST endpoint to get all junctions ... https://ibmsecurity.ideas.ibm.com/ideas/ISAM-I-1061
Just sharing here some thoughts about junction code performances.
a) Testing if a junction exists
Using curl against a wrp with only a single junction defined, testing if the junction exists using this URI code takes systematically about 700 ms:
curl -k --user admin@local:**** -H "Content-Type:application/json" -H "Accept: application/json" -X GET https://someappliance/wga/reverseproxy/default/junctions
And using curl against the same wrp, testing if the junction exists using this URI code takes systematically about 500 ms (irrespective if it exists or not):
curl -k --user admin@local:**** -H "Content-Type:application/json" -H "Accept: application/json" -X GET https://someappliance/wga/reverseproxy/default/junctions?junctions_id=/isam
200 ms of difference can be quite a lot when trying to detect presence of 50 junctions defined on a particular WRP, and even more if it is clustered.
So I am planning some refactoring/performance testing of the method _check() (inside junctions.py) to use the second form (junctions?junctions_id) of the restapi query than the first. My only concern was if testing for a non-existing junction using the second restapi format when the junction does not exist would fill the wrp server logs with errors messages (such as DPWWM1346E) but it does not.