Open akrherz opened 9 years ago
@akrherz I'm not sure I understand what you're asking, exactly. So I'll explain a few things and then ask a question.
...the `-close` option is specified; or...
What, exactly, is the problem you're encountering? What are the symptoms?
@semmerson Thanks for your response. I thought pqact had a 32 pipe/exec/file limit for number of 'child' (bad word, but I can't think of a more proper one) processes. So if pqact has 32 active things going and LDM receives an additional product that requires a 'new' process, it will look to recycle one of those 32 active things and close its PIPE/IO to that process. If that process is still doing computation, then pqact could effectively launch many such processes. This was sort of based on our discussion we had about the need to split NEXRAD2 into multiple pqacts to get <32 radars per pqact.
Let's try an example pqact entry:

```
IDS|DDPLUS	/p((MAV|MEX|MET)...)
	PIPE	-close -strip python pyWWA/parsers/split_mav.py
```
When a rapid succession of MOS products arrives on LDM, I have seen pqact have tens to perhaps hundreds of these split_mav.py processes going at once. I would like the option of having pqact `-wait` for this split_mav.py to complete before attempting to fire up another. Yes, I could modify this code to make it accept multiple products in one invocation, but this is a modest example; the satellite-data processor I have is a more compute-intensive case.
The ulimit option probably won't work, as I don't want to choke down ldmd as well, though perhaps I could make that option work. I also have other LDM processes running that need many file descriptors, which would be tricky to account for.
@akrherz 'Child' process is exactly the right word.
The maximum number of processes a user may run is revealed by the command `ulimit -u` (the open file-descriptor limit is `ulimit -n`). The minimum values required by the Unix standard are small.
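For reference, assuming a POSIX shell, the two relevant limits can be inspected like this:

```sh
ulimit -u   # maximum number of simultaneous processes for this user
ulimit -n   # maximum number of open file descriptors per process
```

Each prints either a number or `unlimited`.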
Python can be orders of magnitude slower than C. See this revealing graphic.
Why do you want to limit the number of active split_mav.py processes?
@semmerson Seriously? You deem it necessary to point out that Python is slower than C? Am I some sort of idiot here? Just close the ticket, enough.
@akrherz I'm sorry I upset you. I'm just trying to understand your problem and consider all options. I only pointed out that Python is relatively slow because the number of LDM decoders in your scenario will depend on their speed; consequently, faster decoders are an alternative solution to your problem. It might not be the best solution, however. I'm certainly open to a `-wait` option for the PIPE action; I just don't know if it's the best solution due to its side-effects, which is why I'm writing out loud (so to speak).
I assume the number of split_mav.py processes on your system can be a problem. Is this because it reduces interactivity to an unacceptable level?
@semmerson The temp file locking/sleep code that chiz wrote because of this issue was against GEMPAK processes, so this issue is not limited to Python or other 'slow' processes.
I did some more investigation of this and see there is no 32 FILE/PIPE/EXEC limit as I thought; okay, I am educated on that point!
I found our previous email discussion on this, back on Nov 23, 2011; the request was to add `-wait` to PIPE so that exit statuses would properly be reported via LDM. Your last email on it was:
> It wouldn't be different -- and that "-wait" option for EXEC is very
> dangerous. Its only redeeming quality is that EXEC actions tend to be
> locally contrived ones -- so the user has explicit knowledge of how long
> they'll take -- whereas the PIPE actions are often (if not usually) to
> third-party decoders that the user didn't write and doesn't know how
> long they'll take.
@semmerson I had a bad thought about this. Would adding such an option cause pqact to block as it waited for that `-wait` process to finish before it could PIPE a new product to it? Would it be unable to perform any other actions until this `-wait` process finished? If so, this would not be a good idea!
My comment on the difference between a `-wait` option for EXEC and PIPE actions is correct.
Yes, a `-wait` option for the PIPE action would cause pqact(1) to block and not process any more data-products in the interim. This is the unwanted side-effect I mentioned previously. It might be possible to mitigate this effect by restricting the pqact(1) process to only those data-products for which the `-wait` option would be appropriate. (Yet another thing to consider.)
What, however, is the problem caused by a relatively large number of split_mav.py processes on the system in question?
@semmerson Ah, yeah, I am not getting warm fuzzies about this request anymore.
Each split_mav.py process makes web-service API requests to a provider that only allows a certain number of simultaneous requests, so I need to be able to limit the number of split_mav.py processes that can be active at any time. In the case of GEMPAK, there are shared-memory issues/bugs with the number of simultaneous GEMPAK processes that can run at one time under one user.
I am fine with rejecting this RFE given the blocking issue you just commented on; that would be a problem.
If the problem is that a service external to the LDM can only handle a limited number of requests, then a possible solution would be to have the relevant decoder use a semaphore that was initialized to this number. Can this be done in Python?
@semmerson Sure.
That might be the best solution, then, because it wouldn't cause pqact(1) to block, yet access by the decoders to the limited resource would be controlled.
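One way to sketch the in-decoder limiting idea without the stale-lockfile race mentioned earlier is a pool of `flock()`-held slot files: the kernel releases an advisory lock automatically when the holding process exits, so a crashed decoder can never wedge a slot. This is an illustrative sketch, not LDM code; the function name, directory, and polling interval are all hypothetical.

```python
import fcntl
import os
import time

def acquire_slot(limit, lockdir="/tmp/decoder_slots"):
    """Grab one of `limit` advisory file locks; block until one is free.

    flock() locks are tied to the process, so they vanish automatically
    when the decoder exits or crashes -- no stale lock files to clean up.
    Returns the held file descriptor, which must stay open for the
    lifetime of the process.
    """
    os.makedirs(lockdir, exist_ok=True)
    while True:
        for i in range(limit):
            fd = os.open(os.path.join(lockdir, f"slot{i}"),
                         os.O_CREAT | os.O_RDWR, 0o600)
            try:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                return fd        # slot acquired
            except BlockingIOError:
                os.close(fd)     # slot busy; try the next one
        time.sleep(0.5)          # all slots busy; poll again
```

A decoder would call `acquire_slot(N)` before its expensive work (e.g. the web-service requests), where `N` is the provider's concurrency limit.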
Hi Unidata/Steve,
I would really like to see a "-wait" or equivalent option added to PIPE/EXEC actions to effectively limit the number of processes one pqact could have active at one time. This flag would cause pqact not to recycle a slot until that process has exited. I think I discussed this with you many years ago and you were not enthusiastic about it, since misbehaving/naughty processes could wedge up and effectively jam pqact while it waits for them to exit...
The issue is that any decoder following a one-product-per-execution model could effectively DOS a system, as pqact execs one process per product received. Starting up LDM after considerable downtime is one example; another is products that arrive in rapid succession...
Currently, users have two options:

1. Write the decoder so that a single long-running process accepts multiple products on its pipe.
2. Have each decoder instance write lock files and sleep until it may proceed.

I personally loathe option 2, as having potentially hundreds of scripts writing lock files and sleeping is a race condition waiting to happen. I have written lots of processes that do option 1, but not all are well suited for it; for example, satellite-data processors.
A nice aspect of this is that pqact could then log non-zero exit statuses from these '-wait' processes, which would help users debug. Perhaps some other logging would already kick in if pqact had no available slots over some given amount of time; I am unsure of that one.
I think a reasonable expectation is for '-wait' to imply '-close' as well. I'd be happy to provide feedback if there are other edge cases you anticipate. Thanks for your consideration :)
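To make the trade-off discussed in the thread concrete, here is a toy model (not pqact source code; class and method names are invented for illustration) of a slot pool where a slot is not recycled until its child has actually exited. It shows both the benefit raised above (non-zero exit statuses become loggable) and the cost (the whole process blocks while waiting for a slot):

```python
import subprocess

class SlotPool:
    """Toy model of PIPE slots under a hypothetical -wait semantic:
    a slot is only recycled once its child process has exited."""

    def __init__(self, nslots):
        self.nslots = nslots
        self.children = []          # active child processes, oldest first

    def launch(self, argv):
        # Reap any children that have already finished.
        self.children = [p for p in self.children if p.poll() is None]
        if len(self.children) >= self.nslots:
            # -wait behaviour: block the whole process until the oldest
            # child exits; no other products are handled in the interim.
            oldest = self.children.pop(0)
            status = oldest.wait()
            if status != 0:
                # Exit status is now available for logging, as requested.
                print(f"child exited with status {status}")
        proc = subprocess.Popen(argv)
        self.children.append(proc)
        return proc
```

With two slots and three long-running children, the third `launch()` call visibly stalls until a slot frees, which is exactly the blocking side-effect that makes the option risky for third-party decoders of unknown duration.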