benedictpaten / jobTree

Python-based pipeline management software for clusters (but check out toil: https://github.com/BD2KGenomics/toil, its successor)
MIT License

periodically hangs entire parasol hub when listing jobs #28

Open joelarmstrong opened 9 years ago

joelarmstrong commented 9 years ago

This has been a problem for a while, but I'm just putting an issue up so I remember to fix this somehow.

When parasol has more than a million or so jobs queued, like now, the periodic "parasol -extended list jobs" command that jobTree runs hangs the entire parasol hub process for a couple of minutes while it gets a listing of every job. This sucks, because the cluster nodes start to go idle waiting for work: the hub can't issue new jobs while it's busy sending the list of queued jobs to jobTree. This gets even worse when there are a few jobTrees running; the cluster sometimes sits completely idle for several minutes.

We (read: I) should try to find some way around listing every job, maybe by looking to see if there's a way we can get the same information, but limited to just the jobTree batch rather than all batches. If there isn't a way currently, maybe modify parasol to include that functionality.
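
Until a per-batch listing exists, one possible stopgap is to make jobTree ask for the full listing less often. The sketch below is only an illustration of that idea, not jobTree's actual code: `ThrottledLister` and its parameters are hypothetical names, and the only thing taken from this issue is the "parasol -extended list jobs" command itself. It caches the last listing and reruns the command only after a minimum interval has passed.

```python
import subprocess
import time


class ThrottledLister:
    """Hypothetical helper: cache an expensive listing for min_interval seconds."""

    def __init__(self, command, min_interval, run=None, clock=time.monotonic):
        self.command = command            # argv list for the listing command
        self.min_interval = min_interval  # seconds between real invocations
        self._run = run or self._run_subprocess
        self._clock = clock
        self._last_time = None            # when the command last actually ran
        self._cached = None               # output of the last real invocation

    @staticmethod
    def _run_subprocess(command):
        # Run the command and return its stdout as text.
        return subprocess.check_output(command, text=True)

    def list_jobs(self):
        """Return the listing, rerunning the command only if the cache is stale."""
        now = self._clock()
        if self._last_time is None or now - self._last_time >= self.min_interval:
            self._cached = self._run(self.command)
            self._last_time = now
        return self._cached


if __name__ == "__main__":
    # Assumed usage: hit the hub at most once every 10 minutes.
    lister = ThrottledLister(["parasol", "-extended", "list", "jobs"], 600)
    print(lister.list_jobs())
```

This trades freshness of job status for less load on the hub. It does not make a single listing any cheaper, and each running jobTree would still issue its own calls, so it is a mitigation rather than a fix.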

diekhans commented 9 years ago

This is another reason to have each jobTree job do multiple cactus alignments. If parasol can't handle this, no other scheduler can.


benedictpaten commented 9 years ago

The problem is that parasol provides no way to list only the jobs of a given user. Adding Galt.
