TACC / launcher

A simple utility for executing multiple sequential or multi-threaded applications in a single multi-node batch job
MIT License

launcher for stampede2 #47

Open schristley opened 6 years ago

schristley commented 6 years ago

I'm porting my applications from TACC's Stampede to the Stampede2 system. I'm using launcher 3.0.1 and getting these errors on stderr:

Ncat: Invalid -d delay "c405-132" (must be greater than 0). QUITTING.
Ncat: Invalid -d delay "c405-132" (must be greater than 0). QUITTING.
Ncat: Invalid -d delay "c405-132" (must be greater than 0). QUITTING.
Ncat: Invalid -d delay "c405-132" (must be greater than 0). QUITTING.
Ncat: Invalid -d delay "c405-132" (must be greater than 0). QUITTING.

and stdout seems to indicate a problem talking to the task server:

------------- SUMMARY ---------------
   Number of hosts:    1
   Working directory:  /scratch/01114/vdj/vdj/job-59884011666018791-242ac11c-0001-007-igblast_test
   Processes per host: 3
   Total processes:    3
   Total jobs:         3
   Scheduling method:  dynamic

-------------------------------------
Launcher: Starting parallel tasks...
WARNING: No response from dynamic task server. Retrying...
WARNING: No response from dynamic task server. Retrying...
WARNING: No response from dynamic task server. Retrying...
WARNING: No response from dynamic task server. Retrying...
WARNING: No response from dynamic task server. Retrying...

schristley commented 6 years ago

I tried using the system module instead, which seems to be a more recent version (3.1), and that works better: the jobs are running now. I'm still getting a couple of errors, but I'm not sure whether they affect anything:

/opt/apps/launcher/launcher-3.1/paramrun: line 171: [: -eq: unary operator expected
/opt/apps/launcher/launcher-3.1/paramrun: line 211: [: -eq: unary operator expected

lwilson commented 6 years ago

The first issue is related to a change in netcat that was noticed on LS5 and now also shows up on S2. I believe the current master branch has this resolved.

For the second error, I'd suggest submitting a TACC ticket. I'm not at TACC anymore and don't currently have access to the systems to diagnose.

johnfonner commented 6 years ago

Those last two errors come from if statements that expect a variable called LAUNCHER_BIND to be non-null. They look harmless, but the tests would also not be hard to rewrite more defensively.
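The defensive rewrite suggested above can be illustrated with a minimal bash sketch. The functions below are hypothetical stand-ins, not paramrun's actual code at lines 171 and 211 (the thread doesn't show it); they only reproduce the same error message and show the usual fix of quoting the variable and giving it a default.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the LAUNCHER_BIND checks; NOT paramrun's code.

# Fragile: if LAUNCHER_BIND is unset or empty, the unquoted expansion
# disappears and the test becomes `[ -eq 1 ]`, so bash prints
# "[: -eq: unary operator expected" and the test returns status 2.
bind_enabled_fragile() {
  [ $LAUNCHER_BIND -eq 1 ]
}

# Defensive: quote the expansion and treat unset/empty as 0 (binding off).
bind_enabled() {
  [ "${LAUNCHER_BIND:-0}" -eq 1 ]
}
```

With LAUNCHER_BIND unset, `bind_enabled` simply returns false without printing anything. A non-numeric value such as LAUNCHER_BIND=yes would still fail the -eq comparison, with an "integer expression expected" error instead.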

schristley commented 6 years ago

Should the environment variables be set up differently for Stampede2? Supposedly each node has 63 cores.

Normally I set LAUNCHER_PPN to the number of processes to run simultaneously on a node, but I'm seeing weird behavior. When I run with LAUNCHER_PPN=8, connect to the node, and run top, each igblastn process is using only about 50% CPU. Here is a snapshot:

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
237141 vdj       20   0  415464  68828  15604 S  57.8  0.1   1:40.66 igblastn
237079 vdj       20   0  415616  71296  16008 S  56.9  0.1   1:53.81 igblastn
237156 vdj       20   0  415584  68968  15928 S  56.9  0.1   1:23.45 igblastn
237125 vdj       20   0  415452  68636  15644 S  56.6  0.1   1:47.74 igblastn
237109 vdj       20   0  415516  74856  15808 S  55.9  0.1   1:50.45 igblastn
237033 vdj       20   0  415584  71752  15876 S  51.6  0.1   2:27.99 igblastn
237298 vdj       20   0  415572  64628  15556 S  51.6  0.1   0:14.28 igblastn

Now if I set LAUNCHER_PPN=40, I get 40 igblastn processes, but each one uses only about 10% CPU?! It's as if they are being throttled: the CPU% is about 5x lower, the same factor by which I increased LAUNCHER_PPN.

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
180938 vdj       20   0  415532  73020  15668 S  10.9  0.1   1:16.09 igblastn
180847 vdj       20   0  415488  59732  15808 S  10.6  0.1   1:16.03 igblastn
180861 vdj       20   0  415584  72248  15812 S  10.6  0.1   1:15.88 igblastn
180866 vdj       20   0  415540  67872  15796 S  10.6  0.1   1:15.58 igblastn
180899 vdj       20   0  415520  59044  15808 S  10.6  0.1   1:15.82 igblastn
180903 vdj       20   0  415692  67168  15884 S  10.6  0.1   1:15.58 igblastn
180912 vdj       20   0  415648  67900  15808 S  10.6  0.1   1:16.68 igblastn

It shouldn't be an I/O issue, because the files that igblastn processes are small: ~3 MB of input and ~40 MB of output.

If I run a single igblastn, it uses 400% CPU, i.e. about 8x the CPU that each process gets with LAUNCHER_PPN=8.
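Per-process %CPU falling by about the same factor that LAUNCHER_PPN rose means the node's total igblastn CPU use stayed roughly constant. A small bash/awk sketch for checking this from a `top -b -n 1` snapshot is below; the function name cpu_summary is hypothetical, and it assumes top's default column layout as in the snapshots above. It is only arithmetic on the snapshot, not a diagnosis.

```shell
#!/usr/bin/env bash
# Summarize top's %CPU column for one command name.
# Assumes top's default column layout:
#   PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
# so %CPU is field 9 and COMMAND is field 12.
# Usage: top -b -n 1 | cpu_summary igblastn [ppn]
cpu_summary() {
  local cmd=$1 ppn=${2:-0}
  awk -v cmd="$cmd" -v ppn="$ppn" '
    $12 == cmd { total += $9; n++ }
    END {
      if (n == 0) { print "no matching processes"; exit 1 }
      avg = total / n
      printf "%d shown, avg %.1f%% CPU", n, avg
      # Extrapolate to all LAUNCHER_PPN tasks, assuming the processes
      # not shown behave like the ones that are.
      if (ppn > 0) printf ", est. %.1f%% for %d tasks", avg * ppn, ppn
      printf "\n"
    }'
}
```

On the seven rows shown in each snapshot, this gives an average of about 55.3% per process at LAUNCHER_PPN=8 (roughly 443% across 8 tasks) and about 10.6% at LAUNCHER_PPN=40 (roughly 426% across 40 tasks). Both totals are close to the 400% that a single igblastn uses.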

johnfonner commented 6 years ago

That looks suspiciously like an igblastn-specific thing. Are you manually setting -num_threads? It looks like igblast uses 4 threads by default, which would explain why a single igblastn uses 400% CPU.

On Stampede2, the normal queue has Intel Xeon Phi (KNL) processors with 68 cores, and the skx-normal queue has Skylake nodes with 48 cores. Maybe setting LAUNCHER_BIND=1 on the Xeon Phi nodes will help. Launcher isn't throttling the CPU, but depending on how the tasks are distributed across the processor, it could be exposing memory bottlenecks or something similar. Do you see the same thing on the Skylake nodes?
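When choosing LAUNCHER_PPN it can help to account for igblastn's own threads: if the default really is 4 threads, as suggested above, LAUNCHER_PPN=40 means 160 igblastn threads on a 68-core KNL node. The bash sketch below picks an explicit -num_threads (the standard BLAST+ option) so that LAUNCHER_PPN x threads stays within the core count. The helper names and the IGBLAST_OPTS variable (holding the remaining igblastn arguments, e.g. germline databases) are hypothetical, and this is a sizing aid, not a fix for the KNL behavior reported in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sizing igblastn threads under launcher.
# Assumption: cores is the node's physical core count
# (68 for the KNL "normal" queue, 48 for Skylake "skx-normal").

# Threads per task so that ppn * threads <= cores (at least 1).
threads_per_task() {
  local cores=$1 ppn=$2
  if (( ppn < 1 )); then
    echo "LAUNCHER_PPN must be at least 1" >&2
    return 1
  fi
  local t=$(( cores / ppn ))
  if (( t < 1 )); then t=1; fi
  echo "$t"
}

# Write one igblastn command per query file into a launcher job file.
# IGBLAST_OPTS (hypothetical) carries the other required igblastn options.
write_jobfile() {
  local cores=$1 ppn=$2 jobfile=$3
  shift 3
  local t q
  t=$(threads_per_task "$cores" "$ppn") || return 1
  : > "$jobfile"  # truncate/create the job file
  for q in "$@"; do
    echo "igblastn -num_threads $t ${IGBLAST_OPTS:-} -query $q -out $q.igblast.out" >> "$jobfile"
  done
}
```

For example, `threads_per_task 68 8` gives 8 and `threads_per_task 48 8` gives 6, while at LAUNCHER_PPN=40 on a KNL node it falls back to 1 thread per task. The generated file is meant to serve as launcher's job file (LAUNCHER_JOB_FILE).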

schristley commented 6 years ago

I tried the Skylake nodes and it works as expected: 8 parallel processes, each using 400% CPU. So the issue does seem to be specific to the KNL nodes.

schristley commented 6 years ago

I also tried LAUNCHER_BIND=1 on the KNL nodes, but it produces errors and igblastn doesn't even run.