geodesymiami / results

CDtools 1.2 still slow on stampede2 #15

Open falkamelung opened 1 year ago

falkamelung commented 1 year ago

Hi Chris, here is an example of distribute.bash still being slow on Stampede2. Copying a 3.4 GB directory (containing several subdirectories) to /tmp takes more than 1 minute the first time. For comparison, the same copy takes about 3 seconds on Frontera. When I run it a second time on Stampede2, it also takes only about 3 seconds, so the problem occurs only on the first run of distribute.bash.

I also tried the old version of CDTools and observed the same thing: the first copy takes very long, but when I remove the copied data from /tmp and copy again, it goes very fast. Maybe it writes data to some other /tmp-type directory?

Thank you,
Falk

Stampede2

du -sh /scratch/05861/tg851601/qMaunaLoaSenDT87/interferograms/20220315_20220327
3.4G    /scratch/05861/tg851601/qMaunaLoaSenDT87/interferograms/20220315_20220327

time distribute.bash /scratch/05861/tg851601/qMaunaLoaSenDT87/interferograms/20220315_20220327
Going to distribute directory /scratch/05861/tg851601/qMaunaLoaSenDT87/interferograms/20220315_20220327 to /tmp
Running distribution functions now...
srun: cluster configuration lacks support for cpu binding
/scratch/05861/tg851601/qMaunaLoaSenDT87/interferograms/20220315_20220327 has been copied to /tmp/20220315_20220327 on each compute node.
Please remember to copy the required files out of /tmp before the job finishes!
Done.

real    1m14.603s
user    0m0.140s
sys     0m0.157s

rm -r /tmp/20220315_20220327
time distribute.bash /scratch/05861/tg851601/qMaunaLoaSenDT87/interferograms/20220315_20220327
Going to distribute directory /scratch/05861/tg851601/qMaunaLoaSenDT87/interferograms/20220315_20220327 to /tmp
Running distribution functions now...
srun: cluster configuration lacks support for cpu binding
/scratch/05861/tg851601/qMaunaLoaSenDT87/interferograms/20220315_20220327 has been copied to /tmp/20220315_20220327 on each compute node.
Please remember to copy the required files out of /tmp before the job finishes!
Done.

real    0m2.999s
user    0m0.150s
sys     0m0.143s
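One way to narrow this down is to time a read-only pass over the source directory, separately from the copy to /tmp. The sketch below assumes, without the timings above proving it, that the first-run cost may come from reading /scratch rather than from writing to /tmp. The helper names `read_tree` and `elapsed_seconds` are hypothetical and not part of CDTools, and `date +%s.%N` assumes GNU date.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helpers; not part of CDTools.

# read_tree DIR: stream every regular file under DIR to /dev/null,
# so the source is read but nothing is written to /tmp.
read_tree() {
    find "$1" -type f -exec cat {} + > /dev/null
}

# elapsed_seconds CMD [ARGS...]: run CMD, discard its output, and print
# the wall-clock time it took in seconds (assumes GNU date for %N).
elapsed_seconds() (
    start=$(date +%s.%N)
    "$@" > /dev/null 2>&1
    end=$(date +%s.%N)
    LC_ALL=C awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
)

# Usage on a compute node, with SRC set to the directory being distributed:
#   echo "first read:  $(elapsed_seconds read_tree "$SRC") s"
#   echo "second read: $(elapsed_seconds read_tree "$SRC") s"
```

If the first read alone is already slow and the second is fast, that would point at the source file system rather than at /tmp. If both reads are fast, the delay is more likely on the write side.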

Frontera

du -sh /scratch2/05861/tg851601/MaunaLoaSenDT87/interferograms/20220315_20220327
3.5G    /scratch2/05861/tg851601/MaunaLoaSenDT87/interferograms/20220315_20220327

time distribute.bash /scratch2/05861/tg851601/MaunaLoaSenDT87/interferograms/20220315_20220327
Going to distribute directory /scratch2/05861/tg851601/MaunaLoaSenDT87/interferograms/20220315_20220327 to /tmp
Running distribution functions now...
srun: cluster configuration lacks support for cpu binding
/scratch2/05861/tg851601/MaunaLoaSenDT87/interferograms/20220315_20220327 has been copied to /tmp/20220315_20220327 on each compute node.
Please remember to copy the required files out of /tmp before the job finishes!
Done.

real    0m2.736s
user    0m0.103s
sys     0m0.113s

I am using this version of distribute.bash:

export CDTools=/home1/apps/CDTools/1.2
export PATH=${PATH}:${CDTools}/bin
export PATH=${PATH}:/usr/local/bin

Using mpiexec with the old version of CDTools, I observe the same thing. Copying the data for the first time takes very long, but when I remove it and copy again, it goes very fast.

time mpiexec -np 1 -hosts c506-062 -ppn 1 distribute_tmp_s2 /scratch/05861/tg851601/qMaunaLoaSenDT87/interferograms/20220315_20220327 /tmp/20220315_20220327

/scratch/05861/tg851601/qMaunaLoaSenDT87/interferograms/20220315_20220327 has been copied to /tmp/20220315_20220327 on each compute node.
Please remember to copy the requried files out of /tmp before the job finishes!

real    1m46.057s
user    0m0.052s
sys     0m6.590s
c506-062[skx](1009)$ rm -rf  /tmp/20220315_20220327
c506-062[skx](1010)$ time mpiexec -np 1 -hosts c506-062 -ppn 1 distribute_tmp_s2 /scratch/05861/tg851601/qMaunaLoaSenDT87/interferograms/20220315_20220327 /tmp/20220315_20220327
/scratch/05861/tg851601/qMaunaLoaSenDT87/interferograms/20220315_20220327 has been copied to /tmp/20220315_20220327 on each compute node.
Please remember to copy the requried files out of /tmp before the job finishes!

real    0m3.050s
user    0m0.037s
sys     0m2.812s
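To put the timings above on a common scale, the `real` values can be converted to an approximate copy rate. This is a rough sketch: the helper names `to_seconds` and `throughput_mib_s` are hypothetical, the size is the rounded figure reported by `du -sh`, and "G" is taken to mean GiB, so the results are only approximate.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing the timings; not part of CDTools.

# to_seconds "1m14.603s": convert the `real` field printed by `time`
# (minutes and seconds) into plain seconds.
to_seconds() {
    echo "$1" | LC_ALL=C awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# throughput_mib_s SIZE_GIB REAL_TIME: approximate copy rate in MiB/s,
# given a size in GiB (as rounded by `du -sh`) and a `real` time string.
throughput_mib_s() {
    LC_ALL=C awk -v gib="$1" -v t="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", gib * 1024 / t }'
}

throughput_mib_s 3.4 1m14.603s   # Stampede2, first run:  46.7
throughput_mib_s 3.4 0m2.999s    # Stampede2, second run: 1160.9
```

On these numbers, the first Stampede2 run moves data at roughly 47 MiB/s, while the second run is about 25 times faster.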