Closed nbest937 closed 12 years ago
The wth_gen script is already set up to run 1-8 processes simultaneously as is, right? Does it make sense to expand it further to more processes with make? How are you dividing up these processes?
On Tue, Jun 12, 2012 at 4:58 PM, Neil Best <reply@reply.github.com> wrote:
I have make set up to run multiple nc_wth_gen processes simultaneously. It seems like these processes are trying to write diagnostic output to the same location, and the program throws an error when a file (or directory?) that it wants to create is already there. We should consider how to avoid this failure mode.
make -k -j5 -l6 wth_gen
make --directory=wth_gen all
make[1]: Entering directory `/scratch/local/isi-mip-input/wth_gen'
./nc_wth_gen 1950 1980 /scratch/local/isi-mip-input/wth_gen_input/HadGEM2-ES /scratch/local/isi-mip-input/grid/HadGEM2-ES GENERIC1.WTH 2 1 > log/GENERIC1.LOG
./nc_wth_gen 1980 2010 /scratch/local/isi-mip-input/wth_gen_input/HadGEM2-ES /scratch/local/isi-mip-input/grid/HadGEM2-ES GENERIC2.WTH 2 2 > log/GENERIC2.LOG
./nc_wth_gen 2010 2040 /scratch/local/isi-mip-input/wth_gen_input/HadGEM2-ES /scratch/local/isi-mip-input/grid/HadGEM2-ES GENERIC3.WTH 2 3 > log/GENERIC3.LOG
./nc_wth_gen 2040 2070 /scratch/local/isi-mip-input/wth_gen_input/HadGEM2-ES /scratch/local/isi-mip-input/grid/HadGEM2-ES GENERIC4.WTH 2 4 > log/GENERIC4.LOG
./nc_wth_gen 2070 2100 /scratch/local/isi-mip-input/wth_gen_input/HadGEM2-ES /scratch/local/isi-mip-input/grid/HadGEM2-ES GENERIC5.WTH 2 5 > log/GENERIC5.LOG
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/precip_diag.1950_1980.p1.txt': File exists
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/precip_diag.1980_2010.p2.txt': File exists
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/precip_diag.2010_2040.p3.txt': File exists
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/precip_diag.2040_2070.p4.txt': File exists
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/precip_diag.2070_2100.p5.txt': File exists
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/temp_diag.1950_1980.p1.txt': File exists
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/temp_diag.1980_2010.p2.txt': File exists
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/temp_diag.2010_2040.p3.txt': File exists
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/temp_diag.2040_2070.p4.txt': File exists
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/temp_diag.2070_2100.p5.txt': File exists
At line 274 of file nc_wth_gen.f90
Fortran runtime error: No such file or directory
make[1]: *** [GENERIC4.WTH] Error 2
STOP 1
make[1]: *** [GENERIC5.WTH] Error 1
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/precip_diag.1950_1980.p1.txt': File exists
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/precip_diag.1980_2010.p2.txt': File exists
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/precip_diag.2010_2040.p3.txt': File exists
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/precip_diag.2040_2070.p4.txt': File exists
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/precip_diag.2070_2100.p5.txt': File exists
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/temp_diag.1950_1980.p1.txt': File exists
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/temp_diag.1980_2010.p2.txt': File exists
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/temp_diag.2010_2040.p3.txt': File exists
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/temp_diag.2040_2070.p4.txt': File exists
mkdir: cannot create directory `/scratch/local/isi-mip-input/grid/HadGEM2-ES/temp_diag.2070_2100.p5.txt': File exists
At line 274 of file nc_wth_gen.f90
Fortran runtime error: No such file or directory
make[1]: *** [GENERIC3.WTH] Error 2
Somehow two nc_wth_gen processes are still running. I would have thought that only the first one to create these diag. files would survive. Maybe the attempt to create is only triggered by some specific condition that did not occur in the second process. I will update this when more output or insight is available.
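For reference, a minimal sketch of how concurrent processes can create the same directory without the "File exists" complaint. It assumes the diagnostics directories are created with a plain `mkdir` call, which this thread does not confirm; `ensure_dir` and the temporary path are hypothetical names used only for illustration. Note that the mkdir messages may well be harmless noise rather than the cause of the Fortran runtime error.

```shell
#!/bin/sh
# Hypothetical helper: create a directory, succeeding if it already exists.
# `mkdir -p` returns 0 when the directory is already there, so several
# processes can call this on the same path without printing an error.
ensure_dir() {
    mkdir -p -- "$1"
}

# Demonstration in a throwaway location (not the real grid directory).
demo_root=$(mktemp -d)
ensure_dir "$demo_root/precip_diag"   # first caller creates it
ensure_dir "$demo_root/precip_diag"   # second caller is a silent no-op
ls "$demo_root"
rm -rf "$demo_root"
```

If the directory names must stay per-process (as the `.p1` ... `.p5` suffixes suggest), the same idea applies: each process would only need to ensure its own directory exists.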
Reply to this email directly or view it on GitHub: https://github.com/RDCEP/wth_gen/issues/1
Joshua W. Elliott Research Scientist and Fellow Computation Institute 5735 S. Ellis Ave. Chicago, IL 60637 Tel: 773-834-6812; Fax: 773-834-6818 E-mail: jelliott@ci.uchicago.edu Links: Personal website https://sites.google.com/site/joshuawrightelliott/; SSRN papers http://ssrn.com/author=1655092
Thanks, Joshua. When I read your comment something clicked in my head. The wth_gen/run script uses qsub to parallelize the job on the cluster, but I am using make to parallelize it on a single node. I had interpreted the $n_procs parameter as the number of cores that each invocation of nc_wth_gen would use, but it is actually the total number of jobs that will be running. Because we have 5 time periods, that number should be 5, it seems. This explains why 2 calls were successful (well, more successful -- see issue 2). I'm trying it now. If GENERIC[345].WTH files start appearing then I can close this issue.
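To make the corrected setting concrete, here is a rough sketch that prints the five invocations with the job count set to 5. It assumes, based only on the log in this thread, that the last two arguments of nc_wth_gen are the total number of jobs followed by the index of this job; `gen_commands` and the relative paths are hypothetical placeholders, not part of wth_gen.

```shell
#!/bin/sh
# Hypothetical sketch: print one nc_wth_gen command per 30-year period.
# Assumption: the trailing arguments are <n_procs> <job index>.
# The directories below are placeholders for the real input and grid paths.
INPUT_DIR=${INPUT_DIR:-wth_gen_input/HadGEM2-ES}
GRID_DIR=${GRID_DIR:-grid/HadGEM2-ES}

gen_commands() {
    n_procs=$1      # total number of jobs, one per time period
    start=1950      # first year of the first period
    i=1
    while [ "$i" -le "$n_procs" ]; do
        end=$((start + 30))
        echo "./nc_wth_gen $start $end $INPUT_DIR $GRID_DIR GENERIC$i.WTH $n_procs $i > log/GENERIC$i.LOG"
        start=$end
        i=$((i + 1))
    done
}

# With 5 periods the job count is 5, so indices 1..5 are all valid.
gen_commands 5
```

In the failing run the job count was 2 while the indices ran from 1 to 5, which would be consistent with only the first two calls getting anywhere.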