taobrienlbl closed this issue 5 years ago.
I added some diagnostic print statements, and the stall has to do with `teca_algorithm_executive::initialize()`. The code gets just to the point of calling that routine, but strangely stalls before executing anything within: even when I modify `teca_algorithm_executive::initialize()` to do a `std::cout` as the very first line. I added `std::flush` to be sure that the I/O isn't being buffered, and still nothing.

The code does reach these print statements when running with `mpirun -n 1`, so this still has something to do with MPI somehow. I'm a bit perplexed...
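For reference, a minimal sketch of rank-tagged diagnostics from the Python side (assuming mpi4py; the `trace` helper is illustrative, not part of the script):

```python
from mpi4py import MPI
import sys

rank = MPI.COMM_WORLD.Get_rank()

def trace(msg):
    # write to stderr and flush so the message survives even if the
    # process hangs immediately afterward
    sys.stderr.write('[rank %d] %s\n' % (rank, msg))
    sys.stderr.flush()

trace('before update()')
```

Flushing stderr on every call means the last message printed by each rank reliably marks where that rank hangs.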
OK, I can see that there are a couple of places where MPI may still come into the picture. While we work on this, can you try the following change to the table reduce setup in your class?
```python
self.mr.set_enable_mpi(0)
self.mr.set_thread_pool_size(1)
self.mr.set_verbose(0)
self.mr.set_bind_threads(0)
```
The verbose flag causes MPI to be used, so if the above doesn't deadlock, that flag is likely the culprit, and this change may be all that is needed.
Thanks for the suggestion - I just tried this, but it still stalls with these settings.
I was misinterpreting the print diagnostics yesterday: it was hanging before the `initialize()` function. Ultimately, I tracked down that it was hanging inside one of the algorithm's `get_output_metadata()` routines. When I commented out MPI usage (see my comment in #201 about this), the code proceeds without hanging. This will likely be fixed by #183.
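For illustration only (a hypothetical Python sketch of the failure mode, not the TECA source): a collective call inside a report/metadata routine deadlocks whenever some ranks never reach it, and it becomes harmless once the routine runs on a single-rank communicator:

```python
from mpi4py import MPI

def report_metadata(comm, local_md):
    # a collective such as bcast hangs if only a subset of the ranks on
    # comm ever calls it; guard it so a single-rank communicator (e.g.
    # MPI.COMM_SELF) skips MPI entirely
    if comm.Get_size() > 1:
        local_md = comm.bcast(local_md, root=0)
    return local_md

md = report_metadata(MPI.COMM_WORLD, {'note': 'placeholder metadata'})
```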
Ah, yes, you have a cf reader; I had overlooked that. This issue will be solved by #183.
By the way, there is a potentially substantial performance issue lurking in the code above. Generally we want to avoid creating a reader and opening a dataset over and over again. Caching machinery might hide some of this; even so, I/O doesn't scale up, and metadata operations (such as opening files) are often the most expensive on Lustre.
Ideally we could open the dataset once, upstream of the parts of the pipeline that run repeatedly, and make requests for the specific data we want to pull through the pipeline. There are a number of examples of this pattern; `teca_tc_wind_radii` and `teca_bayesian_ar_detect` are probably the best. I'm not sure about the details of what you're doing, but we should see if you could refactor it that way.
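A rough sketch of that refactor, using only calls that already appear in the script below (variable names and most settings are omitted for brevity): the reader is constructed, and the dataset opened, once in `__init__`, while the repeatedly called method only changes parameters and re-executes.

```python
import teca

class ComponentCounter(object):
    def __init__(self, regex):
        # construct the reader (and open the dataset) exactly once
        self.reader = teca.teca_cf_reader.New()
        self.reader.set_files_regex(regex)
        self.seg = teca.teca_binary_segmentation.New()
        self.seg.set_input_connection(self.reader.get_output_port())
        self.dsc = teca.teca_dataset_capture.New()
        self.dsc.set_input_connection(self.seg.get_output_port())

    def count(self, threshold):
        # the repeatedly run path only updates parameters and re-executes;
        # no reader is created and no dataset is re-opened per call
        self.seg.set_low_threshold_value(threshold)
        self.dsc.update()
        return self.dsc.get_dataset()
```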
Agreed on the potential performance issue. I had considered simply passing in the mesh directly, but it looks like performance here is acceptable enough to not warrant further tweaks. This is being done during the MCMC training stage, which isn't terribly computationally intensive. I could be wrong though--this might start to impact performance when I do the larger-scale training on Cori in May; I'll keep an eye out for that.
@taobrienlbl I tested #203 with your script with the following changes, and it did not deadlock:
```diff
smic:~/work/teca/teca$ diff -u test_orig.py test.py
--- test_orig.py 2019-05-03 09:47:03.167767712 -0700
+++ test.py 2019-04-30 16:46:14.751119441 -0700
@@ -53,6 +53,7 @@
# create the reader component
#*****************************
self.mesh_data_reader = teca.teca_cf_reader.New()
+ self.mesh_data_reader.set_communicator(MPI.COMM_SELF)
# set the filenames
self.mesh_data_reader.set_files_regex(mesh_data_regex)
# indicate that longitude is periodic
@@ -71,6 +72,7 @@
# create the latitude damper component
#**************************************
self.damp = teca.teca_latitude_damper.New()
+ self.damp.set_communicator(MPI.COMM_SELF)
# connect it to the reader
self.damp.set_input_connection(self.mesh_data_reader.get_output_port())
# set the variable for damping
@@ -80,6 +82,7 @@
# create the segmentation component
#**************************************
self.seg = teca.teca_binary_segmentation.New()
+ self.seg.set_communicator(MPI.COMM_SELF)
# connect it to the latitude damper
self.seg.set_input_connection(self.damp.get_output_port())
# set the variable to work on
@@ -93,6 +96,7 @@
# create the connected component algorithm
#******************************************
self.cc = teca.teca_connected_components.New()
+ self.cc.set_communicator(MPI.COMM_SELF)
# connect it to the segmentation algorithm
self.cc.set_input_connection(self.seg.get_output_port())
# set the variable name from the segmentation algorithm
@@ -104,6 +108,7 @@
# create the component area calculator
#**************************************
self.ca = teca.teca_2d_component_area.New()
+ self.ca.set_communicator(MPI.COMM_SELF)
# connect it to the connected-component algorithm
self.ca.set_input_connection(self.cc.get_output_port())
# set the variable name from the connected-component algorithm
@@ -116,6 +121,7 @@
# create the component area filter
#**********************************
self.caf = teca.teca_component_area_filter.New()
+ self.caf.set_communicator(MPI.COMM_SELF)
# connect it to the component area algorithm
self.caf.set_input_connection(self.ca.get_output_port())
# set the variable name from the connected component algorithm
@@ -127,6 +133,7 @@
# define an instance of a custom variable counter
self.ccnt = TECAComponentCount()
self.pa = teca.teca_programmable_algorithm.New()
+ self.pa.set_communicator(MPI.COMM_SELF)
# connect it to the component area filter
self.pa.set_number_of_input_connections(1)
self.pa.set_input_connection(self.caf.get_output_port())
@@ -137,17 +144,15 @@
# create the table reduction and sorting algorithm
#**************************************************
self.mr = teca.teca_table_reduce.New()
+ self.mr.set_communicator(MPI.COMM_SELF)
# connect the table reduction to the connected component counter
self.mr.set_input_connection(self.pa.get_output_port())
- # turn off MPI
- self.mr.set_enable_mpi(0)
- # indicate that the thread pool size should be automatically inferred
- #self.mr.set_thread_pool_size(-1)
- self.mr.set_thread_pool_size(0)
+ self.mr.set_thread_pool_size(-1)
self.mr.set_verbose(1)
# define a table sorter
self.ts = teca.teca_table_sort.New()
+ self.ts.set_communicator(MPI.COMM_SELF)
# connect it to the table reduction
self.ts.set_input_connection(self.mr.get_output_port())
# sort on the timestep, so that the areas are ordered the same as the
@@ -160,7 +165,6 @@
self.dsc = teca.teca_dataset_capture.New()
self.dsc.set_input_connection(self.ts.get_output_port())
-
def count(self,
relative_value_threshold = 0.85,
hwhm_latitude = 30.0,
@@ -168,7 +172,6 @@
filter_center = 0.0):
""" Sets the parameters for the TECA pipeline, executes the pipline, and counts the connected components. """
-
# set the pipeline parameters
self.seg.set_low_threshold_value(relative_value_threshold*100)
self.damp.set_half_width_at_half_max(hwhm_latitude)
@@ -191,6 +194,6 @@
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
- my_counter = TECAMERRAARCounter("MERRA_test/ARTMIP_MERRA_2D_2017021.*\.nc$")
+ my_counter = TECAMERRAARCounter("/work2/data/teca/for_teca_data_svn/MERRA2/ARTMIP_MERRA_2D_20170218_.*\.nc$")
if rank == 0:
print(my_counter.count())
```
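The idea behind those one-line `set_communicator(MPI.COMM_SELF)` additions, in miniature (a sketch of the pattern, not the complete change):

```python
from mpi4py import MPI
import teca

reader = teca.teca_cf_reader.New()
# a single-rank communicator: this stage runs independently on each rank
# and never blocks waiting on a collective with any other rank
reader.set_communicator(MPI.COMM_SELF)
```

With every stage on `MPI.COMM_SELF`, each rank executes its own copy of the pipeline, so no stage can stall waiting on another rank.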
In trying to run a TECA pipeline that has MPI disabled (i.e., `teca_table_reduce.set_enable_mpi(0)` has been used), the pipeline freezes when using `mpirun` for more than one task. An `mpirun -n 1` command runs without issue, but an `mpirun -n 2` command stalls when the algorithm's `.update()` function is called. The stall appears to happen early in TECA's pipeline; if I give an invalid regex, the pipeline immediately fails for the `mpirun -n 1` call, but it still stalls for the `-n 2` call. The script contents follow.

TECAMERRAARCounter.py: