LSSTDESC / rail_base

Base classes for RAIL
MIT License

rail.RailStage.input_iterator calls sys.exit #154

Open joezuntz opened 3 months ago

joezuntz commented 3 months ago

Bug report

The current rail stage.py code has this:

                    color = self.rank + 1 <= total_chunks_needed
                    newcomm = self.comm.Split(color=color, key=self.rank)
                else:
                    color = False
                    newcomm = None
                if color:
                    self.setup_mpi(newcomm)
                else:
                    sys.exit()

This code should not be calling sys.exit. I can't really see why you would want to; just return an empty iterator if there's nothing to iterate over. Calling sys.exit here breaks a whole bunch of stuff. There's no need to make the new communicator at all.
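
A minimal sketch of the suggested behaviour, using a hypothetical helper `iter_chunks` rather than the actual `RailStage.input_iterator`. The round-robin assignment of chunks to ranks is an assumption made for illustration, not necessarily what RAIL does. The point is only that a rank with no chunks assigned gets an empty iterator, so the caller's loop body never runs and nothing needs to exit:

```python
def iter_chunks(rank, size, n_rows, chunk_size):
    """Yield (start, end) row ranges assigned to this rank.

    Hypothetical helper, not part of rail_base. Chunks are dealt out
    round-robin (an assumption); a rank with no chunks yields nothing.
    """
    # Ceiling division: number of chunks needed to cover all rows.
    total_chunks_needed = -(-n_rows // chunk_size)
    for chunk in range(rank, total_chunks_needed, size):
        start = chunk * chunk_size
        end = min(start + chunk_size, n_rows)
        yield start, end


# 4 ranks but only 2 chunks of data: ranks 2 and 3 get empty iterators.
for rank in range(4):
    print(rank, list(iter_chunks(rank, size=4, n_rows=150, chunk_size=100)))
```

With this shape every rank stays alive on the original communicator, so no `Split` call is needed just to exclude the idle ranks.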

eacharles commented 3 months ago

#159

joselotl commented 3 months ago

If we return an empty iterator, then the finishing part of the stage will raise an error. This can be fixed, but the fix will need to be made in each algorithm individually. Before doing that, I'm wondering whether you are seeing an error in NZDir; I just realized there is a bug in that code. Can you provide the yml of the pipeline that is producing the error so that I can try to reproduce it? @joezuntz
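
As a rough illustration of the kind of per-algorithm fix meant here, the sketch below uses a hypothetical `MeanStage` class with made-up method names (none of this is the RAIL API). The finishing step checks whether any chunk was actually processed before combining results, so a rank that received an empty iterator does not fail:

```python
class MeanStage:
    """Hypothetical stage that averages values over the chunks it sees."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def process_chunk(self, values):
        # Accumulate running sums so chunks can arrive one at a time.
        self.total += sum(values)
        self.count += len(values)

    def finalize(self):
        # A rank that received an empty iterator has count == 0;
        # return None instead of dividing by zero.
        if self.count == 0:
            return None
        return self.total / self.count

    def run(self, chunks):
        for values in chunks:
            self.process_chunk(values)
        return self.finalize()
```

What an idle rank should return (here `None`) is a design choice that depends on how each algorithm combines results across ranks.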

joezuntz commented 3 months ago

I found the error in my code, which was not a pure pipeline but a loop within a pipeline stage over the tomographic bins. It was a minor error on my end, but this issue, in combination with the other one that was recently fixed, was obscuring it, because the stage was just silently exiting.

joezuntz commented 3 months ago

So there isn't really anything that I can usefully show you to replicate. But if there's an error in NZDir that would be great to fix. So far it seems to be running okay in the TX pipeline.