ML4GW / aframev2

Detecting binary black hole mergers in LIGO with neural networks
MIT License

Condor tasks say they're finished, but aren't #263

Open wbenoit26 opened 1 month ago

wbenoit26 commented 1 month ago

This came up a couple of times in my most recent run. A condor-based task, such as DeployTestingWaveforms or DeployInferLocal, would report that it was complete (the polling showed that X out of X jobs had finished), but when the downstream task tried to start, it found missing dependencies and the pipeline failed. Restarting the pipeline caused the original task to notice that some of its jobs were in fact incomplete. It ran those jobs, and the rest of the pipeline then ran as normal.
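The report doesn't identify the root cause. One possible defensive check is for a condor-backed task to confirm that every expected output file exists before reporting success, instead of trusting only the "X of X jobs finished" poll. Here is a minimal sketch of that idea. It assumes each job writes a known output path. The helper names `missing_outputs` and `assert_task_complete` are hypothetical and are not part of aframev2.

```python
from pathlib import Path
from typing import Iterable, List


def missing_outputs(expected: Iterable[str]) -> List[str]:
    """Return the expected output paths that are not present on disk.

    Hypothetical helper: treats the filesystem, not the job-status poll,
    as the source of truth for whether each condor job really finished.
    """
    return [p for p in expected if not Path(p).exists()]


def assert_task_complete(expected: Iterable[str]) -> None:
    """Raise if the poll reported success but some job outputs are missing."""
    missing = missing_outputs(expected)
    if missing:
        raise RuntimeError(
            f"{len(missing)} job output(s) missing despite polling "
            "reporting completion: " + ", ".join(missing)
        )
```

Failing loudly at this point would surface the problem in the task that produced the incomplete outputs, rather than as a missing-dependency error in the next task.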