jluethi closed this issue 2 years ago
Also, @tcompa, when are running workflows supposed to appear on the parsl monitor? Neither the failed one nor the currently running workflow shows up on the monitoring for me. I restarted the monitoring service and this stays the same. The monitoring service was already running when I submitted the jobs, though. Does that interfere with things?
Hmm, weird. With 8 pyramid levels, the experiment ran through. Don't really understand why it fails with 9, but works with 8...
I'll have a look now at the many-levels error.
Also, @tcompa when are running workflows supposed to appear on the parsl monitor?
As soon as workflow_apply submits jobs, i.e. as soon as you see them on squeue.
Neither the failed one nor the currently running workflow shows up on the monitoring for me. I restarted the monitoring service and this stays the same. The monitoring service was already running when I submitted the jobs, though. Does that interfere with things?
You are likely hitting a known error, which was fixed with a recent PR (https://github.com/Parsl/parsl/pull/2324) but is not yet available in Fractal's parsl version. FYI, the bug is that parsl-visualize creates a wrong db: https://github.com/Parsl/parsl/issues/2266.
Quick workaround:
- ps aux | grep fractal (well, only the ones for your user);
- runinfo folder and monitoring.db;
- parsl-visualize;
- squeue), you can safely check the monitoring;
- monitoring.db, you can keep the monitoring active 100% of the time, and submit new workflows as you like.

More robust solution: we should have our own parsl fork, with the patches we need. At the moment we are installing Jacopo's fork, which however branches off their dev branch, rather than from their stable 1.2 version.
Quick check: are you sure that you are using a 2x coarsening? Because if it were 3x, 8 or 9 levels would be close to the maximum possible value, see e.g.
0 2160*8=17280
1 5760
2 1920
3 640
4 213
5 71
6 23
7 7
8 2
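The numbers above can be reproduced with a few lines of Python. This is only a minimal sketch, assuming that each level is obtained by integer (floor) division of the previous level's size by the coarsening factor; the function name pyramid_sizes is hypothetical and is not part of Fractal.

```python
def pyramid_sizes(size, factor, num_levels):
    """Return the size along one axis for each pyramid level, level 0 first.

    Assumption: each level is the previous one floor-divided by `factor`.
    """
    sizes = [size]
    for _ in range(num_levels - 1):
        sizes.append(sizes[-1] // factor)
    return sizes


if __name__ == "__main__":
    # Same axis as in the listing above: 2160 * 8 pixels, 3x coarsening, 9 levels.
    for level, size in enumerate(pyramid_sizes(2160 * 8, 3, 9)):
        print(level, size)
```

Calling the same function with a different factor shows how quickly the last levels shrink, e.g. pyramid_sizes(2160 * 8, 4, 9) ends in a size of 0.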
Anyway, I'm testing this and I am adding an explicit check during pyramid creation.
My bad @tcompa! I mistakenly had the coarsening at 4 (which is actually quite a bad default, as I think it hurts visualization performance quite a bit, but that is to be tested => I'll report back when I have rerun it with an actual coarsening factor of 2).
Looking forward to having a unified pipeline file, because now I sometimes forget to change some parameters in one of the settings files after I pull in changes again from the repo.
So the error is correct then. We could think about whether there is a way to check for this early on, but I think the pipeline fails "fast enough", so it isn't a huge loss of time.
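One possible shape for such an early check is sketched below. It is not the check that was added to Fractal; the names max_pyramid_levels and check_num_levels are hypothetical, and the sketch assumes that the failure occurs as soon as floor division by the coarsening factor brings any axis down to size 0.

```python
def max_pyramid_levels(shape, factor):
    """Largest number of levels for which every axis keeps a size >= 1.

    Assumption: each level floor-divides the previous size by `factor`.
    """
    if factor < 2:
        raise ValueError("coarsening factor must be at least 2")
    smallest = min(shape)
    if smallest < 1:
        raise ValueError("all axes must have size >= 1")
    levels = 1  # level 0 (full resolution) always exists
    while smallest // factor >= 1:
        smallest //= factor
        levels += 1
    return levels


def check_num_levels(shape, factor, num_levels):
    """Raise before any processing if the pyramid would hit a zero-size level."""
    allowed = max_pyramid_levels(shape, factor)
    if num_levels > allowed:
        raise ValueError(
            f"num_levels={num_levels} is too large for shape={shape} "
            f"with coarsening factor {factor}; maximum is {allowed}"
        )
```

For an axis of 2160*8=17280 pixels, this sketch allows at most 8 levels with a coarsening of 4 and 9 levels with a coarsening of 3, which is consistent with 9 levels failing and 8 levels running through when the coarsening was set to 4.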
You are likely hitting a known error
Ok, it's not urgent to create workarounds for me at the moment, I'm looking forward to this fix then :)
I added more explicit checks in b009393bc8588ad3f195d4731a54ad1e24761321 and ed9c62ea4c408ef4990b654d08a54cbffb18e54d. Closing this issue.
I've been trying to rerun the 23 well dataset with 9 pyramid levels and it failed with the following error message.
According to my calculations, 9 pyramid levels shouldn't lead to a chunk size of 0 anywhere. The smallest dimension of a whole well would be: 80x76 (approx 76, could be some rounding because it's not evenly divisible by 2).
I'm trying to rerun with 8 levels. If that doesn't work, I'll try with 5 levels where the pyramid sizes remain integers without rounding to see if the issue is there.