Closed ghost closed 3 years ago
You have a limit of two jobs per tmp dir. Since your tmp dir stagger limit of four is greater than that, it won't do anything.
Side note, when filing this issue were you not provided with a template to fill out as shown below?
tmpdir_stagger_phase_major: 4
tmpdir_stagger_phase_minor: 1
# Optional: default is 1
tmpdir_stagger_phase_limit: 2
# Don't run more than this many jobs at a time on a single temp dir.
# Increase for staggered plotting by chia, leave at 1 for madmax sequential plotting
tmpdir_max_jobs: 2
# Don't run more than this many jobs at a time in total.
# Increase for staggered plotting by chia, leave at 1 for madmax sequential plotting
global_max_jobs: 8
It still doesn't work with 4:0, 4:1, etc.
You still have the phase limit set the same as the overall limit. `tmpdir_max_jobs` means that any individual tmpdir can have a maximum of 2 jobs in any phase. `tmpdir_stagger_phase_limit` says that any individual tmpdir can have a maximum of 2 jobs in phases less than 4:1. Configuring the phase limit like that doesn't provide any further restriction.
Your bug report seems to claim that plotman is not enforcing the `tmpdir_stagger_phase_limit`. You have it set to a limit of 2 with phase 4:1. Do you have more than 2 jobs in a phase less than 4:1 on a single tmpdir? If so, share however it is that you see that.
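To illustrate why a phase limit equal to the overall per-tmpdir limit adds nothing, here is a minimal sketch of the two checks as described in this comment. It is a simplified model, not plotman's actual code: the function name `tmpdir_permits_new_job` and its parameters are hypothetical, and phases are assumed to be plain `(major, minor)` tuples.

```python
def tmpdir_permits_new_job(phases, milestone, phase_limit, max_jobs):
    """Simplified model: may one more job start on this tmpdir?

    phases      -- (major, minor) tuples of the jobs already on the tmpdir
    milestone   -- (tmpdir_stagger_phase_major, tmpdir_stagger_phase_minor)
    phase_limit -- stands in for tmpdir_stagger_phase_limit
    max_jobs    -- stands in for tmpdir_max_jobs
    """
    # Jobs that have not yet reached the milestone phase.
    early = [p for p in phases if p < milestone]
    if len(early) >= phase_limit:
        return False
    # The overall per-tmpdir cap applies regardless of phase.
    if len(phases) >= max_jobs:
        return False
    return True


milestone = (4, 1)
one_early_job = [(2, 4)]

# Phase limit equal to max_jobs: a second early job is still admitted.
print(tmpdir_permits_new_job(one_early_job, milestone, phase_limit=2, max_jobs=2))  # True
# Phase limit of 1: the tmpdir waits until the first job reaches 4:1.
print(tmpdir_permits_new_job(one_early_job, milestone, phase_limit=1, max_jobs=2))  # False
# Once that job is at 4:1 it no longer counts against the phase limit.
print(tmpdir_permits_new_job([(4, 1)], milestone, phase_limit=1, max_jobs=2))  # True
```

Under this model, any tmpdir blocked by a phase limit of 2 would already be blocked by `max_jobs=2`, so only a phase limit below the overall limit changes behavior.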
Okay, what do I need to DO to have only 4 jobs before they get into phase 4:1, and start 4 more after the old ones start transferring to dst? I have only 4 plotting NVMe drives.
Do you want one job in a phase less than 4:1 on each disk? Also, why do you want to align all of the plots rather than letting them be staggered?
- If you want one plot per disk in a phase less than 4:1 then set the phase limit to 1.
- "in parallel" and "aligned" are not the same thing. What is better about 4x plots started every 40 minutes than 1x started every 10 minutes? At this point with madMAx I have both my plotters set with a phase 1 stagger since even with `--rmulti2` I can't get CPU usage maxed out outside of phase 1. Though, I also have my 4x tmp drives raid0 on the system with multiple.
With 4 plots in parallel the max plot job length is 3200 s, so 3200/4 = 800 ± 100 s per plot. When I'm running 1 job it takes about 1100-1200 s. That is with 256 buckets and without the multiplier, so.....
Again, I am not suggesting you run only a single job. I'm only questioning why you want all four to start at the same time rather than staggered. Why start 4x every 40 minutes rather than 1x every 10 minutes? The stagger just avoids aligning resource usage peaks and valleys in an effort to make smoother continuous 100% usage.
Here's my monitoring dashboard in case a visualization helps. I always have about 3x running but I only start one at a time. The left side is a dual Xeon v0 and the right is an i5 NUC.
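The difference between aligned and staggered starts can be sketched with a small calculation. This is only an illustration under assumed numbers: a phase 1 of 10 minutes is hypothetical (real phase lengths depend on the machine), and the helper `peak_phase1_overlap` is not part of plotman.

```python
def peak_phase1_overlap(start_times_min, phase1_len_min):
    """Return the largest number of jobs that are in phase 1 at the same moment."""
    peak = 0
    # The count can only rise at a job start, so checking start times suffices.
    for t in start_times_min:
        active = sum(1 for s in start_times_min if s <= t < s + phase1_len_min)
        peak = max(peak, active)
    return peak


# Eight plots either way: 4x every 40 minutes versus 1x every 10 minutes.
aligned = [0, 0, 0, 0, 40, 40, 40, 40]
staggered = [0, 10, 20, 30, 40, 50, 60, 70]

PHASE1_LEN_MIN = 10  # hypothetical phase 1 duration

print(peak_phase1_overlap(aligned, PHASE1_LEN_MIN))    # 4
print(peak_phase1_overlap(staggered, PHASE1_LEN_MIN))  # 1
```

With aligned starts all four jobs compete for CPU in phase 1 at once. With staggered starts at most one job is in phase 1 at a time under these assumed durations, while the start rate over the long run is the same.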
This makes no sense. There will be a "natural" staggering anyway when some plots are in the 4th stage and some new ones are just starting, so in the long run there will be only 1-3 actively running while the last ones are in phase 4.
I'm not sure what is "natural" about plotman launching plots, but alrighty. Did the question you were asking get a workable answer here?
nope.
I need to have only 4 plots in stage <4 and up to 8 in stage >=4 spread between 4 nvmes
There is no globally applied phase limit. There is a per tmpdir phase limit which is what I addressed above.
- If you want one plot per disk in a phase less than 4:1 then set the phase limit to 1.
I think perhaps instead of "up to 8 in stage >= 4" you mean "up to 8 total"? It seems unlikely that you would need to allow twice as many in stage >=4 as in stages <4. Also, it seems such a limit would mostly not be needed anyway. But, if we go with "up to 8 total", you can achieve that either via `global_max_jobs: 8` or `tmpdir_max_jobs: 2` depending on your intent. You seem fairly focused on the tmp drives so I'm guessing the latter would be more representative of your intent. Though, all limits must be satisfied so `global_max_jobs` must still be at least the number of total jobs you want to limit to regardless of phase and tmpdir.
I mean no more than 4 TOTAL in stage <4, with a new job starting only while there are fewer than 4 jobs in stage <4. SO max total 8, but only 4 in stages 1-3.
You can limit to one process in a phase less than 4 on each of your individual tmp drives. That is four total processes in a phase less than 4. There is no feature to phase limit globally, independent of any tmp dir. But, I thought you wanted one process in phase < 4 on each tmp drive so it seems like that should be ok for you.
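As a rough check of how the per-tmpdir limits add up across drives, the arithmetic can be written out. This is a sketch of upper bounds only, assuming every tmpdir uses the same limits; `job_caps` is a hypothetical helper, not a plotman function.

```python
def job_caps(n_tmpdirs, phase_limit, tmpdir_max_jobs, global_max_jobs):
    """Return (max jobs before the milestone phase, max jobs in total)."""
    # Each tmpdir contributes at most phase_limit early jobs ...
    early_cap = min(n_tmpdirs * phase_limit, global_max_jobs)
    # ... and at most tmpdir_max_jobs jobs overall; the global cap always applies.
    total_cap = min(n_tmpdirs * tmpdir_max_jobs, global_max_jobs)
    return early_cap, total_cap


# Four tmp drives, phase limit 1, tmpdir_max_jobs 2, global_max_jobs 8.
print(job_caps(4, 1, 2, 8))  # (4, 8)
# The original setting, phase limit 2, allows up to 8 early jobs.
print(job_caps(4, 2, 2, 8))  # (8, 8)
```

So with four tmp drives a per-tmpdir phase limit of 1 gives at most 4 jobs before the milestone and at most 8 in total, even though no global phase limit exists.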
Okay, yes, now what variables do I need to change to get the result? I'm getting overwhelmed with that stuff atm
From the config presently listed in the OP, I set `tmpdir_stagger_phase_limit: 1`.
logging:
plots: /root/.chia/plotman/logs
user_interface:
use_stty_size: False
commands:
interactive:
autostart_plotting: False
autostart_archiving: False
directories:
tmp:
- /plotting01
- /plotting02
- /plotting03
- /plotting04
dst:
- /plots
- /plots01
- /plots02
scheduling:
tmpdir_stagger_phase_major: 4
tmpdir_stagger_phase_minor: 1
tmpdir_stagger_phase_limit: 1
tmpdir_max_jobs: 2
global_max_jobs: 8
global_stagger_m: 1
polling_time_s: 20
type: madmax
chia:
k: 32
e: False
n_threads: 2
n_buckets: 128
job_buffer: 3389
madmax:
n_threads: 12
n_buckets: 256
n_buckets3: 256
n_rmulti2: 1
Personally, I would set the `global_stagger_m:` to a bit less than a quarter of the time it takes a plot to get to phase 4. This is an iterative process since each change can affect how long the plots take. Basically, approximately evenly stagger the four "really calculating stuff" plots. This helps to smooth out the overall resource usage (CPU, bus usage to RAM and disk, etc) across the plots. In my experience with madMAx it doesn't really want to actually use full cpu in phases other than 1, even if you specify `--rmulti2 2`. Certainly this could vary per computer. But, if that's the case for you and you align all four of your parallel plots then you end up with them all battling for cpu in phase 1 and when they all hit phase 2 at about the same time you have cores sitting idle. If you instead always have a plot in phase one and others in phases 2 and 3 then you would always be able to fully utilize your cpu.
Yes, staggering introduces a ramp-up period where you aren't using your full resources. If you are doing 10 plots then this matters, but at that scale tuning plotman like we are going through here doesn't matter. If you are going to leave the system plotting for days and weeks, then an hour of ramp up or such is irrelevant compared to maximizing overall throughput.
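The "a bit less than a quarter of the time to phase 4" rule of thumb can be turned into a starting value like this. It is only a sketch: the 2800 second figure is a made-up measurement, the 0.9 factor is one arbitrary reading of "a bit less", `suggest_global_stagger_m` is a hypothetical helper, and rounding down to whole minutes is an assumption about how the value would be entered.

```python
import math


def suggest_global_stagger_m(seconds_to_phase4, parallel_jobs=4, safety=0.9):
    """Suggest a stagger in whole minutes for evenly spaced early-phase plots."""
    # Even spacing: one new plot per 1/parallel_jobs of the time to phase 4.
    even_spacing_s = seconds_to_phase4 / parallel_jobs
    # "A bit less": shrink by the safety factor, then round down to minutes.
    return max(1, math.floor(even_spacing_s * safety / 60))


# Hypothetical measurement: a plot takes 2800 s to reach phase 4.
print(suggest_global_stagger_m(2800))  # 10
```

Since each change affects how long the plots take, the measurement would need to be repeated and the value adjusted after every run.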
Describe the bug
Plot jobs start bypassing the stagger major:minor phase limits.
To Reproduce
Steps to reproduce the behavior, e.g.:
Expected behavior
Limited jobs before stage N:N
System setup:
Config
full configuration
```yaml
logging:
plots: /root/.chia/plotman/logs
user_interface:
use_stty_size: False
commands:
interactive:
autostart_plotting: False
autostart_archiving: False
directories:
tmp:
- /plotting01
- /plotting02
- /plotting03
- /plotting04
dst:
- /plots
- /plots01
- /plots02
scheduling:
tmpdir_stagger_phase_major: 4
tmpdir_stagger_phase_minor: 1
tmpdir_stagger_phase_limit: 2
tmpdir_max_jobs: 2
global_max_jobs: 8
global_stagger_m: 1
polling_time_s: 20
type: madmax
chia:
k: 32
e: False
n_threads: 2
n_buckets: 128
job_buffer: 3389
madmax:
n_threads: 12
n_buckets: 256
n_buckets3: 256
n_rmulti2: 1
```