ericaltendorf / plotman

Chia plotting manager
Apache License 2.0

Fixed secondary temp -2 folder is inflexible tbh #910

realdevnullius opened this issue 3 years ago

realdevnullius commented 3 years ago

Describe the request

Can we get a variable -2 temp folder? For each plot made, it should match the -d destination folder of that plot.

Additional comments

I know most people don't like the -2 option, but I do, for one simple reason: it allows me to start new plots a tad sooner because some of the used temp space is moved to another disk. I have Wolf Pro disks, which are kick-ass fast for an "old" HDD. If the -2 temp folder were the same as the destination folder for each chia plot run, that would be great: part of the temp files would already move to the destination disk, freeing up space on my plotting disks. I use multiple destination disks, so if the -2 can change with them... Thank you :)

Or if anyone can point me to where I should try to change the source code... Let me know too please :)

I'm even willing to donate a small crypto sum if it's complex to make.

altendky commented 3 years ago

So yeah, as you point out, this is an area that hasn't been fully developed.

There are pieces of what you want: if you specify tmp2 and don't specify dst, then your configured tmp2 will be used for both tmp2 and dst. But that only works for a single tmp2.
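For illustration, a minimal sketch of that existing fallback (paths are placeholders):

directories:
        tmp:
                - /mnt/tmp/00
        # dst omitted: this tmp2 is passed as -2 and also used as the dst
        tmp2: /mnt/farm/HDD00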

I don't have particular plans to address the shortcomings of tmp2 right now, so I'll put out a few questions and comments that may or may not lead to a useful discussion about improving your situation. Or perhaps they will just help me understand how multiple-tmp2 support should work. I also think your setup may be a bit different from the ones I have mostly worked with, so hearing about it would help me think it through.

As chiapos (the Chia Network plotter) has improved, its tmp usage has declined. Also, with madmax you end up with fewer parallel processes and thus less overall tmp usage at any given point. This has made getting plot processes off of tmp quickly much less important. I would have expected that with HDD (rather than NVMe) plotting there would be even less of a crunch on tmp space, though I guess with HDD plotting you might want only one process using a given HDD (or RAID set) to avoid some head thrash. Can you say a bit more about what in your situation causes you to be limited by tmp space?

dst directories often end up being used to specify the final resting place for plots, though archiving is really the best tool for handling plot distribution. When using archiving, dst acts as a buffer.

Anyways, could you share your full config and explain what type all the drives are and any other commentary to help me understand your setup? Thanks.

realdevnullius commented 3 years ago

HP Proliant Gen 9; 56 total threads (28 cores; 2x 14), 128GB RAM. If madmax had existed when I configured this setup, I'd have bought something else altogether, true.

Yet here we are, with 2x NVMe (1.9 TB each; 1,920,383,410,176 bytes), 2x SSD of the same size, and for now a 4x 16TB HDD volume also used for plotting (until the other 8x 16TB HDDs have filled up). I plan on replacing 1 or 2 SSDs with 1 or 2x 4TB NVMe.

I've been trying to tune/tame plotman for weeks now :( As we speak I'm once again testing 3x k33 plots on each NVMe/SSD. I'll know more tomorrow about whether my current configuration below will work :)

But for now, I have temp limitations: no more than 3 tasks may run before phase 4, step 0. This squeezes the available free space to the max. Offloading some of that data to the quick HDDs would give extra margin; I might even use 3:6 instead of 4:0! Currently, as preparation for my 4TB NVMes, I'm using 2 temp folders on my 4-disk HDD volume; each will get 3 plots (this will test RAM/CPU bottlenecks). I'm also already using that same 4x HDD volume as a tmp2. I don't see any delays for plots on that volume, so the HDDs can handle it.

That said, at some point the 4x HDD volume will be converted to 4 separate HDDs for plot storage. Especially then, having the -2 temp folder point to the destination HDD would make for a smooth run, I hope. You would have to leave enough space for that tmp2, though: if there's only room left for 1 or 2 plots on the destination HDD, that HDD shouldn't be used as a -2 temp folder any longer. Filling the disks to the max would then be a manual exercise in my case. That would work for me :)
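To make the request concrete, here's a purely hypothetical config shape (neither key below exists in plotman today):

directories:
        dst:
                - /mnt/farm/HDD00/Plots_OK/pooled/plotman
                - /mnt/farm/HDD01/Plots_OK/pooled/plotman
        # hypothetical: use each job's selected dst as its tmp2
        tmp2_follows_dst: true
        # hypothetical: stop using a dst as tmp2 once its free space drops below this
        tmp2_min_free_gb: 250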

realdevnullius commented 3 years ago
# Default/example plotman.yaml configuration file
# k temp size calculations on https://plot-plan.chia.foxypool.io/

# https://github.com/ericaltendorf/plotman/wiki/Configuration#versions
version: [2]

logging:
        # One directory in which to store all plot job logs (the STDOUT/
        # STDERR of all plot jobs).  In order to monitor progress, plotman
        # reads these logs on a regular basis, so using a fast drive is
        # recommended.
        # sudo mount -t tmpfs -o size=20M tmpfs /mnt/ram/
        #        plots: /home/chia/chia/logs
        plots: /mnt/ram/
        transfers: /home/roet/plotman/log.transfer/
        application: /home/roet/plotman/log.app/plotman.log

# Options for display and rendering
user_interface:
        # Call out to the `stty` program to determine terminal size, instead of
        # relying on what is reported by the curses library.   In some cases,
        # the curses library fails to update on SIGWINCH signals.  If the
        # `plotman interactive` curses interface does not properly adjust when
        # you resize the terminal window, you can try setting this to True.
        use_stty_size: True

# Optional custom settings for the subcommands (status, interactive etc)
commands:
        interactive:
                # Set it to False if you don't want to auto-start plotting when 'interactive' is run.
                # You can override this value from the command line, type "plotman interactive -h" for details
                autostart_plotting: True
                autostart_archiving: True

# Where to plot and log.
directories:
        # One or more directories to use as tmp dirs for plotting.  The
        # scheduler will use all of them and distribute jobs among them.
        # It assumes that IO is independent for each one (i.e., that each
        # one is on a different physical device).
        #
        # If multiple directories share a common prefix, reports will
        # abbreviate and show just the uniquely identifying suffix.
        tmp:
                - /mnt/nvm2
                - /mnt/ssd01
                - /mnt/4x_volume/run-22
                - /mnt/ssd00
                - /home/roet/nvm1
                - /mnt/4x_volume/run-11

        # Optional: tmp2 directory.  If specified, will be passed to
        # chia plots create as -2.  Only one tmp2 directory is supported.
        # tmp2: /mnt/tmp/a
        # /home/roet is on nvme01
        # tmp2: /home/roet/plots.tmp-2/plotman
        tmp2: /mnt/4x_volume/tmp.02

        # Optional: A list of one or more directories; the scheduler will
        # use all of them.  These again are presumed to be on independent
        # physical devices so writes (plot jobs) and reads (archivals) can
        # be scheduled to minimize IO contention.
        #
        # If dst is commented out, the tmp directories will be used as the
        # buffer.
        dst:
                - /mnt/farm/HDD00/Plots_OK/pooled/plotman
                - /mnt/farm/HDD01/Plots_OK/pooled/plotman
                - /mnt/farm/HDD02/Plots_OK/pooled/plotman
                - /mnt/farm/HDD03/Plots_OK/pooled/plotman
                - /mnt/farm/HDD05/Plots_OK/pooled/plotman

# Archival configuration.  Optional; if you do not wish to run the
# archiving operation, comment this section out.
#
# As of v0.4, archiving commands are highly configurable.  The basic
# configuration consists of a script for checking available disk space
# and another for actually transferring plots.  Each can be specified
# as either a path to an existing script or inline script contents.
# It is expected that most people will use existing recipes and will
# adjust them by specifying environment variables that will set their
# system specific values.  These can be provided to the scripts via
# the `env` key.  plotman will additionally provide `source` and
# `destination` environment variables to the transfer script so it
# knows the specifically selected items to process.  plotman also needs
# to be able to generally detect if a transfer process is already
# running.  To be able to identify externally launched transfers, the
# process name and an argument prefix to match must be provided.  Note
# that variable substitution of environment variables including those
# specified in the env key can be used in both process name and process
# argument prefix elements but that they use the python substitution
# format.
#
# Complete example: https://github.com/ericaltendorf/plotman/wiki/Archiving
#archiving:
#  target: local_rsync
#  env:
#    command: rsync
#    site_root: /mnt/farm

# Plotting scheduling parameters
scheduling:
        # Run a job on a particular temp dir only if the number of existing jobs
        # before [tmpdir_stagger_phase_major : tmpdir_stagger_phase_minor]
        # is less than tmpdir_stagger_phase_limit.
        # Phase major corresponds to the plot phase, phase minor corresponds to
        # the table or table pair in sequence, phase limit corresponds to
        # the number of plots allowed before [phase major : phase minor].
        # e.g., with default settings, a new plot will start only when your plot
        # reaches phase [2 : 1] on your temp drive. This setting takes precedence
        # over global_stagger_m
        # LIMIT WAS 8; TEMPORARILY SET TO 9 FOR HDD_VOLUME
        tmpdir_stagger_phase_major: 2
        tmpdir_stagger_phase_minor: 1
        # Optional: default is 1
        tmpdir_stagger_phase_limit: 9

        # Don't run more than this many jobs at a time on a single temp dir.
        # WAS 8, BUT TEMPORARILY SET TO 16 FOR HDD VOLUME
        tmpdir_max_jobs: 16

        # Don't run more than this many jobs at a time in total.
        # WAS 16 SET TO 32 FOR HDD VOLUME
        global_max_jobs: 32

        # Don't run any jobs (across all temp dirs) more often than this, in minutes.
        # Next runtest try 165 min global stagger ;-(
        # 70 seemed to work well with nvm1 nvm2 ssd0 ssd1, currently using 40 after adding 3x hdd_volume folders
        # for my system in general, with x the amount of temp folders, this is best for m: x*m=280 or m=280/x
        # 35 seemed ok but let's double it to 70, assuming 21hours for a fully stocked queue
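        # (worked example of the rule above: with the 6 tmp dirs configured here, m = 280/6 ≈ 47)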
        global_stagger_m: 93

        # How often the daemon wakes to consider starting a new plot job, in seconds.
        polling_time_s: 20

        # Optional: Allows the overriding of some scheduling characteristics of the
        # tmp directories specified here.
        # This contains a map of tmp directory names to attributes. If a tmp directory
        # and attribute is not listed here, the default attribute setting from the main
        # configuration will be used
        #
        # Currently supported override parameters:
        #     - tmpdir_stagger_phase_major (requires tmpdir_stagger_phase_minor)
        #     - tmpdir_stagger_phase_minor (requires tmpdir_stagger_phase_major)
        #     - tmpdir_stagger_phase_limit
        #     - tmpdir_max_jobs
        tmp_overrides:
                # In this example, /mnt/tmp/00 is larger and faster than the
                # other tmp dirs and it can hold more plots than the default,
                # allowing more simultaneous plots, so they are being started
                # earlier than the global setting above.
                #"/mnt/tmp/00":
                #        tmpdir_stagger_phase_major: 1
                #        tmpdir_stagger_phase_minor: 5
                #        tmpdir_max_jobs: 5
                # Here, /mnt/tmp/03 is smaller, so a different config might be
                # to space the phase stagger further apart and only allow 2 jobs
                # to run concurrently in it
                # QUESTION HOW TO PLAY WITH THESE PHASES?? :(
                #"/mnt/tmp/03":
                #        tmpdir_stagger_phase_major: 3
                #        tmpdir_stagger_phase_minor: 1
                #        tmpdir_max_jobs: 2
                "/home/roet/nvm1":
                        tmpdir_stagger_phase_major: 4
                        tmpdir_stagger_phase_minor: 0
                        tmpdir_stagger_phase_limit: 3
                "/mnt/nvm2":
                        tmpdir_stagger_phase_major: 4
                        tmpdir_stagger_phase_minor: 0
                        tmpdir_stagger_phase_limit: 3
                "/mnt/nvm2":
                        tmpdir_stagger_phase_major: 4
                        tmpdir_stagger_phase_minor: 0
                        tmpdir_stagger_phase_limit: 3
                "/mnt/ssd00":
                        tmpdir_stagger_phase_major: 4
                        tmpdir_stagger_phase_minor: 0
                        tmpdir_stagger_phase_limit: 3
                "/mnt/ssd01":
                        tmpdir_stagger_phase_major: 4
                        tmpdir_stagger_phase_minor: 0
                        tmpdir_stagger_phase_limit: 3
                "/mnt/4x_volume/run-11":
                        tmpdir_stagger_phase_major: 4
                        tmpdir_stagger_phase_minor: 0
                        tmpdir_stagger_phase_limit: 3
                "/mnt/4x_volume/run-22":
                        tmpdir_stagger_phase_major: 4
                        tmpdir_stagger_phase_minor: 0
                        tmpdir_stagger_phase_limit: 3
#                "/mnt/4x_volume/run-33":
#                        tmpdir_stagger_phase_major: 3
#                        tmpdir_stagger_phase_minor: 5
#                        tmpdir_stagger_phase_limit: 3
#                "/mnt/4x_volume/run-44":
#                        tmpdir_stagger_phase_major: 3
#                        tmpdir_stagger_phase_minor: 5
#                        tmpdir_stagger_phase_limit: 3

# Plotting parameters.  These are pass-through parameters to chia plots create.
# See documentation at
# https://github.com/Chia-Network/chia-blockchain/wiki/CLI-Commands-Reference#create
plotting:
        # Your public keys.  Be sure to use the pool contract address for
        # portable pool plots.  The pool public key is only for original
        # non-portable plots that can not be used with the official pooling
        # protocol.
        farmer_pk: a06153cc93227662742954c316c14a61b2cb071c45accbb1706953f6b50555d523760f2cc885dc456e019aa507b8dc63
        # pool_pk: ...
        pool_contract_address: xch1fx6n53h2zlwchylezxn0d6dwp9655gsxdc3ez0h9u4sqpemwqnhq958pru

        # If you enable Chia, plot in *parallel* with higher tmpdir_max_jobs and global_max_jobs
        type: chia
        chia:
                # The stock plotter: https://github.com/Chia-Network/chia-blockchain
                # https://www.incredigeek.com/home/install-plotman-on-ubuntu-harvester/
                # executable: /home/roet/chia-blockchain/venv/bin
                k: 33                # k-size of plot, leave at 32 most of the time
                e: False             # Use -e plotting option
                n_threads: 4         # Threads per job
                n_buckets: 64       # Number of buckets to split data into; default is 128. Fewer buckets means more RAM use but less wear.
                job_buffer: 7400     # Per-job memory in MiB: 3389 for k32, 7400 for k33, 14800 for k34, 29600 for k35

altendky commented 3 years ago

Have you tried putting the pair of matched NVMe in raid0 and using that with madmax and a phase stagger of 2:0 with a limit of 1? Comment out tmp2 and dst; plotman will archive from tmp then. After running piles of chiapos processes, when I switched to madmax I ended up with this setup on one system, and close to it (with a ramdisk tmp2 added) on another that had more RAM.
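A minimal sketch of that suggestion, assuming plotman's madmax plotter support and the two NVMe in a raid0 mounted at /mnt/raid0 (the mount point, device names, and option values are placeholders):

# e.g. sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
directories:
        tmp:
                - /mnt/raid0
        # tmp2 and dst commented out; plotman archives straight from tmp

scheduling:
        tmpdir_stagger_phase_major: 2
        tmpdir_stagger_phase_minor: 0
        tmpdir_stagger_phase_limit: 1

plotting:
        type: madmax
        madmax:
                n_threads: 8
                n_buckets: 256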

What sort of numbers do you have for TB-of-plots/day in the various setups you have tried?

If you are buying more fast disks, consider more, smaller, matched ones. This generally increases bandwidth, and you can raid0 them to use that increased bandwidth with fewer processes and fewer drives to track. Though it's not obvious that you need more fast disk versus, say, more RAM. Or maybe no new hardware at all.

What make/model are the pair of 2TB NVMe?

realdevnullius commented 3 years ago

Madmax wouldn't have a use for that much space, would it? And I'll be using some i9s with 128GB RAM for my k32 plotting needs.

As for my current results... I'll have to come back to that in a few days. I just went to a global stagger of 81 minutes (down from 93); it will need another 12 hours or so to catch up to the new rhythm. I have to move really slowly when changing settings; it escalates quickly :)

PS: any Ubuntu solutions to see a history of free disk space? netstats had a 3rd-party plugin, but I still have to figure that out :)

realdevnullius commented 3 years ago

2x NVMe: VO001920KWVMT | P09769-001 | P10214-B21 | P10466-001 | HPE 1.92TB NVMe x4 RI SFF SCN DS SSD

altendky commented 3 years ago

I've been using plotman+prometheus+grafana to monitor various resources. But yes, the point of madmax is to get the same or better throughput with fewer processes. That means less temp space and less ramp-up time when responding to configuration changes. I ended up with three processes at a time on each system, based on the phase stagger mentioned above.

A quick look suggests that your system should do better than the server I had. Mine managed about 3TB/day with chiapos (21 processes, iirc) and up to 5TB/day with madmax (a phase 1 stagger resulting in 3 processes). It was a dual Xeon 2690 with 16 cores/32 threads in total. The madmax numbers were admittedly with a ramdisk for tmp2, but I don't recall that being a really big change, though it was nice.
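On the disk-space-history question: a minimal sketch of that monitoring stack, assuming Prometheus scraping node_exporter on its default port (node_exporter is my assumption; any exporter of filesystem metrics works):

# prometheus.yml
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']

Free space then shows up as the node_filesystem_avail_bytes metric, which Grafana can graph per mount point over time.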

realdevnullius commented 3 years ago

Current results are about 15 finished k33 plots over the last 24 hours (at roughly 208.8 GiB per k33 plot, that's about 3.4 TB/day). On average a plot takes 21 hours (starting at 17 with the very first plots). I have room for more; the CPU is not maxed out, nor is RAM or even free disk space. But again, a setup like this needs very careful tuning :(