ericaltendorf / plotman

Chia plotting manager
Apache License 2.0

Plotman (k33) keeps making plots for a -d storage disk that's already full - it even reports it as full! #928

Closed · realdevnullius closed this issue 3 years ago

realdevnullius commented 3 years ago

Per title...

(venv) roet@Lovecraft:~/chia-blockchain$ plotman dsched
  /mnt/farm/HDD02/Plots_OK/pooled/plotman : 5:0
  /mnt/farm/HDD06/Plots_OK/pooled/plotman : 4:1
  /mnt/farm/HDD00/Plots_OK/pooled/plotman : 3:2
  /mnt/farm/HDD01/Plots_OK/pooled/plotman : 3:4
  /mnt/farm/HDD03/Plots_OK/pooled/plotman : 3:3
  /mnt/farm/HDD05/Plots_OK/pooled/plotman : 3:2
  /mnt/farm/HDD07/Plots_OK/pooled/plotman : 2:5
(venv) roet@Lovecraft:~/chia-blockchain$ plotman dirs
         tmp             ready         phases       
  /home/roet/tmp/ssd00      OK   1:5 2:5            
  /home/roet/tmp/ssd01      OK   1:3 2:2 3:3 4:1 5:0
 /mnt/4x_volume/run-22      OK   1:2 1:6 3:2 3:6    
/mnt/nvme_vol/nvme_0-0      --   1:3 1:5 2:6 3:4    
/mnt/nvme_vol/nvme_0-1      OK   1:4 2:4 3:2 5:0    
           /mnt/ssd03/      OK   1:1 2:4 5:0        
                  dst                     plots   GBfree         inbnd phases         pri
/mnt/farm/HDD00/Plots_OK/pooled/plotman   45      5898     1:5 2:4 3:2                95 
/mnt/farm/HDD01/Plots_OK/pooled/plotman   66      1200     1:3 2:4 3:4                112
/mnt/farm/HDD02/Plots_OK/pooled/plotman   71      0        1:5 2:6 [+1] 5:0 5:0 5:0   106
/mnt/farm/HDD03/Plots_OK/pooled/plotman   46      5489     1:3 2:2 3:3                96 
/mnt/farm/HDD05/Plots_OK/pooled/plotman   65      1424     1:2 1:6 3:2                115
/mnt/farm/HDD06/Plots_OK/pooled/plotman   35      8152     1:1 4:1                    53 
/mnt/farm/HDD07/Plots_OK/pooled/plotman   36      7927     1:4 2:5                    86 

HDD02 is as full as can be, and it was already full when I started `plotman interactive`. Yet plotman kept starting lots of new plots destined for it...
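To double-check what the OS reports for each dst, here is a small sketch (not plotman's actual code; the ~208 GiB figure for a finished k33 plot is an approximation to adjust for your k-size):

```python
import shutil

# Approximate size of one finished k33 plot, in GiB (an assumption; tune per k-size).
K33_PLOT_GIB = 208

def dst_free_gib(path):
    """Free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30

def can_fit_plot(path, plot_gib=K33_PLOT_GIB):
    """True if the filesystem at `path` has room for one more plot."""
    return dst_free_gib(path) >= plot_gib

# Example usage against the dst dirs from the config below:
# for d in ["/mnt/farm/HDD02/Plots_OK/pooled/plotman"]:
#     print(d, round(dst_free_gib(d)), "GiB free, fits another plot:", can_fit_plot(d))
```

Running this over the dst list should match the `GBfree` column of `plotman dirs`; HDD02 reporting 0 while jobs still target it is the surprising part.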

(venv) roet@Lovecraft:~/chia-blockchain$ cat ~/.config/plotman/plotman.yaml
# Default/example plotman.yaml configuration file
# k temp size calculations on https://plot-plan.chia.foxypool.io/

# https://github.com/ericaltendorf/plotman/wiki/Configuration#versions
version: [2]

logging:
        # One directory in which to store all plot job logs (the STDOUT/
        # STDERR of all plot jobs).  In order to monitor progress, plotman
        # reads these logs on a regular basis, so using a fast drive is
        # recommended.
        # sudo mount -t tmpfs -o size=20M tmpfs /mnt/ram/
        #        plots: /home/chia/chia/logs
        plots: /mnt/ram/
        transfers: /home/roet/plotman/log.transfer/
        application: /home/roet/plotman/log.app/plotman.log
        disk_spaces: /home/roet/plotman/log.diskspaces/diskspaces.log

# Options for display and rendering
user_interface:
        # Call out to the `stty` program to determine terminal size, instead of
        # relying on what is reported by the curses library.   In some cases,
        # the curses library fails to update on SIGWINCH signals.  If the
        # `plotman interactive` curses interface does not properly adjust when
        # you resize the terminal window, you can try setting this to True. 
        use_stty_size: True

# Optional custom settings for the subcommands (status, interactive etc)
commands:
        interactive:
                # Set this to False if you don't want plotting to start automatically when 'interactive' is run.
                # You can override this value from the command line; type "plotman interactive -h" for details.
                autostart_plotting: True
                autostart_archiving: True

# Where to plot and log.
directories:
        # One or more directories to use as tmp dirs for plotting.  The
        # scheduler will use all of them and distribute jobs among them.
        # It assumes that IO is independent for each one (i.e., that each
        # one is on a different physical device).
        #
        # If multiple directories share a common prefix, reports will
        # abbreviate and show just the uniquely identifying suffix.
        tmp:
                - /mnt/nvme_vol/nvme_0-0
                - /home/roet/tmp/ssd00
                - /mnt/ssd03/
                - /mnt/nvme_vol/nvme_0-1
                - /home/roet/tmp/ssd01
                - /mnt/4x_volume/run-22

        # Optional: tmp2 directory.  If specified, will be passed to
        # chia plots create as -2.  Only one tmp2 directory is supported.
        # tmp2: /mnt/tmp/a
        # /home/roet is on nvme01
        # tmp2: /home/roet/plots.tmp-2/plotman
        # tmp2: /mnt/4x_volume/tmp.02  # with global stagger 66 this goes well for a long time,
        # then it starts forgetting about plots, slowly filling the disks over a few days

        # Optional: A list of one or more directories; the scheduler will
        # use all of them.  These again are presumed to be on independent
        # physical devices so writes (plot jobs) and reads (archivals) can
        # be scheduled to minimize IO contention.
        # 
        # If dst is commented out, the tmp directories will be used as the
        # buffer.
        # disk full        - /mnt/farm/HDD04/Plots_OK/pooled/plotman
        dst:
                - /mnt/farm/HDD00/Plots_OK/pooled/plotman
                - /mnt/farm/HDD01/Plots_OK/pooled/plotman
                - /mnt/farm/HDD02/Plots_OK/pooled/plotman
                - /mnt/farm/HDD03/Plots_OK/pooled/plotman
                - /mnt/farm/HDD05/Plots_OK/pooled/plotman
                - /mnt/farm/HDD06/Plots_OK/pooled/plotman
                - /mnt/farm/HDD07/Plots_OK/pooled/plotman

# Archival configuration.  Optional; if you do not wish to run the
# archiving operation, comment this section out.  Almost everyone
# should be using the archival feature.  It is meant to distribute
# plots among multiple disks filling them all.  This can be done both
# to local and to remote disks.
#
# As of v0.4, archiving commands are highly configurable.  The basic
# configuration consists of a script for checking available disk space
# and another for actually transferring plots.  Each can be specified
# as either a path to an existing script or inline script contents.
# It is expected that most people will use existing recipes and will
# adjust them by specifying environment variables that will set their
# system specific values.  These can be provided to the scripts via
# the `env` key.  plotman will additionally provide `source` and
# `destination` environment variables to the transfer script so it
# knows the specifically selected items to process.  plotman also needs
# to be able to generally detect if a transfer process is already
# running.  To be able to identify externally launched transfers, the
# process name and an argument prefix to match must be provided.  Note
# that variable substitution of environment variables including those
# specified in the env key can be used in both process name and process
# argument prefix elements but that they use the python substitution
# format.
#
# Complete example: https://github.com/ericaltendorf/plotman/wiki/Archiving
#archiving:
#  target: local_rsync
#  env:
#    command: rsync
#    site_root: /mnt/farm

# Plotting scheduling parameters
scheduling:
        # Run a job on a particular temp dir only if the number of existing jobs
        # before [tmpdir_stagger_phase_major : tmpdir_stagger_phase_minor]
        # is less than tmpdir_stagger_phase_limit.
        # Phase major corresponds to the plot phase, phase minor corresponds to
        # the table or table pair in sequence, phase limit corresponds to
        # the number of plots allowed before [phase major : phase minor].
        # e.g., with default settings, a new plot will start only when your plot
        # reaches phase [2 : 1] on your temp drive. This setting takes precedence
        # over global_stagger_m.
        # limit was 8, temporarily raised to 9 for the HDD volume
        tmpdir_stagger_phase_major: 2
        tmpdir_stagger_phase_minor: 1
        # Optional: default is 1
        tmpdir_stagger_phase_limit: 9

        # Don't run more than this many jobs at a time on a single temp dir.
        # was 8, but temporarily set to 16 for the HDD volume
        tmpdir_max_jobs: 16

        # Don't run more than this many jobs at a time in total.
        # was 16, set to 32 for the HDD volume
        global_max_jobs: 32

        # Don't run any jobs (across all temp dirs) more often than this, in minutes.
        # Next run: test 165 min global stagger ;-(
        # 70 seemed to work well with nvm1 nvm2 ssd0 ssd1; currently using 40 after adding 3x hdd_volume folders
        # for my system in general, with x the number of temp folders, m = 280/x works best (i.e. x*m = 280)
        # 35 seemed OK, but let's double it to 70, assuming 21 hours for a fully stocked queue
        # 93 mins gave equilibrium between plots started and finished; with 4:0 set to 3 I had 2 plots building at all times
        # 81 mins same... no catching up as of yet
        # 78 still not really catching up... back to 70? sigh
        # 75 still not enough pressure to get up to 4 plots at all times per temp, so back to 70
        # let's play a game... 65 is catching up with the new max of 4; up to 67 before it gets too tight - or 66; sigh, 65, then back to 68 after the leftover temp files bug
        # m68 after a few days remains stable at max 20 parallel plots - there's always a tmp READY and available. Disks may be maxed out, lots of 100% busy. Testing 66 next
        # m66 seemed to cause a bug with ID-less, forgotten plots left behind, so m67 next (after a few hours of no new plots) - actually not m66's fault; the RAM log disk was full
        # correction concerning m66: it works fine, but a bit more pressure might be OK. Most of the disks are at 100% use already, but let's see... m64!
        # hmm, at m64 I notice a forgotten plot again, filling up ssd03 since 24-27h ago. Back to 65
        global_stagger_m: 65

        # How often the daemon wakes to consider starting a new plot job, in seconds.
        polling_time_s: 20

        # Optional: Allows the overriding of some scheduling characteristics of the
        # tmp directories specified here.
        # This contains a map of tmp directory names to attributes. If a tmp directory
        # and attribute are not listed here, the default attribute setting from the main
        # configuration will be used.
        #
        # Currently supported override parameters:
        #     - tmpdir_stagger_phase_major (requires tmpdir_stagger_phase_minor)
        #     - tmpdir_stagger_phase_minor (requires tmpdir_stagger_phase_major)
        #     - tmpdir_stagger_phase_limit
        #     - tmpdir_max_jobs
        tmp_overrides:
                # In this example, /mnt/tmp/00 is larger and faster than the
                # other tmp dirs and it can hold more plots than the default,
                # allowing more simultaneous plots, so they are being started
                # earlier than the global setting above.
                #"/mnt/tmp/00":
                #        tmpdir_stagger_phase_major: 1
                #        tmpdir_stagger_phase_minor: 5
                #        tmpdir_max_jobs: 5
                # Here, /mnt/tmp/03 is smaller, so a different config might be
                # to space the phase stagger further apart and only allow 2 jobs
                # to run concurrently in it
                # QUESTION HOW TO PLAY WITH THESE PHASES?? :(
                #"/mnt/tmp/03":
                #        tmpdir_stagger_phase_major: 3
                #        tmpdir_stagger_phase_minor: 1
                #        tmpdir_max_jobs: 2
                # - /mnt/nvme_vol/nvme_0-0
                # - /home/roet/tmp/ssd00
                # - /mnt/4x_volume/run-11
                # - /mnt/nvme_vol/nvme_0-0
                # - /home/roet/tmp/ssd01
                # - /mnt/4x_volume/run-22
                "/mnt/nvme_vol/nvme_0-0":
                        tmpdir_stagger_phase_major: 3
                        tmpdir_stagger_phase_minor: 6
                        tmpdir_stagger_phase_limit: 4
                "/home/roet/tmp/ssd00":
                        tmpdir_stagger_phase_major: 3
                        tmpdir_stagger_phase_minor: 6
                        tmpdir_stagger_phase_limit: 4
                "/mnt/nvme_vol/nvme_0-1":
                        tmpdir_stagger_phase_major: 3
                        tmpdir_stagger_phase_minor: 6
                        tmpdir_stagger_phase_limit: 4
                "/home/roet/tmp/ssd01":
                        tmpdir_stagger_phase_major: 3
                        tmpdir_stagger_phase_minor: 6
                        tmpdir_stagger_phase_limit: 4
                "/mnt/ssd03":
                        tmpdir_stagger_phase_major: 3
                        tmpdir_stagger_phase_minor: 6
                        tmpdir_stagger_phase_limit: 4
                "/mnt/4x_volume/run-22":
                        tmpdir_stagger_phase_major: 3
                        tmpdir_stagger_phase_minor: 6
                        tmpdir_stagger_phase_limit: 4
#                "/mnt/4x_volume/run-33":
#                        tmpdir_stagger_phase_major: 3
#                        tmpdir_stagger_phase_minor: 5
#                        tmpdir_stagger_phase_limit: 3
#                "/mnt/4x_volume/run-44":
#                        tmpdir_stagger_phase_major: 3
#                        tmpdir_stagger_phase_minor: 5
#                        tmpdir_stagger_phase_limit: 3

# Plotting parameters.  These are pass-through parameters to chia plots create.
# See documentation at
# https://github.com/Chia-Network/chia-blockchain/wiki/CLI-Commands-Reference#create
plotting:
        # Your public keys.  Be sure to use the pool contract address for
        # portable pool plots.  The pool public key is only for original
        # non-portable plots that can not be used with the official pooling
        # protocol.
        farmer_pk: a06153760fb8dc63
        # pool_pk: ...
        pool_contract_address: xch1fx6n53h2zlwchylzxn0d6dwp90h9u4sqpemwqnhq958pru

        # If you enable Chia, plot in *parallel* with higher tmpdir_max_jobs and global_max_jobs
        type: chia
        chia:
                # The stock plotter: https://github.com/Chia-Network/chia-blockchain
                # https://www.incredigeek.com/home/install-plotman-on-ubuntu-harvester/
                # executable: /home/roet/chia-blockchain/venv/bin
                k: 33                # k-size of plot; leave at 32 most of the time
                e: False             # Use -e plotting option
                n_threads: 4         # Threads per job
                n_buckets: 64        # Number of buckets to split data into (default 128; fewer buckets means more RAM use, less disk wear)
                job_buffer: 7400     # Per-job memory: 3389 for k32, 7400 for k33, 14800 for k34, 29600 for k35

        # If you enable madMAx, plot in *sequence* with very low tmpdir_max_jobs and global_max_jobs
        madmax:
                # madMAx plotter: https://github.com/madMAx43v3r/chia-plotter
                # executable: /path/to/chia_plot
                n_threads: 4          # Default is 4, crank up if you have many cores
                n_buckets: 256        # Default is 256

altendky commented 3 years ago

dst is not meant to be used as the final resting place for plots unless you are going to manage these situations yourself. It is meant as a buffer. Archiving is used to distribute plots to fill disks.

https://github.com/ericaltendorf/plotman/wiki/Archiving
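For reference, a minimal local archiving section (a sketch mirroring the commented-out example in the config above; the `local_rsync` target and `site_root` value come from that example and must be adapted to your disk layout):

```yaml
archiving:
  target: local_rsync
  env:
    command: rsync
    site_root: /mnt/farm
```

With archiving enabled, dst acts only as a buffer, and plotman distributes finished plots across the disks under `site_root` instead of letting a single dst fill up.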