MrPig91 / PSChiaPlotter

A repository for a PowerShell module that helps with Chia plotting
MIT License

Problem with "Waiting on Final Dir Space" checks #127

Open · Jacek-ghub opened this issue 3 years ago

Jacek-ghub commented 3 years ago

Hi,

I have three 1TB NVMes used as tmp folders. Today, I added a fourth 1TB NVMe as the final folder, to decouple plotting jobs from the final plot transfer to the HDDs. Thanks to that, the copy process takes about 1:10 (a minute and ten seconds). I have a batch file that loops every minute, and if it finds a new plot there, it moves it to the final HDD. This way plotting is not affected by the slow transfer or by eventual collisions, and I can control the final HDD through that batch file, so I can run just one long job and target multiple drives.
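For illustration, here is a rough PowerShell equivalent of that mover loop; the real script is a batch file that also rotates between target drives, and the final HDD path below is just a placeholder:

    # Hypothetical PowerShell version of the mover loop; 'H:\plots' is a
    # placeholder for whichever HDD the script currently targets.
    $staging  = 'G:\dst'
    $finalHdd = 'H:\plots'

    while ($true) {
        # Matching on *.plot should avoid grabbing files that are still
        # being written, since in-progress copies carry a temp extension.
        Get-ChildItem -Path $staging -Filter '*.plot' -File |
            ForEach-Object { Move-Item -Path $_.FullName -Destination $finalHdd }
        Start-Sleep -Seconds 60
    }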

With that hardware setup, I started one job with 9 queues and 5 plots per queue. It looks like there is a problem with the final copy phase, or with starting a new plot, in that setup.

I saw four consecutive runs (the first four; it had not yet finished the 5th plot) get stuck on the "Waiting on Final Dir Space" event (in the New-ChiaQueueRunspace function, at the "#grab a volume that has enough space" comment). The plot is finished, removed from "Current Runs," and moved to "Completed Runs," but the new plot is not started. In "Job Queues," the status of the queue that just finished is set to "Waiting on Final Dir Space." Once the batch file removes that new plot from the final NVMe, PSCP wakes up and starts a new plot.

Just to make sure this is the case, for the fifth plot I killed that batch file, and PSCP stayed stuck until I restarted the batch file and it removed that plot.

I don't quite understand why PSCP was waiting for that plot to be removed, as the folder usage at that time was about 350GB on the 1TB tmp folder and 100GB (one plot) on the 1TB dst folder: plenty of room on both to start a new plot.

Is PSCP confused by the small size of that final NVMe compared to what is needed to finish the whole job (only 1TB, where the job has 40 plots)?

Thank you, Jacek

MrPig91 commented 3 years ago

Are you using two 1TB NVMes, one for temp and one for the final destination? The way PSCP calculates whether there is enough free space is to take the drive's current free space and subtract the final plot size for every current run whose final destination is set to that drive. How much free space did your final destination drive have, and how many runs targeted that drive, when it got that status message?

($volume.FreeSpace - ($Volume.PendingFinalRuns.Count * $finalplotsize)) -gt $finalplotsize
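For illustration, here is a minimal self-contained sketch of how that check sits inside a volume-selection loop; the function name, the volume object, and the round 100GB plot size are stand-ins for this example, not PSChiaPlotter's actual internals:

    function Get-BestFinalDriveSketch {
        param($Volumes, $FinalPlotSize = 100GB)  # ~0.1TB per finished plot

        foreach ($volume in $Volumes) {
            # Reserve space for every run already targeting this drive,
            # then check whether one more final plot would still fit.
            if (($volume.FreeSpace - ($volume.PendingFinalRuns.Count * $FinalPlotSize)) -gt $FinalPlotSize) {
                return $volume
            }
        }
        # Returning nothing means no volume qualifies, so the queue waits.
    }

    # Numbers matching the scenario in this thread: 900GB free, 9 pending runs.
    $g = [pscustomobject]@{ DriveLetter = 'G'; FreeSpace = 900GB; PendingFinalRuns = @(1..9) }
    Get-BestFinalDriveSketch -Volumes @($g)   # no output -> "Waiting on Final Dir Space"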

I really like what you have set up, by the way. A transfer time of about a minute is quite amazing.

Jacek-ghub commented 3 years ago

Hi Syrius,

To clarify:
- 1 job, 9 queues, 5 plots per queue
- 3 1TB NVMe tmp folders: d:\tmp, e:\tmp, f:\tmp (separate NVMe sticks)
- 1 1TB NVMe dst folder: g:\dst (separate stick)

So, during that "Waiting ..." time, the calculation was (0.9TB - (9 plots * 0.1TB)) -gt 0.1TB => false. Yes, I understand why it got stuck: PSCP is not aware that the final dst is also being actively managed. I will change it locally to return true when there is just one plot's worth of space free there. Maybe, the same way we have the "Ignore Max Parallel" checkmark, we could for now have an "Ignore Final Dst Limit" checkmark to enable runs like the one on my setup?
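Plugging the setup's numbers into that check shows the failure directly:

    # The numbers from this setup, plugged into PSCP's free-space check:
    $freeSpace     = 900GB   # ~0.9TB free on g:\dst
    $pendingRuns   = 9       # nine queues targeting the same dst
    $finalplotsize = 100GB   # ~0.1TB per plot

    ($freeSpace - ($pendingRuns * $finalplotsize)) -gt $finalplotsize
    # 900GB - 900GB = 0, which is not greater than 100GB -> False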

The load on that dst folder is just 1/16 of a tmp folder's, so it will last quite a bit longer than the tmp ones; but if it works, it is a good improvement.

Actually, my previous setup shared tmp and dst folders on the same NVMes (i.e., # of tmp folders == # of dst folders), but at that time, I think, the next plot got stuck on "Waiting on Tmp Folder" (basically the same free-space logic, I believe). This setup could/should be implemented by PSCP internally, as it would make PSChiaPlotter a tad better: basically collision free, and one or so extra plot per day.

Thank you, Jacek

Jacek-ghub commented 3 years ago

Hi Syrius,

I tried to fight PowerShell ignoring my .psm1 file modifications, but didn't get anywhere. I tried to google it, but didn't find much either. I guess not that many people try to modify a module in place, and apparently Windows is doing everything possible to fight it. I would still appreciate any pointers on how to test local changes (e.g., zip all those files, move them to a different folder, rename a few things (???), and try to run the module directly from that folder).
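One common approach, sketched below with a placeholder path, is to remove the loaded module from the session and re-import the edited sources with -Force; note this would not affect any runspaces that are already running, so those would need to be restarted:

    # Drop the currently loaded copy of the module, if any...
    Remove-Module PSChiaPlotter -ErrorAction SilentlyContinue

    # ...then re-import from the locally edited sources. -Force makes
    # PowerShell re-read the files instead of reusing the cached module.
    Import-Module 'C:\path\to\PSChiaPlotter\PSChiaPlotter.psd1' -Force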

If you could modify Get-BestChiaFinalDrive and Get-BestChiaTempDrive to conditionally bypass part of those checks, I would really appreciate it.

Here are the proposed changes:

# function Get-BestChiaFinalDrive
# instead of
    if (($volume.FreeSpace - ($Volume.PendingFinalRuns.Count * $finalplotsize)) -gt $finalplotsize){
        return $volume
    }
# use
# (preferably some other test that can be triggered via an init file
# modification, so it will not affect those who rely on IgnoreMaxParallel)
    if ($ChiaJob.IgnoreMaxParallel) {
        if ($volume.FreeSpace -gt $finalplotsize) {
            return $volume
        }
    }
    else {
        if (($volume.FreeSpace - ($Volume.PendingFinalRuns.Count * $finalplotsize)) -gt $finalplotsize) {
            return $volume
        }
    }
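As a hypothetical sketch of that "init file" idea, the marker-file name and variable below are invented here; the flag would be read once, with the inner check then living inside Get-BestChiaFinalDrive's volume loop:

    # Hypothetical alternative trigger: a marker file, so the existing
    # IgnoreMaxParallel checkbox keeps its current meaning. The file
    # name and variable are invented for this sketch.
    $ignoreFinalDstLimit = Test-Path (Join-Path $env:USERPROFILE '.pscp-ignore-final-dst-limit')

    # Inside Get-BestChiaFinalDrive's volume loop:
    if ($ignoreFinalDstLimit) {
        # Only require room for one more plot; an external mover script
        # is assumed to keep draining the destination drive.
        if ($volume.FreeSpace -gt $finalplotsize) { return $volume }
    }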

# function Get-BestChiaTempDrive
# same for temp space; although, should that temp test line be:
#   if (($volume.FreeSpace - ($Volume.PendingFinalRuns.Count * $requiredTempSize)) -gt $requiredTempSize){
# instead of
#   if (($volume.FreeSpace - ($Volume.PendingFinalRuns.Count * $finalplotsize))    -gt $requiredTempSize){

Thank you, Jacek

imClement commented 3 years ago

I confirm the same behavior when the dst drive's space is much smaller than what the job's total plot count would need. In my tests, the "Waiting on Final Dir Space" status disappears once the new plot is moved to the HDD by the RoboCopy command.

imClement commented 3 years ago

Also, I confirm the problem in another scenario:

Job1:
- Tmp X: 1TB NVMe
- Tmp2 Q: 1TB NVMe (same for Job2)
- Dst E: 5TB HDD (same for Job2)
- Parallel Count: 4, Delay: 60

Job2:
- Tmp Y: 1TB NVMe
- Tmp2 Q: 1TB NVMe (same for Job1)
- Dst E: 5TB HDD (same for Job1)
- Parallel Count: 4, Delay: 60, First Delay: 45

=======================

One plot from each job remains with status "Waiting on Final Dir Space" even though they appear in the "Completed Runs" list, and the next queue will not start.