Open stavrt opened 3 years ago
It does not, in my experience. It just starts the job even when it is clear that it will fail at the end. It would be nice to have a feature that redirects to a free (non-full) drive.
Mostly that's the archiving feature. It's a bit painful at the moment, but we are working on that. dst is not meant to be used as the final resting place for plot files.
@altendky I understand that the correct workflow should be tmp -> dst -> archive.
I am using HDDs for plotting right now. Given the limited bandwidth of an HDD, it would be nice to skip the dst and write the final plot files directly to the archive location. Therefore I am using the dst as the archive right now and it works great.
I am also facing the HDD load balancing issue at the moment. It would be nice if plotman checked the usage of each dst location before starting a new plot job. For example, if a dst drive is 90% or 95% full, plotman would skip it and use another dst from the array.
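The check requested above could look roughly like the following. This is a minimal sketch of the idea, not plotman's actual behavior: `pick_dst`, `max_used_fraction`, and the injectable `usage` parameter are hypothetical names chosen for illustration, and only `shutil.disk_usage` is a real standard-library call.

```python
import shutil


def pick_dst(dst_dirs, max_used_fraction=0.90, usage=shutil.disk_usage):
    """Return the first dst directory whose drive is below the fullness
    threshold, or None if every candidate is too full.

    `usage` defaults to shutil.disk_usage, which returns (total, used, free)
    in bytes; it is a parameter only so the logic can be tested without
    real drives.
    """
    for path in dst_dirs:
        total, used, _free = usage(path)
        if total > 0 and used / total < max_used_fraction:
            return path
    return None


if __name__ == "__main__":
    # Hypothetical usage figures, purely illustrative.
    fake = {"/mnt/dst/00": (100, 95, 5), "/mnt/dst/01": (100, 50, 50)}
    print(pick_dst(list(fake), usage=lambda p: fake[p]))
```

A percentage threshold matches the suggestion in the comment; comparing the free bytes against the expected plot size would be a stricter alternative.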
Given that you are suggesting a feature be added to the dst layer, it seems that it is actually not working great for you.
How about you skip the copy by leaving the plot on the tmp and archiving from there? This can presently only be done in plotman with a single tmp drive but there is https://github.com/ericaltendorf/plotman/pull/234 to make it possible with multiple tmp drives.
Thanks for your quick reply.
I checked #234, but it is not what I am looking for. #234 suggests using the tmp dir as a buffer, which is like tmp -> tmp -> archive.
Since I am using HDD for plotting, I would like to avoid unnecessary data transfer.
If I could do tmp -> dst or archive, why would I need tmp -> tmp or dst -> archive?
With an SSD, transferring (copying) a 102GB file is not a big issue. But for an HDD whose bandwidth is only about 100 MB/s, transferring a 102GB file would take 20 to 30 minutes.
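As a rough check of that estimate, here is a small sketch of the arithmetic, assuming decimal units (1 GB = 1000 MB) and a constant transfer rate. The helper name `transfer_minutes` is made up for illustration.

```python
def transfer_minutes(size_gb, rate_mb_per_s):
    """Minutes needed to move size_gb gigabytes at a sustained rate,
    assuming 1 GB = 1000 MB and no other load on the drive."""
    seconds = size_gb * 1000 / rate_mb_per_s
    return seconds / 60


if __name__ == "__main__":
    for rate in (100, 85, 60):
        print(f"{rate} MB/s: {transfer_minutes(102, rate):.1f} min")
```

At a sustained 100 MB/s the ideal figure is about 17 minutes, so the quoted 20 to 30 minutes corresponds to an effective rate of roughly 57 to 85 MB/s, which is plausible for an HDD that is also busy plotting.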
I know that maybe there are only a few people who would use HDD for plotting, so the importance of my issue is relatively low. I am happy with what Plotman can offer right now.
Plotman is a great tool and has already helped me a LOT in plotting. You guys are awesome!
One way or another the data has to come off the tmp and get to somewhere else. Specifying tmp as dst does not induce an extra copy. It doesn't write 100GB to tmp and then copy that 100GB to tmp and then let us archive it. Chia specifically checks for tmp and dst being the same (it is actually overly picky, but if you specify the same path for both, that doesn't matter). Less copying is exactly why people like setting dst as tmp.
Aha. I get your point. Thanks for clarifying. I'll give it a try.
While we are on the subject of load balancing HDDs, what about SSDs? I have a 2TB and a 1TB SSD, but I cannot specify the max number of running plots for each drive individually, only a global limit. I would like to do something like 6 and 2 plots for the 2TB and 1TB drives.
You can. I believe this is described in the readme and the config file and perhaps the wiki. It is also unrelated to this issue.
It's supported; see #47.
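To illustrate what a per-drive limit with a global default means, here is a minimal sketch. It is not plotman's scheduler or its config format: `pick_tmp`, `default_max`, and `overrides` are hypothetical names, and the paths are invented.

```python
def pick_tmp(running, default_max, overrides):
    """Pick the tmp dir with the fewest running jobs that is still below
    its own limit; return None if every drive is at capacity.

    running:     dict mapping tmp dir -> number of jobs currently using it
    default_max: limit applied to drives without an override
    overrides:   dict mapping tmp dir -> per-drive limit
    """
    candidates = [
        (count, path)
        for path, count in running.items()
        if count < overrides.get(path, default_max)
    ]
    return min(candidates)[1] if candidates else None


if __name__ == "__main__":
    limits = {"/mnt/ssd2tb": 6, "/mnt/ssd1tb": 2}  # 6 and 2 as in the question
    print(pick_tmp({"/mnt/ssd2tb": 6, "/mnt/ssd1tb": 1}, 3, limits))
```

With the 2TB drive already at its limit of 6, the example selects the 1TB drive, which still has room for one more job.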
Thanks for this clarification. However, I'm wondering whether setting dst as tmp would increase the total bytes written per plot, since it will write the 101GiB final plot to the tmp drive. Could you confirm this? Also, if I have multiple tmp drives and I set dst the same as tmp, would plotman be smart enough to use the same drive for cache and final plot (before archiving it)? Thanks!
The point of the new feature in development is to be smart about multiple tmp drives, yes. Are you concerned that the possibly extra 101GiB will wear your SSD out 6% faster? I'm not sure it really is more writing. And if it doesn't go there, it does go elsewhere. But you'll have to research the nuanced details and tradeoffs.
I have a question about the hard drive load balancing when using an array of destination drives. If a drive is close to max capacity or full, is plotman smart enough to not use the drives that are too full to accept new plots?