HDD load balancing

stavrt commented 3 years ago

I have a question about the hard drive load balancing when using an array of destination drives. If a drive is close to max capacity or full, is plotman smart enough to not use the drives that are too full to accept new plots?

braveckin commented 3 years ago

It does not, in my experience... It just starts the job even if it is clear that it will fail at the end... It would be nice to have a feature that redirects to a free/non-full drive...

altendky commented 3 years ago

Mostly that's the archiving feature. It's a bit painful at the moment but we are working on that. dst is not meant to be used as the final resting place for plot files.

MartinGao commented 3 years ago

@altendky I understand that the correct workflow should be tmp -> dst -> archive.

I am using HDDs for plotting right now. Because HDD bandwidth is limited, it would be nice to skip the dst and write the final plot files directly to the archive location. Therefore I am using the dst as the archive right now and it works great.

I am also facing the HDD load balancing issue at the moment. It would be nice to have a feature where plotman checks the usage of each dst location before starting a new plot job. For example, if a dst drive is 90% or 95% full, plotman would skip it and use another dst from the array.
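
For illustration, the check described above could be sketched in Python roughly as follows. This is not plotman code: `pick_dst`, its parameters, the 90% default, and the example paths are all hypothetical, and the only library call used is the standard `shutil.disk_usage`.

```python
import shutil
from typing import Callable, Iterable, Optional


def pick_dst(
    dst_dirs: Iterable[str],
    max_used_fraction: float = 0.90,          # hypothetical threshold (90% full)
    usage_fn: Callable = shutil.disk_usage,   # injectable so it can be tested
) -> Optional[str]:
    """Return the dst directory with the most free space among those whose
    used fraction is below max_used_fraction, or None if none qualifies."""
    best_path = None
    best_free = -1
    for path in dst_dirs:
        try:
            usage = usage_fn(path)  # named tuple: total, used, free (bytes)
        except OSError:
            continue  # missing or unreadable path: skip it
        if usage.total <= 0:
            continue
        if usage.used / usage.total >= max_used_fraction:
            continue  # too full, skip this dst
        if usage.free > best_free:
            best_path, best_free = path, usage.free
    return best_path


if __name__ == "__main__":
    # Hypothetical dst paths; replace with real mount points.
    print(pick_dst(["/mnt/dst/00", "/mnt/dst/01"]))
```

A percentage threshold alone does not guarantee that a whole plot still fits on a small drive, so a real implementation would probably also compare the free bytes against the expected plot size.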

altendky commented 3 years ago

Given that you are suggesting a feature be added to the dst layer it seems that it is actually not working great for you.

How about you skip the copy by leaving the plot on the tmp and archiving from there? This can presently only be done in plotman with a single tmp drive but there is https://github.com/ericaltendorf/plotman/pull/234 to make it possible with multiple tmp drives.

MartinGao commented 3 years ago

Thanks for your quick reply.

I checked #234 but it is not what I am looking for. #234 suggests using the tmp dir as a buffer, which is like tmp -> tmp -> archive.

Since I am using HDDs for plotting, I would like to avoid unnecessary data transfer. If I could do tmp -> dst or archive, why would I need tmp -> tmp or dst -> archive?

For an SSD, transferring (copying and pasting) a 102GB file is not a big issue. But for an HDD, whose bandwidth is only about 100MB/s, transferring a 102GB file would take 20 ~ 30 minutes.
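
The arithmetic behind that estimate can be checked with a few lines of Python. This is only a rough sketch: `transfer_minutes` is a hypothetical helper, decimal units (1 GB = 1000 MB) are assumed, and the 85 and 57 MB/s rates are illustrative values, not measurements.

```python
def transfer_minutes(size_gb: float, rate_mb_per_s: float) -> float:
    """Minutes needed to move size_gb at a sustained rate (1 GB = 1000 MB)."""
    return size_gb * 1000 / rate_mb_per_s / 60


# 100 MB/s is the nominal figure quoted above; the lower rates are assumed
# sustained rates for a drive that is busy with other work.
for rate in (100, 85, 57):
    print(f"{rate} MB/s -> {transfer_minutes(102, rate):.1f} min")
```

At a full 100MB/s the copy takes about 17 minutes, so the 20 ~ 30 minute range corresponds to a sustained rate of roughly 85 down to 57 MB/s.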

I know that maybe there are only a few people who would use HDD for plotting, so the importance of my issue is relatively low. I am happy with what Plotman can offer right now.

Plotman is a great tool and has already helped me a LOT in plotting. You guys are awesome!

altendky commented 3 years ago

One way or another the data has to come off the tmp and get to somewhere else. Specifying tmp as dst does not induce an extra copy. It doesn't write 100GB to tmp and then copy that 100GB to tmp and then let us archive it. Chia specifically checks for tmp and dst being the same (it is actually overly picky, but if you specify the same path for both, that doesn't matter). Less copying is exactly why people like setting dst as tmp.

MartinGao commented 3 years ago

Aha. I get your point. Thanks for clarifying. I'll give it a try.

stavrt commented 3 years ago

While we are on the subject of load balancing HDDs, what about SSDs? I have a 2TB and a 1TB SSD, but I cannot specify the max plots running for each drive specifically, only a global value. I would like to do something like 6 and 2 plots for the 2TB and 1TB drives.

altendky commented 3 years ago

You can. I believe this is described in the readme and the config file and perhaps the wiki. It is also unrelated to this issue.

wjx008 commented 3 years ago

> While we are on the subject of load balancing HDDs, what about SSDs? I have a 2TB and a 1TB SSD, but I cannot specify the max plots running for each drive specifically, only a global value. I would like to do something like 6 and 2 plots for the 2TB and 1TB drives.

It's supported; see #47.

wjx008 commented 3 years ago

> One way or another the data has to come off the tmp and get to somewhere else. Specifying tmp as dst does not induce an extra copy. It doesn't write 100GB to tmp and then copy that 100GB to tmp and then let us archive it. Chia specifically checks for tmp and dst being the same (it is actually overly picky, but if you specify the same path for both, that doesn't matter). Less copying is exactly why people like setting dst as tmp.

Thanks for this clarification. However, I'm wondering whether setting dst as tmp would increase the total bytes written per plot, since it will write the 101GiB final plot to the tmp drive. Could you confirm this? Also, if I have multiple tmp drives and I set dst the same as tmp, would plotman be smart enough to use the same drive for the cache and the final plot (before archiving it)? Thanks!

altendky commented 3 years ago

The point of the new feature in development is to be smart about multiple tmp drives, yes. Are you concerned that the possibly extra 101GiB will wear your SSD out 6% faster? I'm not sure it really is more writing. And if it doesn't go there, it does go elsewhere. But you'll have to research the nuanced details and tradeoffs.
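
As a back-of-the-envelope check, the two numbers in this comment (an extra 101GiB being about 6% more wear) imply a baseline of writes per plot that can be worked out in Python. This is only arithmetic on the figures quoted here, not a measurement, and `implied_baseline_gib` is a hypothetical helper.

```python
def implied_baseline_gib(extra_gib: float, extra_fraction: float) -> float:
    """Baseline writes per plot implied by 'extra_gib is extra_fraction more'."""
    return extra_gib / extra_fraction


# 101 GiB and 6% are the figures from the comment above.
baseline = implied_baseline_gib(101, 0.06)
print(f"{baseline:.0f} GiB (~{baseline / 1024:.2f} TiB)")
```

In other words, the 6% figure assumes that a plot already writes on the order of 1.6 TiB to the tmp drive, so one more copy of the final file is a small fraction of the total.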

ericaltendorf / plotman

HDD load balancing #331