Chia-Network / bladebit

A high-performance k32-only, Chia (XCH) plotter supporting in-RAM and disk-based plotting
Apache License 2.0
340 stars 107 forks

Windows ramplot not valid #434

Open tuinboontje opened 8 months ago

tuinboontje commented 8 months ago

I have an issue with RAM plotting. None of the C1 through C4 plots are valid. With C5 it just craps out and hangs; it will not even plot. The generated logs for the plots do not mention any error of any kind, so they are presumed valid/succeeded. I can't seem to find any info on this problem. Dual Xeons with 512 GB of RAM, so a RAM shortage can be ruled out. Using the bladebit command line it just prints the "System Memory" line and closes. Any help/pointers are much appreciated. After inspection it just successfully generated C1 plots; the other...

Bladebit Chia Plotter
Version : 3.1.0
Git Commit : e9836f8bd963321457bc86eb5d61344bfb76dcf0
Compiled With: msvc 19.29.30152

[Global Plotting Config]
Will create 1 plots.
Thread count : 32
Warm start enabled : false
NUMA disabled : false
CPU affinity disabled : false
Farmer public key : **
Pool contract address : **
Compression Level : 1
Benchmark mode : disabled

System Memory: 501/511 GiB. Memory required: 416 GiB. Allocating buffers.

Generating plot 1 / 1: 26068862a560d7ccd78a902d3166d79a8c992c309de2093b3b7f4aee54123327
Plot temporary file: G:\plot-k32-c01-2023-10-16-06-17-26068862a560d7ccd78a902d3166d79a8c992c309de2093b3b7f4aee54123327.plot.tmp

Running Phase 1
Generating F1... Finished F1 generation in 34.92 seconds. Sorting F1... Finished F1 sort in 162.35 seconds. Progress update: 0.01
Forward propagating to table 2... Pairing L/R groups... Finished pairing L/R groups in 36.5480 seconds. Created 4294967296 pairs. Average of 236.1406 pairs per group. Computing Fx... Finished computing Fx in 36.3710 seconds. Sorting entries... Finished sorting in 306.08 seconds. Finished forward propagating table 2 in 379.74 seconds. Progress update: 0.06
Forward propagating to table 3... Pairing L/R groups... Finished pairing L/R groups in 26.5890 seconds. Created 4294955440 pairs. Average of 236.1400 pairs per group. Computing Fx... Finished computing Fx in 36.5790 seconds. Sorting entries... Finished sorting in 244.27 seconds. Finished forward propagating table 3 in 308.82 seconds. Progress update: 0.12
Forward propagating to table 4... Pairing L/R groups... Finished pairing L/R groups in 25.7270 seconds. Created 4294967296 pairs. Average of 236.1406 pairs per group. Computing Fx... Finished computing Fx in 29.9160 seconds. Sorting entries... Finished sorting in 233.35 seconds. Finished forward propagating table 4 in 289.77 seconds. Progress update: 0.2
Forward propagating to table 5... Pairing L/R groups... Finished pairing L/R groups in 25.9710 seconds. Created 4294967296 pairs. Average of 236.1406 pairs per group. Computing Fx... Finished computing Fx in 38.4280 seconds. Sorting entries... Finished sorting in 233.17 seconds. Finished forward propagating table 5 in 298.95 seconds. Progress update: 0.28
Forward propagating to table 6... Pairing L/R groups... Finished pairing L/R groups in 26.3800 seconds. Created 4294967296 pairs. Average of 236.1406 pairs per group. Computing Fx... Finished computing Fx in 36.7030 seconds. Sorting entries... Finished sorting in 239.34 seconds. Finished forward propagating table 6 in 303.30 seconds. Progress update: 0.36
Forward propagating to table 7... Pairing L/R groups... Finished pairing L/R groups in 26.5250 seconds. Created 4294967296 pairs. Average of 236.1406 pairs per group. Computing Fx... Finished computing Fx in 34.5100 seconds. Finished forward propagating table 7 in 62.39 seconds. Progress update: 0.42
Finished Phase 1 in 1840.25 seconds.
Running Phase 2
Prunning table 6... Finished prunning table 6 in 0.95 seconds. Progress update: 0.43
Prunning table 5... Finished prunning table 5 in 41.85 seconds. Progress update: 0.48
Prunning table 4... Finished prunning table 4 in 40.06 seconds. Progress update: 0.51
Prunning table 3... Finished prunning table 3 in 43.31 seconds. Progress update: 0.55
Finished Phase 2 in 126.63 seconds.
Running Phase 3
Compressing tables 2 and 3... Finished compressing tables 2 and 3 in 122.39 seconds. Progress update: 0.73. Table 2 now has 3439886779 / 4294955440 entries ( 80.09% ).
Compressing tables 3 and 4... Finished compressing tables 3 and 4 in 122.18 seconds. Progress update: 0.79. Table 3 now has 3466099063 / 4294967296 entries ( 80.70% ).
Compressing tables 4 and 5... Finished compressing tables 4 and 5 in 123.88 seconds. Progress update: 0.85. Table 4 now has 3533016750 / 4294967296 entries ( 82.26% ).
Compressing tables 5 and 6... Finished compressing tables 5 and 6 in 130.37 seconds. Progress update: 0.92. Table 5 now has 3713709595 / 4294967296 entries ( 86.47% ).
Compressing tables 6 and 7... Finished compressing tables 6 and 7 in 156.02 seconds. Progress update: 0.98. Table 6 now has 4294967296 / 4294967296 entries ( 100.00% ).
Finished Phase 3 in 654.84 seconds.
Running Phase 4
Writing P7. Finished writing P7 in 1.48 seconds.
Writing C1 table. Finished writing C1 table in 0.01 seconds.
Writing C2 table. Finished writing C2 table in 0.00 seconds.
Writing C3 table. Finished writing C3 table in 1.68 seconds.
Finished Phase 4 in 3.16 seconds.
Writing final plot tables to disk
G:\plot-k32-c01-2023-10-16-06-17-26068862a560d7ccd78a902d3166d79a8c992c309de2093b3b7f4aee54123327.plot.tmp -> G:\plot-k32-c01-2023-10-16-06-17-26068862a560d7ccd78a902d3166d79a8c992c309de2093b3b7f4aee54123327.plot
Final plot table pointers:
Table 1: 4096 ( 0x0000000000001000 )
Table 2: 4096 ( 0x0000000000001000 )
Table 3: 14001424784 ( 0x00000003428cc990 )
Table 4: 28090921184 ( 0x000000068a5970e0 )
Table 5: 42452428634 ( 0x00000009e25ca75a )
Table 6: 57548442509 ( 0x0000000d66278b8d )
Table 7: 75007232909 ( 0x0000001176c78b8d )
C 1 : 92723973005 ( 0x0000001596c78b8d )
C 2 : 92725690997 ( 0x0000001596e1c275 )
C 3 : 92725691173 ( 0x0000001596e1c325 )

Final plot table sizes:
Table 1: 0.00 MiB
Table 2: 13352.80 MiB
Table 3: 13436.79 MiB
Table 4: 13696.20 MiB
Table 5: 14396.68 MiB
Table 6: 16650.00 MiB
Table 7: 16896.00 MiB
C 1 : 1.64 MiB
C 2 : 0.00 MiB
C 3 : 1228.80 MiB

Finished writing tables to disk in 20.77 seconds.
Finished plotting in 2645.66 seconds (44.09 minutes).

The C5 run:

Bladebit Chia Plotter
Version : 3.1.0
Git Commit : e9836f8bd963321457bc86eb5d61344bfb76dcf0
Compiled With: msvc 19.29.30152

[Global Plotting Config]
Will create 1 plots.
Thread count : 32
Warm start enabled : false
NUMA disabled : false
CPU affinity disabled : false
Farmer public key : **
Pool contract address : **
Compression Level : 5
Benchmark mode : disabled

System Memory: 501/511 GiB. Memory required: 416 GiB. Allocating buff

tuinboontje commented 8 months ago

Why does it still make a temp file on the drive when plotting in RAM?

LeroyINC commented 8 months ago

It writes to the final plot file during the plotting process, so the final write takes less time, as it can commit some of it as it goes.
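To make that concrete, here is a minimal, generic illustration of the idea (not bladebit's actual code): table data is committed to the destination file as each table finishes, so the "final write" at the end only has small leftovers. The file name and sizes are made up for the sketch.

```cpp
// Sketch: stream finished tables to the output file as they become ready,
// instead of buffering everything and writing it all at the very end.
#include <cstdio>
#include <fstream>
#include <vector>

int main()
{
    // Hypothetical output path; in bladebit this would be the .plot.tmp file.
    std::ofstream out("plot-k32-example.plot.tmp", std::ios::binary);

    for (int table = 1; table <= 7; ++table)
    {
        // Pretend this buffer holds a finished table's serialized entries.
        std::vector<char> tableData(1024, static_cast<char>(table));

        // Commit the table immediately rather than holding it in RAM until the end.
        out.write(tableData.data(), static_cast<std::streamsize>(tableData.size()));
        out.flush();
        std::printf("Table %d committed to disk.\n", table);
    }

    // Only small remaining pieces (pointers/checkpoints) are left for the final write.
    std::printf("Final write only needs the leftover metadata.\n");
    return 0;
}
```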

tuinboontje commented 8 months ago

Any ideas regarding the crashing of higher-than-C1 plots, and their being invalid even though the log says they completed?

tuinboontje commented 8 months ago

No one? I've got one. It's just poorly optimized. I read somewhere that bladebit does not like 2 NUMA nodes. Tried 1 and it worked at first, but it still crashed after 2 invalid plots. All ramplots are invalid, C0 through C5. Even diskplot: a cheap NVMe (±400 MB/s read, 750 MB/s write) takes 121 min; a better NVMe (±700 MB/s read, ±1.1 to 1.4 GB/s write) takes 62 min, so that scales fairly decently. But on a datacenter stripe array that reads 5.5 GB/s and writes around 7 GB/s (they are sustained-write optimized) it takes 76 minutes. There is absolutely no rhyme nor reason to these times. I get the gnarly feeling that it is mostly AMD optimized. Could be wrong, but other diskplotters on AMD systems with similar read and write speeds hit sub to low 20 min, some even faster... The thought creeps up on me that in their haste to catch up in the CUDA compression arms race they are focusing on the CUDA plotter's optimisation and leaving the disk and ram plotters out to pasture. Or, bluntly stated, left standing with their pants around their ankles. Then again, that's my 2 cents worth of experience so far. Looks like I and others like me are forced into CUDA plotting... I don't think that is in line with the original Chia mindset.

harold-b commented 8 months ago

> No one? I've got one. It's just poorly optimized. I read somewhere that bladebit does not like 2 NUMA nodes.

This is incorrect. bladebit's ramplot is, I believe, the only plotter explicitly coded with NUMA in mind. You might have to choose the best settings for high memory throughput in your BIOS. ramplot has not changed since its first versions, with users running it for months at a time with no crashes, so it's the most stable of the 3 variants.

The only things that changed in the latest v3 were minor things to accommodate compressed plotting. There could potentially be a bug there in phase 3, but we never encountered any during testing. If you can provide plot IDs and compression levels for the plots you created that were invalid, I could try to reproduce them locally to see if I encounter them as well. If you have some full logs of ramplot's faulty plots, please post a few and I can take a look.

harold-b commented 8 months ago

Are the corrupt ones only on Windows, by the way?

tuinboontje commented 8 months ago

Yes sir, they are Windows based. Where could I find these? Do you mean the actual IDs of the plots? If so, I will post them. As for logs: at first they just crapped out at the start; after reading about the NUMA dislike I changed it to no more than 16 threads, thus using 1 NUMA node, and that worked at first... later it produced no more than 2 or 3 plots before crashing. That's just it: the logs state completed successfully, no errors, but the farmer node does not recognise them as valid. And of my 2 plotters, the 1st ran diskplot for days with, I think, Chia 2.0 without any problems. But after updating to 2.1 and later 2.1.1, even my original plotter crashes on diskplot (which ran stable as a rock on the 2.0 version).

tuinboontje commented 8 months ago

OK, I think I got the "not opening compressed plots" issue sorted. It was my bad and not the harvester: decompression was turned off. I discovered this after reading the debug logs. Started a series of 10 C5 plots. The 1st C5 seized up at the start and never got to allocating buffers. Deleted it and it surprisingly started on the 2nd plot... I've got an auto-remover, so I hope it will crank out 10 plots. Oh, sorry, 9: the first one choked on start.

tuinboontje commented 8 months ago

So it is hit and miss. It never makes it past the third plot: it makes 1, I delete the next one where it keeps hanging, and then it will start the one after that. It is not set-and-forget but rather set and forget about it. As if plotting on its own when automated did not take long enough, I now have to keep an eye out for when it crashes, because it will, and it runs all night burning electricity on a plot that is never going to finish. Maybe code in some sort of counter that exits the current plot when a phase's time equals or exceeds some number X and then starts the next plot in the batch. It's a fairly safe assumption that when a phase, or rather a step within a phase (say propagating, sorting, or computing Fx), is not completed after, say, 1800 seconds, it most likely never will be.

It could be as simple as: timer.start; if timer.count >= limit, exit and timer.stop; else continue with the next iteration.

You most likely will have to define a subroutine or a function for the counter. This would be a one-time definition that you can call upon anytime it's needed. It will most definitely put some overhead on the process, but I think a counter function or sub would be negligible compared to the alternative of running a routine all night that is never going to finish. And I know for a fact that this is a problem more users have when the plotter craps out. It would be quite useful if, in the event that it does crash, the program stops, cleans up after itself, and goes about its business plotting. This can be applied to RAM, disk, and CUDA plotting alike, I think. And quite frankly, I think my old coding teacher would have killed me for overlooking such an obvious single point of failure and not coding in a safety to handle it. I am not trying to attack you, but rather to provide a perhaps fresh view and think constructively about how to further efficiency and quality. I will put my money where my mouth is, pull the source code, make an attempt to add this, I think, quite simple function, and put it up for review by more qualified people. I do not know which language you used, but you get the intention.
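The phase-timeout idea above can be sketched with a plain watchdog. The following is a minimal, standalone C++ illustration (bladebit itself is C++); RunPhase, kPhaseTimeout, and the simulated work are hypothetical stand-ins, not bladebit's actual code, and a real integration would need the plotter's phases to check a cancel flag cooperatively.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

using namespace std::chrono_literals;

// Hypothetical per-phase timeout; the comment above suggests ~1800 s,
// shortened here so the sketch runs quickly.
static constexpr auto kPhaseTimeout = std::chrono::seconds(2);

// Stand-in for one plotting phase. It checks the cancel flag between chunks of
// simulated work; phase 2 deliberately "hangs" so the watchdog has something to abort.
static bool RunPhase(int phase, std::atomic<bool>& cancel)
{
    const auto workTime = (phase == 2) ? std::chrono::seconds(10) : std::chrono::seconds(1);
    const auto start    = std::chrono::steady_clock::now();

    while (std::chrono::steady_clock::now() - start < workTime)
    {
        if (cancel.load(std::memory_order_relaxed))
            return false;                       // abandon mid-phase
        std::this_thread::sleep_for(50ms);      // one simulated chunk of work
    }
    return true;
}

int main()
{
    for (int plot = 1; plot <= 3; ++plot)
    {
        bool plotOk = true;

        for (int phase = 1; phase <= 4 && plotOk; ++phase)
        {
            std::atomic<bool> cancel{ false };
            std::atomic<bool> done  { false };
            bool              phaseOk = false;

            std::thread worker([&] {
                phaseOk = RunPhase(phase, cancel);
                done.store(true);
            });

            // Watchdog: if the phase runs past the timeout, ask it to stop.
            const auto deadline = std::chrono::steady_clock::now() + kPhaseTimeout;
            while (!done.load() && std::chrono::steady_clock::now() < deadline)
                std::this_thread::sleep_for(100ms);

            if (!done.load())
            {
                std::printf("Plot %d: phase %d exceeded the timeout, abandoning this plot.\n", plot, phase);
                cancel.store(true);
            }

            worker.join();
            plotOk = phaseOk;
        }

        if (plotOk)
            std::printf("Plot %d completed; starting the next one.\n", plot);
        else
            std::printf("Plot %d abandoned; moving on to the next one.\n", plot);
    }
    return 0;
}
```

The cooperative cancel flag is the important part of the sketch: a hung phase cannot be killed cleanly from the outside, so each work loop has to poll the flag and bail out on its own before the batch can move on to the next plot.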

tuinboontje commented 8 months ago

OK, it plowed through 7 plots uninterrupted. That's a plus. Now I can relish getting into GPU plotting, as my riser and bracket for my server finally got delivered in the mail this afternoon. But I am still going to check whether I can add a boundary of some sort to define the scope of "normal" operating parameters. I still think that 30 minutes for allocating buffers and resources would more than suffice for at least RAM and CUDA plotting. No need to reinvent the wheel; that is well beyond the scope of my coding capabilities, and let's be honest, it is a very well designed wheel already after my first peek into the code. Even the temp cleanup is already arranged in the code, I think, because it literally is done at the end.

tuinboontje commented 8 months ago

After carefully reading your reply stating it is the most stable of the 3 (so you do know there are issues with the 3 variants): it is like spinning 3 tops, A, B, and C, at 100, 75, and 50 mph, and stating that A is the most stable (it spins the longest), which it is, due to the higher rpm and the gyroscopic effect. But the end result will be the same for all 3: they end up lying on their side...

tuinboontje commented 8 months ago

That said, the GPU plotter crashed after 1 plot out of 10. After deleting it, the run did finish. On to a run of 25... it crashed after 16. Luckily I caught it, and after checking the logs from the mover I concluded that it had hung on allocating resources for over 45, or should I say only 45, minutes. Off on a run of 100, alternating C1 through C5 and hybrid disk: 1 C1 RAM, 1 C1 hybrid, 1 C2 RAM, 1 C2 hybrid, and so on. To be continued...