JayDDee / cpuminer-opt

Optimized multi algo CPU miner

--max-diff option works not as expected in solo mode #392

Closed YetAnotherRussian closed 1 year ago

YetAnotherRussian commented 1 year ago

Version is 3.21.1:

cpuminer-avx2-sha-vaes.exe -t 8 --cpu-affinity 0x5555 -a algo -o http://127.0.0.1:54321 -u aaa -p bbb --max-diff=0.45

image

The option seems to take effect: mining somehow stops on a high-diff job, but then it continues.

I'm not sure whether you have a solo environment; I can make detailed logs if needed.

There's another micro issue nearby:

image

This demotivating block TTF at startup makes people close the miner and forget about it rather than wait for correct stats :'( It might be better to delay it a bit at startup, but I'm not sure.

YetAnotherRussian commented 1 year ago

Edit: someone mining a coin solo should know if it supports multiple algorithms and also know that networkhashps and anything derived from it is unreliable, IMO.

No, I mean something else...

I mine a coin that has only one PoW algo; the target block time is 5 min, nethash is 50 GH/s, and diff has some value. As a solo miner, I get this info (nethash and diff) from GBT in cpuminer, and you calculate the net TTF as 2 minutes. I have 50 MH/s, and my TTF based on diff and speed is, e.g., 1 day. That's the info I'm very interested in.

How do I use that fake 2-minute TTF info? If the block is found in 2 minutes instead of 5, the network will adjust its diff in the next block(s) according to its diff algo.
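To make the arithmetic in this example concrete, here is a minimal C sketch. It assumes Bitcoin-style difficulty, where difficulty 1 corresponds to about 2^32 hashes; cpuminer-opt's own formula and any per-algo diff scaling may differ, and the function names are hypothetical.

```c
/* Assumption: Bitcoin-style difficulty, diff 1 ~ 2^32 hashes.
 * Some algos scale difficulty differently. */
#define HASHES_PER_DIFF1 4294967296.0

/* Expected seconds to find a block at 'diff' with 'hashrate' H/s. */
double ttf_seconds(double diff, double hashrate)
{
    return diff * HASHES_PER_DIFF1 / hashrate;
}

/* Net TTF and miner TTF use the same diff, so it cancels out:
 * miner TTF = net TTF * (net hashrate / miner hashrate). */
double miner_ttf_from_net_ttf(double net_ttf, double net_hashrate,
                              double miner_hashrate)
{
    return net_ttf * net_hashrate / miner_hashrate;
}
```

With the numbers above, miner_ttf_from_net_ttf(120.0, 50e9, 50e6) gives 120,000 s, about 33 hours. Miner TTF needs only the diff and the miner's own hashrate, while net TTF also depends on networkhashps, so it is only as reliable as that value.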

I don't mean to parse several algos etc.; I mean to delete this useless calculation entirely (keeping in mind that it is useless: it is wrong at startup, and it is wrong when there are several PoW algos). I don't know of any solo or shared pool that provides net TTF in its stats; pools do provide their expected TTF, which is calculated exactly the same way as the miner TTF in cpuminer-opt.

JayDDee commented 1 year ago

No, I mean something else...

I'm still not sure I understand, but if I do, you want nethashrate removed completely from cpuminer getwork, stratum & GBT. If I did that I'd expect complaints, and I don't want that. It's always difficult to take something away from users who expect it.

My philosophy is that cpuminer is like a race car: you need to be a skilled driver to get the most out of it. Those skills should include how to scrutinize what the machine is telling you and filter out the noise and bad info.

Nethashrate reports from stratum mining are a rough estimate because stratum doesn't provide the nethashrate in any of its methods. A pool's API could do it, but I'm not aware of any that do. API integration with cpuminer is very basic and would have to be significantly improved to support pool APIs, especially because there are many different pool APIs. Nethashrate from stratum mining is purely a rough estimate based on netdiff and block emission during the session, and it assumes average luck.

GBT/getwork provide the nethashrate directly to the miner via the mining_info method, and it is implicitly trusted by cpuminer.

Nethashrate is provided on an FYI basis, with caveats. It isn't used for any other calculations, but it might be of interest to the user. And the skilled user is expected to recognize when the data is incorrect, just as I have that expectation with the garbage startup data.

Though the GBT nethashrate is useless for coins with multiple algos, that is a choice made by the coin. A multialgo nethashrate is as useless as a multialgo netdiff would be. They should report the nethashrate for the current algo, as they do for netdiff. That might be an issue the coin's developers could address.

I was not aware of this issue before but it doesn't really change my opinion about reporting nethashrate. I could filter it for coins that have multiple algos. I do that now for pools with multiple coins. But it would take some work to implement it for nethashrate and I'm not very motivated to do it. If the data can be useful to some, provide it and let the "skilled" users figure it out. Not providing the data is useless to all.

JayDDee commented 1 year ago

I'm making a small change to the work struct: swapping the first 2 elements and reducing the alignment of target to 32. Target is only 32 bytes, so it will never be able to use 64-byte vectors and never requires 64-byte alignment. Swapping positions will reduce the size of the gap between them. I like to use 64-byte alignment for the struct because it's also the cache line size.
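The padding effect of the swap can be checked with a small C11 sketch. Only target (8 x 32-bit words, 32 bytes) comes from the post; the 192-byte data array and the trailing targetdiff field are assumptions standing in for the rest of struct work, and the offsets in the comments assume a typical x86-64 ABI.

```c
#include <stddef.h>   /* offsetof, to inspect the layouts */
#include <stdint.h>

/* Hypothetical trimmed-down layouts, not the real struct work. */
struct work_target_first {
    _Alignas(64) uint32_t target[8];   /* 32 bytes at offset 0 */
    _Alignas(64) uint32_t data[48];    /* 192 bytes at offset 64: 32-byte hole */
    double targetdiff;                 /* offset 256; size rounds up to 320 */
};

struct work_data_first {
    _Alignas(64) uint32_t data[48];    /* 192 bytes at offset 0 */
    _Alignas(32) uint32_t target[8];   /* offset 192, no hole */
    double targetdiff;                 /* offset 224; size rounds up to 256 */
};
```

Putting the larger, 64-byte-aligned member first and giving target only 32-byte alignment removes the hole, so the struct shrinks from five cache lines to four in this example.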

JayDDee commented 1 year ago

No, I mean something else...

I mine a coin that has only one PoW algo; the target block time is 5 min, nethash is 50 GH/s, and diff has some value. As a solo miner, I get this info (nethash and diff) from GBT in cpuminer, and you calculate the net TTF as 2 minutes. I have 50 MH/s, and my TTF based on diff and speed is, e.g., 1 day. That's the info I'm very interested in.

How do I use that fake 2-minute TTF info? If the block is found in 2 minutes instead of 5, the network will adjust its diff in the next block(s) according to its diff algo.

I've read this a few times. For solo mining the network hashrate is somewhat redundant, but it can be used to audit the actual TTF. TTF is a function of hashrate and diff, so the hashrate can be used to calculate the expected block TTF, which is then compared to the actual TTF to produce the luck value.

For stratum, the network diff and observed block TTF are used to calculate an estimated network hashrate; there is no redundancy, and it assumes neutral luck.
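The difference between the two cases can be sketched in C. This again assumes Bitcoin-style difficulty (diff 1 ~ 2^32 hashes), defines luck as expected TTF divided by observed TTF (one common convention; pools may use the inverse or a percentage), and uses hypothetical function names.

```c
#define HASHES_PER_DIFF1 4294967296.0   /* assumption: Bitcoin-style diff */

/* Solo (GBT/getwork): the server reports net_hashrate, so the expected
 * block time can be compared with the observed one.
 * luck > 1 means blocks came faster than expected. */
double luck(double net_diff, double net_hashrate, double observed_ttf)
{
    double expected_ttf = net_diff * HASHES_PER_DIFF1 / net_hashrate;
    return expected_ttf / observed_ttf;
}

/* Stratum: no hashrate is reported, so it is estimated from the network
 * diff and the observed average block time, implicitly assuming luck == 1. */
double estimate_net_hashrate(double net_diff, double observed_ttf)
{
    return net_diff * HASHES_PER_DIFF1 / observed_ttf;
}
```

Feeding the stratum estimate back into luck() with the same observed block time always gives 1.0, which is why the stratum number cannot say anything about luck.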

Is that a better answer?

Edit: If networkhashps in mining_info is a simple arithmetic total, it's useless for a multialgo coin.

If networkhashps is a normalized total, adjusted for the characteristics of each algo, it can be used with a normalized network diff to calculate a TTF for all algos combined.

If networkhashps were the hashrate of the current algo, cpuminer could calculate the TTF for that algo using that algo's network diff.

If mining_info included the networkhashps for every algo, cpuminer could calculate the TTF for any algo as well as an all-algo TTF.

Another edit:

Miner TTF and Net TTF are only displayed for solo (gbt, getwork) mining because network hashrate is provided by the server and is assumed to be reliable. Actual reliability depends on the reliability of the data provided. That's where a skilled user comes in, being able to determine how reliable the data is. I prefer to leave that decision to the user rather than the miner making the decision for the user.

Finally (I hope): the TTF is determined by the diff of the current block and the hashrate reported at the same time the block was emitted. It gets recalculated every block based on the new diff and new hashrate.

JayDDee commented 1 year ago

Are you testing on native linux or Msys2, or maybe WSL?

You may have missed that question the first time, but I'd like to know the answer, especially if it's WSL. That is a significant discovery if cpuminer can work on WSL at native speed.

I've tried Cygwin; it works, but the hash rate is low. I never thought of trying WSL.

I'll start preparing the release now. It should be out later today, pending any last-minute problems.

YetAnotherRussian commented 1 year ago

Are you testing on native linux or Msys2, or maybe WSL?

You may have missed that question the first time, but I'd like to know the answer, especially if it's WSL. That is a significant discovery if cpuminer can work on WSL at native speed.

It is WSL. I've been using WSL almost since its integration into Windows, and it is faster than VirtualBox or VMware Player for simple things that don't require special network configuration etc. There are no strange issues like VMware software failing to enable CPU counters if Hyper-V is installed (enabled in Windows components), and vice versa. I see no hashrate drop in cpuminer-opt, or it is less than 1-2% on my hardware.

JayDDee commented 1 year ago

cpuminer-opt-3.21.3 is released.

JayDDee commented 1 year ago

I'll have to explore WSL.

YetAnotherRussian commented 1 year ago

All Windows binaries from the latest release crash. The Linux binary built from git does not.

Stratum mining seems to work.

JayDDee commented 1 year ago

I don't get a crash with the Windows binaries with stratum. Is the crash only GBT and only Windows binaries? Does WSL run Windows code or Linux code? Linux will display CPU temperature, Windows does not.

There is a problem with the package, the AVX2 build is missing. I'll look into that.

Edit: winbuild-cross.sh seems to be messed up, I made some changes to disable cpu groups but broke something. I'll revert to the previous version of winbuild-cross.sh and work from there.

YetAnotherRussian commented 1 year ago

I don't get a crash with the Windows binaries with stratum. Is the crash only GBT and only Windows binaries? Does WSL run Windows code or Linux code? Linux will display CPU temperature, Windows does not.

Only GBT, only Win binaries. My WSL env is Ubuntu 20.04 LTS, so it runs Linux code.

image

JayDDee commented 1 year ago

I just uploaded a rebuilt binaries package with no source code changes. I messed up winbuild-cross.sh; it's fixed now. I'm not aware of anything that would affect only gbt; if there is, it's in the source code.

JayDDee commented 1 year ago

My WSL env is Ubuntu 20.04 LTS, so it runs Linux code.

Does it actually display a correct temperature and clock rate?

YetAnotherRussian commented 1 year ago

Does it actually display a correct temperature and clock rate?

No.

image

I've never seen this working. Real (not a WSL, not a VM) Ubuntu machines with different hardware (from FX8320 to i5-11600K) always reported all zeroes.

I just uploaded a rebuilt binaries package with no source code changes. I messed up winbuild-cross.sh; it's fixed now. I'm not aware of anything that would affect only gbt; if there is, it's in the source code.

Just checked new builds, all of them (from sse2 to avx2-sha-vaes) crash on receiving new work.

JayDDee commented 1 year ago

That's not good. Just to make sure I'm clear:

wsl gbt: ok
gbt mingw binaries package: crash
stratum mingw binaries package: ok
everything else: ok

That's scary. I'm not aware of any source code used by gbt that is specific to Windows, but that seems to be the trigger. I suspect it's a misalignment issue; do you get the misaligned log with -D? I left that in just in case. I'll have to dig into the gbt code.

YetAnotherRussian commented 1 year ago

wsl gbt: ok
gbt mingw binaries package: crash
stratum mingw binaries package: ok
everything else: ok

Correct. Without symbols I can only show this:

image

I'll rebuild with them a bit later.

do you get the misaligned log with -D

No

JayDDee commented 1 year ago

I'm confused, how do you run gdb from Windows. Is it crashing on WSL Linux too?

I reviewed the changes in cpu-miner.c, and the only change to gbt was the allocation of ret_work in workio_get_work. Are you using max-diff or cpu-affinity? They also changed; try without them.

YetAnotherRussian commented 1 year ago

Is it crashing on WSL Linux too?

No

I'm confused, how do you run gdb from Windows.

I see no point in launching it from Code::Blocks or something similar. VS Community can launch a CLI app with parameters for debugging (needed for cpuminer-opt), but I'm sure that debugger won't pick up the symbols.

Are you using max-diff or cpu-affinity? They also changed, try without.

None of them. That screenshot shows no affinity and only one thread (-t 1).

JayDDee commented 1 year ago

That means it's likely a build issue again. I wonder if I uploaded the wrong build. If the package has only 6 executables it's the old one; if it has 8 it's the updated one.

I'll upload again just to make sure it's the right one. I've deleted the existing one so when you see it it's new. I've marked the release status pre-release so you may have to look harder for it.

If it crashes with the updated builds I have no idea.

Edit: new binaries uploaded.

The older builds, AVX and older, were built the same as previous releases, so they shouldn't be affected. If they are, it may be due to the prehash optimization. That required separate code changes for stratum and gbt/getwork. Everything tested ok with stratum, and the code change is simple. The only difference is that with gbt/getwork one of the miner threads runs the prehash instead of the stratum thread. That shouldn't make a difference; they both update g_work and restart the threads.

I'm scared. The problem with the builds caused compile errors due to missing -maes on some generic builds for CPUs that support it. I don't see how that would cause a crash if it compiled successfully.

But that doesn't explain why WSL works.

Is this address still valid? I'll probably need to test gbt myself.

Try stratum+tcp://pool.cryptopowered.club:1304 using wallet GfWkqzKQfQDMQxjwi5iJCDbzhsCNxGLKHr and pass x, share TTF is ~30sec @ 15Mh

YetAnotherRussian commented 1 year ago

Is this address still valid?

Should be

JayDDee commented 1 year ago

Is this address still valid?

Should be

My bad, it's not gbt.

The latest binaries package needs to be tested with gbt; I can't do that. If it still crashes, you can edit cpu-miner.c and comment out the gate call to prehash on line 2349. This will determine if the prehash is the problem. With this patch the hash will be invalid, so anything submitted will be rejected, but the test is valid for its purpose.

Not all algos use the centralized prehash but all yescrypt and yespower do.

Edit: my bad again, you can't rebuild the package after editing the source. I just need to know if the latest package works, including the older architecture builds.

The dilemma is that the intersection of working gbt on Linux, non-working gbt on Windows and working stratum on Linux & Windows seems to be an empty set.

1. The affinity changes affect stratum and gbt equally, regardless of OS.
2. The build changes affect CPU groups affinity on Windows only, gbt & stratum equally, old CPU architectures only.
3. The prehash change affects gbt on Windows and Linux equally.

1 & 3 seem to eliminate a problem with gbt. 1 & 2 seem to eliminate the affinity changes related to disabling CPU groups. There's nothing left.

The only change to gbt was the allocation of ret_work in workio_get_work which affects both Linux and Windows, but Linux works. The only change to affinity was disabling Windows cpu groups on new architectures and setting affinity the old way. The old arch builds are unchanged. The only change that affects stratum and gbt differently is prehash, but gbt works on Linux. The changes to struct work affect everything, Linux, Windows, stratum & gbt. All the other changes are algo specific and also affect Linux, Windows, stratum & gbt.

JayDDee commented 1 year ago

Does it actually display a correct temperature and clock rate?

No image

I've never seen this working. Real (not a WSL, not a VM) Ubuntu machines with different hardware (from FX8320 to i5-11600K) always reported all zeroes.

I don't expect it to work in emulators or on Windows hosts, but it should work on newish CPUs with Linux. It works for me on CPUs from Sandybridge to Icelake, SkylakeX, Zen1 and Zen3. Icelake is the only laptop; Zen3 replaced Zen1 on the old mobo. I had to do some fiddling to get some of the old CPUs to work, but SkylakeX and Zen3 worked out of the box. My Zen3 is a 5700G, very similar to yours, but my mobos are all Asus. Maybe it's a mobo issue: maybe different mobo brands use different paths, and I only test on Asus.

YetAnotherRussian commented 1 year ago

If you need some solo mining environment to test, here it is:

  1. Download https://github.com/myriadteam/myriadcoin/releases/download/v0.18.1.0/myriadcoin-0.18.1.0-win64.zip and use default data folder
  2. In ...\AppData\Myriadcoin\ create text file "myriadcoin.conf" with this content (line by line):
    rpcuser=cpuminer
    rpcpassword=cpuminer777
    rpcbind=127.0.0.1:44444
    rpcallowip=127.0.0.1
    rpcport=44444
    server=1
    listen=1
    daemon=1
    algo=Yescrypt
  3. Restart the wallet
  4. Use "cpuminer -a yescrypt -D -o http://127.0.0.1:44444 -u cpuminer -p cpuminer777 --coinbase-addr=YOUR_ADDR_HERE"
  5. Possible algo values for CPU are Yescrypt or Argon2d; you need to restart your wallet each time you change the algo.

This one won't take too much time to sync. There are plenty of others, but they're all bad for testing (several days to sync, wallet issues etc.).

JayDDee commented 1 year ago

Do the old builds, like AVX, crash with any copy of the v3.21.3-windows? Those builds did not change, if they're broken it's definitely a code problem.

YetAnotherRussian commented 1 year ago

Do the old builds, like AVX, crash with any copy of the v3.21.3-windows?

Yes, even sse2 and sse42

JayDDee commented 1 year ago

I initially misread your post. It did crash. That means the build problem didn't cause the crash. That's very bad.

What about the other issues? Do max-diff and affinity (with and without CPU groups) work using WSL? I'm going to have to roll back some of the code, and I want to avoid un-fixing things that now work.

YetAnotherRussian commented 1 year ago

Max-diff seems to work properly. As for affinity, I see no issues with one CPU group, but I don't have a dual-CPU, NUMA or 64+ thread system. Win 11 isn't tested either.

JayDDee commented 1 year ago

Thanks I won't remove those. The prehash optimization is definitely coming out, it was a disappointment anyway.

Edit: Myriad wallet is a no go. Installed it on Linux, after 4 hour header syncing was only half done, not even an estimate. Gave up.

It's going to take more time for a fix. I won't be able to test GBT and Windows. Testing Windows is already much more difficult due to the build turnaround; it's not simply edit, make and run. I need to do a lot more planning and double-checking to account for the inability to test GBT.

Update: I think I have a plan. Start from v3.21.2 and add only the fixes associated with this issue to the next release. Centralized prehash is probably gone for good. The rest of the changes will wait for the release after next.

Edit: I went further and deleted the myr-gr stats fix; the result is that only 2 files are changed: cpu-miner.c and winbuild-cross.sh. Now testing AVX2 on Windows 10 with stratum. The only change left that can affect GBT is the _mm_malloc change for the misaligned crash. It works on Windows in other places in cpu-miner, so it's as low risk as I can get it while keeping the crash fix.

JayDDee commented 1 year ago

v3.21.4 is released.

Some notes:

Tested the cpu affinity, Windows binaries, without cpu groups, stratum only. Tested affinity, Windows MSys2, with CPU groups, stratum only. Tested conditional mining on Linux with max-temp.

The misaligned log is still available with -D if needed. I will remove it in a future release.

If you want to follow up on the CPU temp issue with bare-metal Linux, report the CPU models you try and let me know which work and which don't. You can also search other hwmon paths to try to find one that works; refer to sysinfos.c for tips. i7-4790K, i9-9940X & R7-5700G also work with the existing set. I have been assuming the working path was CPU dependent, but it may in fact be mobo dependent, so that info might also help. With the increase in temp in the newer cpus the max-temp option has become more valuable. Also, if you know of an API to get CPU temp on Windows, let me know; I can only find apps.
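For anyone probing hwmon paths on bare-metal Linux, a minimal C sketch that lists what each hwmon device reports may help. The sysfs layout (/sys/class/hwmon/hwmonN/name, and tempM_input in millidegrees Celsius) is the standard Linux hwmon interface, but which hwmonN belongs to the CPU (for example coretemp on Intel or k10temp on AMD) varies by board and kernel. This is not the code in sysinfos.c; the function names are hypothetical, and the sketch only probes hwmon0..15 and temp1_input.

```c
#include <stdio.h>
#include <string.h>

/* Read one hwmon temperature file (millidegrees Celsius).
 * Returns 0 on success, -1 on failure. */
int read_hwmon_temp(const char *path, double *celsius)
{
    FILE *f = fopen(path, "r");
    long millideg;
    int ok;

    if (!f)
        return -1;
    ok = fscanf(f, "%ld", &millideg) == 1;
    fclose(f);
    if (!ok)
        return -1;
    *celsius = millideg / 1000.0;
    return 0;
}

/* Print driver name and temp1_input for every hwmon device found, so the
 * path that works on a given CPU/mobo combination can be identified. */
void list_hwmon_temps(void)
{
    char path[64];
    char name[64];

    for (int i = 0; i < 16; i++) {
        double celsius;
        FILE *f;

        snprintf(path, sizeof path, "/sys/class/hwmon/hwmon%d/name", i);
        f = fopen(path, "r");
        if (!f)
            continue;                          /* no such hwmon device */
        if (!fgets(name, sizeof name, f))
            name[0] = '\0';
        fclose(f);
        name[strcspn(name, "\n")] = '\0';      /* drop trailing newline */

        snprintf(path, sizeof path, "/sys/class/hwmon/hwmon%d/temp1_input", i);
        if (read_hwmon_temp(path, &celsius) == 0)
            printf("hwmon%d (%s): %.1f C\n", i, name, celsius);
    }
}
```

Calling list_hwmon_temps() on a machine that shows all zeroes would show whether any hwmon device reports a sensible value, and under which driver name.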

YetAnotherRussian commented 1 year ago

With the increase in temp in the newer cpus the max-temp option has become more valuable.

Yep, the 5700GE is "burning" at its 46C with -t 16 and an AVX2 load :D It should be useful with a 13900K or something.

v3.21.4 is released.

Solo mode is still crashing :( I do not even see the misaligned log.

Edit: Myriad wallet is a no go. Installed it on Linux, after 4 hour header syncing was only half done, not even an estimate. Gave up.

Try https://github.com/unitusdev/unitus/releases

\AppData\Roaming\Unitus\unitus.conf

rpcuser=cpuminer
rpcpassword=cpuminer777
rpcbind=127.0.0.1:55555
rpcallowip=127.0.0.1
rpcport=55555
server=1
listen=1
daemon=1
algo=Argon2d
addnode=164.68.110.226:50603
addnode=172.105.182.91:50603
addnode=176.223.141.79:50603
addnode=46.28.107.182:50603
addnode=51.15.48.160:50603
addnode=79.130.30.79:50603
addnode=91.206.16.214:50603
addnode=[2001:1c06:1e05:bf00:78e2:cdff:fe33:bfcc]:52786
addnode=[2001:41d0:303:6435::1]:54456
addnode=[2001:41d0:303:79a::]:43820
addnode=[2001:41d0:303:79a::]:43940
addnode=[2001:41d0:303:79a::]:50168
addnode=[2001:470:4189:3:216:3eff:fef8:d471]:55604
addnode=[2001:bc8:182c:2020::1]:55128
addnode=[2400:8907::f03c:92ff:fe41:2b19]:36426
addnode=[2a01:4f8:10b:209d::2]:53475
addnode=[2a02:7b40:b0df:8d4f::1]:50603

There are 9 peers online. With an i9-9940X you'll be able to find a block in... 15 minutes?

A better question is how to debug Windows ship properly, rather than making a build with "applog" on every line...

JayDDee commented 1 year ago

Solo mode is still crashing :( I do not even see the misaligned log.

I didn't think the issue was misalignment, otherwise it would likely log/crash on Linux too. It must be something to do with changing calloc to _mm_malloc. I assume v3.21.2 works with calloc, if it doesn't crash on misalignment? If you use the AVX build, alignment won't matter.

I can't find any significant difference between malloc & calloc other than zeroing the block and the implication that the block is an array. I can use another method to align the block: allocate a bigger block and shift the work pointer as required, but if calloc has hidden secrets in Windows that may not work either.

A better question is how to debug Windows ship properly, rather than making a build with "applog" on every line...

It still bugs me that the MSys2 and WSL builds both work. There must be some useful Windows tools to debug crash dumps; they shouldn't all be dependent on MSVC.

How big is the blockchain for unitus? Maybe I can DL a copy from somewhere?

Edit: I found this, looks interesting https://www.windowscentral.com/how-open-and-analyze-dump-error-files-windows-10

YetAnotherRussian commented 1 year ago

If you have Visual Studio

I don't; that's why the binaries are built the way they are. Without symbols it's challenging to identify the code location, but it should still be possible to get some useful info even from a raw dump: ASCII strings on the stack give a clue, and registers will hold recently used pointers. I wonder if the mingw symbol table would help; I could build a debug load with it.

JayDDee commented 1 year ago

I messed up your post: I edited it instead of quoting it and lost the link.

YetAnotherRussian commented 1 year ago

I wonder if the mingw symbol table would help,

Possibly. I don't know.

Link: https://mega.nz/file/iuQQBLIJ#GnJtxJeaq6HmZZ0ssvLwyTIGwDWRi_fwpcxNgv8cqRY Pass: cpuminer

I got an idea: stop the wallet, launch cpuminer-opt from the debugger, then start the wallet => the "delayed" crash on getting work should happen.

UPD.: Not so useful

image

image

JayDDee commented 1 year ago

UPD.: Not so useful

TLDR: last minute change jump to bottom.

I don't know what code this is, but it looks string-related. RtlReportCriticalFailure is heap corruption. rtlIsAnyDebuggerPresent is the kernel debugger, and from its address the function isn't very far from this code. It looks like kernel code checking string integrity.

The only string-related changes were to a couple of logs, but struct work has strings as well as arrays. I assume that arrow is where it crashed. It's a software interrupt; int 3 is a breakpoint. That means the previous "test esi,esi" failed (I assume non-zero) and fell through to the interrupt.

rtlIsZeroMemory, in the previous function is interesting. This popped up from a search...

https://stackoverflow.com/questions/24183982/error-from-ntdll-dll-because-of-a-malloc-c

Reading through it, I wonder if the heap manager was confused about where the block actually started due to the way mm_malloc works. If the heap manager assumed standard alignment when the block was actually aligned differently, it would definitely cause a crash.

Maybe useful after all.

Everything seems to be pointing to mm_malloc now.

Using 2 pointers, block pointer is returned by malloc, struct pointer is block pointer adjusted for alignment. Block pointer is used by free, struct pointer is used by the application. The heap manager should only see the block pointer, only the local function will be aware of the struct pointer.

Edit: The point of deviation seems to be which Windows kernel is being used. The binaries package will obviously use Windows' own proprietary kernel, but emulators like Cygwin usually have their own version of a Windows kernel. WSL I don't know.

Edit: I have a slightly new theory. ret_work (a pointer) gets passed around using a list. It gets picked up by get_work as work_heap and copied to g_work. work_heap is the block pointer, not the adjusted struct pointer. If they are not equal, the data will be corrupted and a crash is imminent. The work_heap pointer could be similarly adjusted to calculate the struct pointer of a misaligned block.

I wouldn't do this with mm_malloc, I'd go back to the original calloc and write a common aligner utility, as previously described, so they'd be in sync. Only mining code would know about the struct pointer while tq, heap etc would be blissfully ignorant.

I assume all the list software is type agnostic only having a pointer and size.

Edit: I'm feeling pretty good about this. The only concern is that it's based on the assumption there is an issue between mm_malloc and the Windows kernel. By managing the pointer myself I hope to work around any such issue. The kernel will never see the aligned pointer, and miner code will always use the aligned pointer.

Edit: can you look this over? I want to make sure I don't make a silly logic error with align_ptr. struct work type definition will also use WORK_ALIGNMENT instead of the current literal 64. I think I'll also make the 2 aligned pointers named the same for clarity.

miner.h:

```c
#define WORK_ALIGNMENT 64

// block size must be padded by alignment number of bytes, don't use returned pointer to free.
static inline void *align_ptr( void *ptr, uint64_t alignment )
{
   const uint64_t mask = alignment - 1;
   return (void*)( ( ((uint64_t)ptr) + mask ) & (~mask) );
}
```

workio_get_work pushes heap_ptr:

```c
heap_ptr = calloc( 1, sizeof(struct work) + WORK_ALIGNMENT );
if ( !heap_ptr ) return false;
ret_work = (struct work*) align_ptr( heap_ptr, WORK_ALIGNMENT );
```

get_work pops heap_ptr:

```c
heap_ptr = (struct work*) tq_pop(thr->q, NULL);
if ( !heap_ptr ) return false;
// Align heap pointer
work_heap = (struct work*) align_ptr( heap_ptr, WORK_ALIGNMENT );
/* copy returned work into storage provided by caller */
memcpy( work, work_heap, sizeof(*work) );
free( heap_ptr );
```
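The two-pointer scheme can also be written as a self-contained C sketch for checking the logic. align_up and aligned_block_alloc are hypothetical names, not cpuminer-opt functions; uintptr_t is used because it is the integer type intended for this kind of pointer arithmetic. Padding by alignment - 1 bytes is the minimum needed; padding by the full alignment, as in the proposal, also works.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Round ptr up to the next multiple of alignment (must be a power of two). */
static inline void *align_up(void *ptr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const uintptr_t mask = alignment - 1;
    return (void *)(((uintptr_t)ptr + mask) & ~mask);
}

/* Allocate 'size' usable bytes plus slack for alignment. The returned raw
 * block pointer is the one that goes through the queue and to free(); only
 * miner code ever sees align_up(block, alignment). */
static void *aligned_block_alloc(size_t size, uintptr_t alignment)
{
    return calloc(1, size + alignment - 1);
}
```

Because align_up() is deterministic, the consumer can recompute the aligned pointer from the queued block pointer and get the same address the producer used, while the heap manager only ever sees the pointer calloc returned.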

Start here:

Edit: If this doesn't work my only option is to disable loop auto vectorization in gbt_work_decode and revert to the original code.

Edit: Actually the last option might be the best option. I can hand code the loop with the same number of SSSE3 instructions as the compiler does with AVX2. SSE2 takes a couple more. It's also functionally identical to memrev function that was imported for segwit. Hand coding should override auto-vectorization eliminating the need for 32 byte alignment. It looks like it might be a win-win. I need to sleep on it, don't want to make a decision while tired.

YetAnotherRussian commented 1 year ago

I'm not sure, but it may be a good idea to make a copy of the cross-build script with the "-ggdb" option, so that each time you publish a .zip archive there'll be a copy of it with debug builds. Any future release could then be debugged more easily and quickly: a new issue gets detailed info from the beginning. The good thing is that it would be compiled with the same toolchain versions you use, not some random ones like GCC 12. This is also practiced by some coin devs with their wallets (and by devs I mean real devs, not shipcoin copycats). As you use -j 8 now, compiling a second pack should be very fast.

I could then write a small doc on installing and using gdb on Windows (through a PR here). Installing VS to do such things is a real waste.

JayDDee commented 1 year ago

The puzzle is coming together. Stratum does not crash because there is no similar loop in stratum. First, stratum calculates the target from nbits, while gbt provides the target explicitly. Second, stratum copies data directly to g_work without the intermediate dynamically allocated, misaligned, temp work struct.
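For reference, decoding the compact nbits field into a full 256-bit target follows the standard Bitcoin rule: the top byte is a base-256 exponent (the size of the number in bytes) and the low 23 bits are the mantissa. This minimal sketch returns the target as 32 big-endian bytes; it is not cpuminer-opt's code, cpuminer-opt's own target layout may differ, and the sketch simply rejects exponents above 32 rather than handling every edge case of the reference implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode a Bitcoin-style compact "nbits" value into a 256-bit target,
 * written as 32 big-endian bytes (target[0] is the most significant).
 * Returns 0 on success, -1 for a negative or out-of-range encoding. */
int nbits_to_target_be(uint32_t nbits, uint8_t target[32])
{
    uint32_t exponent = nbits >> 24;          /* size of the number in bytes */
    uint32_t mantissa = nbits & 0x007fffffu;  /* 23-bit mantissa */

    memset(target, 0, 32);
    if ((nbits & 0x00800000u) && mantissa != 0)
        return -1;                            /* sign bit set: negative target */
    if (exponent <= 3) {
        /* Small exponent: the mantissa is shifted right instead. */
        mantissa >>= 8 * (3 - exponent);
        target[29] = (uint8_t)(mantissa >> 16);
        target[30] = (uint8_t)(mantissa >> 8);
        target[31] = (uint8_t)mantissa;
        return 0;
    }
    if (exponent > 32)
        return -1;                            /* simplification: treat as overflow */

    /* value = mantissa * 256^(exponent - 3); its top byte is at 32 - exponent */
    size_t pos = 32 - exponent;
    target[pos]     = (uint8_t)(mantissa >> 16);
    target[pos + 1] = (uint8_t)(mantissa >> 8);
    target[pos + 2] = (uint8_t)mantissa;
    return 0;
}
```

For example, 0x1d00ffff decodes to 00000000ffff0000...0000, the Bitcoin difficulty-1 target.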

Confidence is rising.

Even though stratum can't test the optimized hardcoded memrev replacement, I can write a little test routine to do a dummy copy with test data and verify correct functionality.

Regarding your ggdb suggestion. It might be a good idea, at least for the next release or until this issue is resolved. For the longer term, real devs might be more inclined to compile themselves rather than use the prebuilt binaries.

I am however, interested in WSL as another way to use cpuminer-opt on Windows. Any special procedures there?

YetAnotherRussian commented 1 year ago

Any special procedures there?

Well, I just use the distributions you can get with "wsl --list --online" from PowerShell. Affinity works OK from there. Another useful thing: when you type "explorer.exe ." from your root there, your Linux filesystem opens in Windows Explorer. So there's no need to mount anything or use network drives or HTTP servers to share files.

JayDDee commented 1 year ago

Test results

Test code (printf deleted for brevity):

```c
applog(LOG_INFO, "End start\n");
static const uint8_t in[32] = { 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16,
                                15, 14, 13, 12, 11, 10,  9,  8,  7,  6,  5,  4,  3,  2,  1,  0 };
static uint8_t out[32];
// printf in
*((__m128i*)out   ) = mm128_bswap_128( *((__m128i*)in+1) );
*((__m128i*)out+1 ) = mm128_bswap_128( *((__m128i*)in  ) );
// printf out1
for ( int i = 0; i < 8; i++ )
   ((uint32_t*)out)[7 - i] = be32dec( (uint32_t*)in + i );
// printf out2
applog(LOG_INFO, "End test\n");
```

Results, tested with AVX2, AVX & SSE2, all good:

```
[2023-03-14 10:47:31] Start test
in  : 1f 1e 1d 1c 1b 1a 19 18 17 16 15 14 13 12 11 10 0f 0e 0d 0c 0b 0a 09 08 07 06 05 04 03 02 01 00
out1: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
out2: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
[2023-03-14 10:47:31] End test
```

mm128_bswap_128 is new, implemented differently for SSE2 and SSSE3.

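```c
#if defined(__SSSE3__)
// one byte shuffle reverses all 16 bytes
#define mm128_bswap_128( v ) \
   _mm_shuffle_epi8( v, m128_const_64( 0x0001020304050607, 0x08090a0b0c0d0e0f ) )
#else
// SSE2: byte-swap each 64-bit lane, then swap the two lanes
// (no trailing semicolon, so the macro can be used inside expressions)
#define mm128_bswap_128( v ) mm128_swap_64( mm128_bswap_64( v ) )
#endif
```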

Edit: I'm going to have to use this for memrev because it is also used with the heap and could be autovectorized.
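As an alternative to SSE intrinsics, an alignment-agnostic 32-byte reversal can also be written in portable C. This is a sketch, not the code that went into cpuminer-opt; memrev32 is a hypothetical name, and __builtin_bswap64 is a GCC/Clang builtin. The function itself makes no alignment assumptions about src or dst, but if the buffers are members of a struct declared with a larger alignment, the compiler may still assume that alignment after inlining, so the struct pointer itself must really be aligned.

```c
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-byte buffer (e.g. a hash or target).
 * All accesses go through memcpy, so src and dst may be misaligned.
 * src is copied out completely before dst is written, so dst == src works. */
void memrev32(uint8_t *dst, const uint8_t *src)
{
    uint64_t w[4];

    memcpy(w, src, sizeof w);
    for (int i = 0; i < 4; i++) {
        /* last 8 source bytes, byte-swapped, become the first 8 output bytes */
        uint64_t v = __builtin_bswap64(w[3 - i]);
        memcpy(dst + 8 * i, &v, sizeof v);
    }
}
```

Given the in[] array from the test above (31 down to 0), memrev32 produces 00 01 ... 1f, matching the out1 and out2 lines.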

JayDDee commented 1 year ago

Do you have a public dropbox at MEGA? I could drop a new [debug] binaries package there instead of publishing a release. I have Google Drive but don't use it; it kept flagging cpuminer as malware. Use email for privacy if desired: jayddee246@gmail.com.

Or maybe I can attach it to an email; it's a little big, though.

YetAnotherRussian commented 1 year ago

Just make a pass-protected zip or 7z archive and upload to https://www.file.io/

Small file, no need for privacy.

JayDDee commented 1 year ago

I'm going to reintegrate the changes from v3.21.3, minus the prehash centralization. It was an experiment, a proof of concept that didn't work out as expected. It was intended to improve the scaling to large numbers of threads but actually had a negative effect in some cases.

I don't think I'll build with -ggdb. It could change the build in a way that invalidates the testing. I'm confident now the problem is solved.

JayDDee commented 1 year ago

https://file.io/3nFv5PwMc9dI, pw = align. It's a duplicate of the file I intend to publish, only password-protected. Currently testing on 2 Linux binaries and 1 Windows binary.

I left the misaligned log available but it will be removed later. Misalignment of the data is normal.

Edit: I'm rebuilding for publishing. I forgot to update the RELEASE_NOTES. I also removed the misalign log. Your testing is not affected.

YetAnotherRussian commented 1 year ago

I see the crash is solved. I just need some time (~8-10 hours) to find blocks on different algos to confirm. This is with the build from that archive. I've set up 2 coins with 4 algos in total, all of them with 2 threads and affinity.

I won't be able to test avx512 builds - no CPU.

UPD: I now see "Net diff 0":

image

It was not in v3.21.3. Miner TTF calculation is now wrong:

image

JayDDee commented 1 year ago

Net_diff 0 was a stupid mistake, deleted too many lines. I only need gbt tested, arch doesn't matter. Everything else is tested. Net TTF needs net_diff, if it worked in v3.21.2 it will work with net_diff fixed.

Edit: Are you using segwit? There are 2 byte reversal loops in segwit code that I also optimized and made alignment agnostic. That should cover everything in GBT that I can't test.

You can retest the sub-issues if you want, like groestl with arch < AVX2, affinity with arch >= AVX2, and conditional mining with any arch. You should notice nothing about CPU groups unless you compile from source with them enabled.

YetAnotherRussian commented 1 year ago

Are you using segwit?

It seems so; it's shown in the log. I'm not sure how it's configured in the wallet or network(s).

JayDDee commented 1 year ago

Any other issues with the logs that I might be able to address now? We already discussed net hashrate with multiple algos: same issue as nethashrate with multiple coins in a pool, but more difficult to hide. My opinion is it's a coin problem if it's incorrect.

Anything else? Anything from last stable release v3.21.2?

Otherwise I'm satisfied and ready to release.

YetAnotherRussian commented 1 year ago

Anything else?

I've set the CLI output as ">> log.txt" so will read tomorrow. It's almost night in my time zone.

JayDDee commented 1 year ago

Have a good night and thank you.

Edit: If segwit is enabled it means the wallet wants it. It also means the test provided the necessary code coverage to pass. With netdiff broken you can't test max-diff.

Edit: did a little reading about WSL and it seems to confirm the NT kernel was at the root of the heap corruption crash. WSL2 (I assume you were using WSL2) has a real Linux kernel. The Windows Binaries package obviously uses the Windows kernel.

This seems like a good choice for the preferred way to run cpuminer-opt on Windows. First, I won't need to build binaries anymore; users can build for their specific architecture. Second, it's Windows software and works mostly from a Windows perspective, so it won't chase away linux-phobes. With native performance I can't think of any negatives.

Edit: I'm mostly interested in the new block log for GBT; I rarely ever see one. Most of the others I can test with stratum. Diff targeting is different for GBT, but it takes a very large sample to test the target threshold, and a very long time to obtain a suitable sample of blocks solo. There aren't any significant changes to any algos except those noted in the release. Luffa had a tweak, but it would only affect the AVX512 X16R family when Luffa is first in the hash order. I have previously tested it.

YetAnotherRussian commented 1 year ago

log.zip

Btw, I see nothing interesting... Please note that the time and time zone are wrong.