JayDDee / cpuminer-opt

Optimized multi algo CPU miner

getwork mining not correct for m7m #38

Closed: quicksandsurf closed this issue 7 years ago

quicksandsurf commented 7 years ago

I know getwork is pretty much unused nowadays, but with XMG it actually still makes sense.

The getwork submit is wrong for magid: it expects uint32_t[32] (128 bytes) hex encoded as big endian, but cpu-miner is providing algo_gate.work_data_size (80 bytes) hex encoded as little endian.

As a debugging effort I changed both issues statically for m7m, and with 128 bytes sent as big endian hex I can successfully submit blocks to magid. A quick comparison of test data plus nonces shows exactly the same results for cpuminer and m-minerd (at very low difficulties), but when mining for real all blocks submitted by cpuminer get rejected. I'm still debugging the exact reason (it sucks to have to wait days for a nonce to hit).
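
For anyone unfamiliar with the distinction, here's a standalone illustration (not project code) of the same 32-bit word hex encoded both ways:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t word = 0x12345678;

        /* little endian hex: least significant byte first -> 78563412 */
        printf( "LE hex: " );
        for ( int i = 0; i < 4; i++ )
            printf( "%02x", (word >> (8 * i)) & 0xff );

        /* big endian hex: most significant byte first -> 12345678 */
        printf( "\nBE hex: " );
        for ( int i = 3; i >= 0; i-- )
            printf( "%02x", (word >> (8 * i)) & 0xff );
        printf( "\n" );
        return 0;
    }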

I'll continue to debug this but just wanted to put it out there in case someone knows what is going on already :)

JayDDee commented 7 years ago

Thanks for testing; getwork support is still a work in progress.

Based on your description I see two issues that can be handled by algo-gate with a few enhancements.

First, define a custom data size for m7m. The same size is used by both stratum and getwork, but the bigger size shouldn't break stratum. The alternative, having stratum and getwork use different sizes for m7m, seems unusual.

The second issue is big endian. Although algo-gate uses BE for stratum, the same mechanism wasn't implemented for getwork. I recommend splitting std_submit_getwork_result into std_le and std_be versions, similar to build_stratum_request. Then I can use it for all coins that require BE, which are probably also broken. If m7m requires further customization it will need a custom implementation of submit_getwork_result.
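
As a rough sketch of what the BE submit variant could look like (names modeled on the discussion above; JSON_BUF_LEN, be32enc and abin2hex are assumed cpuminer-style helpers, and the JSON-RPC plumbing is elided, so treat this as illustrative rather than the actual patch):

    bool std_be_submit_getwork_result( CURL *curl, struct work *work )
    {
        char req[ JSON_BUF_LEN ];
        const int data_size = algo_gate.work_data_size;   /* 128 for m7m */
        (void)curl;   /* used by the elided network code */

        /* hex encode the full data buffer in big endian word order */
        for ( int i = 0; i < data_size / (int)sizeof(uint32_t); i++ )
            be32enc( &work->data[i], work->data[i] );
        char *gw_str = abin2hex( (unsigned char*)work->data, data_size );

        snprintf( req, sizeof(req),
            "{\"method\": \"getwork\", \"params\": [\"%s\"], \"id\":4}\r\n",
            gw_str );
        free( gw_str );

        /* ...POST req upstream and check the JSON result,
           exactly as the LE version does... */
        return true;
    }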

Thanks for your excellent analysis and clear description. It's very rare that I don't have to ask for more info. I hope my suggestions above are equally clear. I'll await your test results and look forward to reviewing your changes.

JayDDee commented 7 years ago

I think I found another issue with endian encoding.

m7m flips the endian of g_work in stratum_gen_work before it gets passed to get_new_work. A similar flip needs to be done in get_work before returning...

    /* copy returned work into storage provided by caller */
    memcpy( work, work_heap, sizeof(*work) );
    free( work_heap );
    algo_gate.set_work_data_endian( work );   /* <-- NEW */
    return true;
    }

quicksandsurf commented 7 years ago

You are most welcome, though I must say I have some vested interest too, so an extra bit of effort is obviously required :)

Anyhow, I've found what was likely the reason for my rejected blocks: I changed the getwork submit step but not the request one, where the work data gets parsed. I've now changed that too for testing (you can see what I'm doing in my fork, on the m7m_getwork branch). This is just a test; I haven't properly split the changes into the correct abstractions yet.

JayDDee commented 7 years ago

If you get it working I can do the final coding if you're not comfortable using algo-gate.

quicksandsurf commented 7 years ago

It is finally working, so with the changes on my branch get_work will succeed on magid.

As for how to integrate with your repo, it's your choice, really. I never touched algo-gate, but then again I had never touched a miner till last week at all.

Easiest for me would be for you to just take my branch and adapt it, but I'll be happy to make the fix generic and prepare a proper pull request; it'll just take a little while.

JayDDee commented 7 years ago

I've already started the final coding. I like your approach of putting the hook in work_decode instead of get_work.

I may have found another minor issue. The work_cmp_size for m7m may also need to be changed, although the default works for stratum and now seems to work for getwork. This value determines how much of the work data needs to be compared to detect new work.
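
Roughly how that size is used (an illustrative sketch, not the exact code; the default of 76 covers the 80-byte header minus the trailing 4-byte nonce, so a thread's own nonce updates don't register as new work):

    /* new work detection, sketched: compare only the first
       work_cmp_size bytes so the trailing nonce is ignored */
    bool have_new_work = memcmp( work->data, g_work->data,
                                 algo_gate.work_cmp_size ) != 0;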

I'll do some more analysis but unless I find something conclusive I won't change it.

quicksandsurf commented 7 years ago

You should be good with work_cmp_size, the default of 76 matches what miners from magi-project do.

JayDDee commented 7 years ago

Thanks, that was exactly what I was going to check.

There are 2 parts to this fix:

  1. Change the m7m data_size to 128, which happens to be the default but which I had overridden for some unknown reason.

  2. Implement separate LE and BE versions of work_decode and submit_getwork_result (similar to build_stratum_request) and use them appropriately. m7m, neoscrypt, zr5, drop, blake2b, and decred will all be changed to use BE (the decode half is sketched below).
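
A hypothetical sketch of the BE decode half, assuming the cpuminer-style helpers jobj_binary, be32dec and le32dec; only the endian handling differs from the LE version, and the target handling shown follows what was eventually settled later in this thread:

    bool std_be_work_decode( const json_t *val, struct work *work )
    {
        const int data_size   = algo_gate.work_data_size;   /* 128 for m7m */
        const int target_size = sizeof( work->target );

        if ( !jobj_binary( val, "data", work->data, data_size ) )
            return false;
        if ( !jobj_binary( val, "target", work->target, target_size ) )
            return false;

        /* data words are big endian... */
        for ( int i = 0; i < data_size / (int)sizeof(uint32_t); i++ )
            work->data[i] = be32dec( work->data + i );

        /* ...but the target stays little endian (see the discussion
           of work->target below) */
        for ( int i = 0; i < target_size / (int)sizeof(uint32_t); i++ )
            work->target[i] = le32dec( work->target + i );

        return true;
    }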

quicksandsurf commented 7 years ago

Sounds like a great plan. Personally, I'd implement the protocol variations as a list of flags rather than callbacks where the functions aren't specific to the algorithm (say, an is_bigendian flag where I added the opt_algo == ALGO_M7M check), but your approach is consistent with the work already done, and honestly it's more a question of preferred style than anything else (function pointers are something I only use when I have no alternative).

JayDDee commented 7 years ago

I've just submitted the fix. Please review and test before I release it.

There are several benefits to using function pointers in this case. It removes all algo-specific code from the base code and buries it in the algo's file where it belongs. This makes it much easier to add new algos, with fewer files changed. Removing all the switch/case statements from the main loop also provided a small performance boost; switch/case doesn't scale well, it just gets uglier and slower with every new algo addition.
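
A condensed sketch of the pattern (not the actual algo-gate header; the fields shown are just the ones discussed in this thread):

    typedef struct
    {
        bool (*work_decode)           ( const json_t*, struct work* );
        bool (*submit_getwork_result) ( CURL*, struct work* );
        void (*set_work_data_endian)  ( struct work* );
        int  work_data_size;
        int  work_cmp_size;
    } algo_gate_t;

    extern algo_gate_t algo_gate;

    /* each algo's file overrides only what it needs, e.g. for m7m */
    bool register_m7m_algo( algo_gate_t* gate )
    {
        gate->work_data_size        = 128;
        gate->work_decode           = std_be_work_decode;
        gate->submit_getwork_result = std_be_submit_getwork_result;
        /* hooks left unset keep their std_ (LE) defaults */
        return true;
    }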

Thanks again for all your help.

quicksandsurf commented 7 years ago

Will review and test promptly, but the latter always depends on me finding blocks :)

And I agree with the reasons for function pointers, but what I meant was that in this particular case, work_decode and submit_work aren't really algo dependent: there are two flavors that depend on some algo flags or variables. That's already the case with, for example, work_cmp_size. The compare function is generic, not algo specific, so you wouldn't in principle implement a work_cmp_m7m that differs from the others only by the size variable. The same could apply to big versus little endian in the work to/from hex handling; it could just as well be a flag instead of separate functions.
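
Concretely, the flag flavor I mean would look something like this (data_is_big_endian is a hypothetical flag, not an existing field):

    /* one generic decode keyed off a per-algo endian flag instead of
       two separate function pointers */
    bool generic_work_decode( const json_t *val, struct work *work )
    {
        const int n = (int)( algo_gate.work_data_size / sizeof(uint32_t) );

        if ( !jobj_binary( val, "data", work->data,
                           algo_gate.work_data_size ) )
            return false;

        for ( int i = 0; i < n; i++ )
            work->data[i] = algo_gate.data_is_big_endian
                          ? be32dec( work->data + i )
                          : le32dec( work->data + i );
        /* target handling unchanged */
        return true;
    }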

But I'm nitpicking and it is certainly not my place to push "how I would do it", please ignore my rants!

quicksandsurf commented 7 years ago

FYI, still debugging, but your latest version resulted in rejected work. That might have been just a coincidence, but I'll double check it now.

quicksandsurf commented 7 years ago

Still testing, but here's what I believe is the issue: work->target is already in the correct format as far as the other miners are concerned, so the be32enc of work->target at https://github.com/JayDDee/cpuminer-opt/blob/master/cpu-miner.c#L327 should be removed from the big endian version, std_be_work_decode.

JayDDee commented 7 years ago

I messed up when I recoded your changes in work_decode. You only BE encoded the data and left the target untouched, which means the target is currently LE encoded. The way these functions work (as I have come to learn) is that they don't care about the current encoding. be32enc will effectively do nothing if the data is already BE.

I suggest you use your own code for now. I want to study a bit more because I also want to rationalize the use of set_work_data_endian, which encodes new stratum work (and also doesn't touch the target). When I first implemented this I was clueless and just copied and pasted whatever each algo previously did. Now it's becoming clear that it's either LE or BE in all cases, so I should clean up the mess.

I will take more time as this is a significant design change that will also affect stratum mining. I hope to reduce the amount of code duplication.

In the meantime you can use your own fork to solo mine magi. Sorry you wasted all that hash due to my error.

quicksandsurf commented 7 years ago

Not a waste, an investment :)

I haven't hit a block since, but the debug information showed the hash was nowhere near the target, so the target encoding on received work was definitely the problem (that, or the comparison of hash and target, but that would be very unlikely).

I'm currently running 2 versions: one with just the work->data be32enc, and one with that plus a work->target le32enc (not be; we've established that one is wrong).

I could of course just review the code properly and create a few test cases, but that takes time I don't readily have so trial and error it is. Take your time to get this right, I'll keep you posted of my findings.

JayDDee commented 7 years ago

I found some problems with my suggestion (since deleted). Stratum uses be32enc while getwork uses be32dec. They seem to be equivalent, but I need to study some more before messing around with them. I'm still confident I can simplify the endian encoding for all algos, but due to the long test cycle I'll be doing more careful review before suggesting changes.

The way I see it so far:

New stratum data is encoded up to the nonce (80 bytes) using be32enc, and only if BE is required. New getwork data is encoded across the full data (128 bytes), using be32dec if BE is required and le32dec otherwise.

be32enc and be32dec seem to do the same thing in different ways. Unifying them requires changing three things: the size, the function used, and the (apparently redundant) encoding for LE. The LE encoding of the target may also turn out to be redundant in all cases.
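
For reference, the two helpers as they're typically defined (the compat versions in the tree may differ cosmetically). Used in place on a little-endian host, be32enc( p, *p ) and *p = be32dec( p ) produce the same byte swap, which is why they appear equivalent:

    #include <stdint.h>

    /* store x at pp in big-endian byte order */
    static inline void be32enc( void *pp, uint32_t x )
    {
        uint8_t *p = (uint8_t *)pp;
        p[0] = ( x >> 24 ) & 0xff;
        p[1] = ( x >> 16 ) & 0xff;
        p[2] = ( x >>  8 ) & 0xff;
        p[3] =   x         & 0xff;
    }

    /* read a big-endian 32-bit value from pp into host order */
    static inline uint32_t be32dec( const void *pp )
    {
        const uint8_t *p = (const uint8_t *)pp;
        return ( (uint32_t)p[3]       ) | ( (uint32_t)p[2] << 8  ) |
               ( (uint32_t)p[1] << 16 ) | ( (uint32_t)p[0] << 24 );
    }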

Submitted stratum data encodes only the nonce and ntime. Submitted getwork data encodes the entire data size.

The stratum approach seems more efficient as it only recodes what it needs to. I'd like to change getwork to use the same approach, but that's pretty speculative, which means trial and error testing with a long test cycle.

I'll avoid acting on short-term thinking. I need to think this through some more and try to de-risk it.

JayDDee commented 7 years ago

I think my plan is too risky.

Getwork should continue to convert all 128 bytes in both directions because all 128 bytes are submitted in the result. Stratum only sends back 3 values so they are the only ones that need to be encoded when submitting results.

I still think target can be safely ignored in all cases because it isn't sent back with the result. This lines up with your version 1 test.

I'll await your test results before proceeding.

JayDDee commented 7 years ago

I have released v3.7.0 with a fix for target encoding when using big endian data plus some other bug fixes. If you find further problems you can reopen this issue or create a new one.