JoinMarket-Org / joinmarket-clientserver

Bitcoin CoinJoin implementation with incentive structure to convince people to take part
GNU General Public License v3.0
725 stars 178 forks source link

Tumbler fails every time, "Utxos that passed age and size limits, but have been used too many times" #901

Closed MWICA closed 3 years ago

MWICA commented 3 years ago

Using joinmarket on command line. I have conducted a total of 3 tests when trying out joinmarket. All 3 tests have failed with an error "Utxos that passed age and size limits, but have been used too many times" part way through the tumble process. It also states that these UTXOs cannot be used again. On the third test I started out with a new wallet but it still failed. During each test I am depositing some UTXOs into 3-4 addresses in mixdepth 0, then running the tumbler script.

Does anybody else run into these same issues? If not, what am I doing wrong?

Lobbelt commented 3 years ago

Could you provide some background as to why each of the attempts fail? I had the same issue, but it was related to the transaction fee being much lower than expected. (cf. #898)

AdamISZ commented 3 years ago

All 3 tests have failed with an error "Utxos that passed age and size limits, but have been used too many times" part way through the tumble process. It also states that these UTXOs cannot be used again.

I didn't see this problem. But may you get lucky, if trying again for those UTXOs with taker_utxo_retries=4 or something like this To get the stuck UTXOs further, the next depth. and revert setting afterwards.

Please don't suggest this to users. It is very likely to cause them to use up commitments uselessly, and it will keep happening to them if they have a different setting here. Also note that a few months back (I don't remember exactly when) I fixed a bug that was causing makers to not record and transfer commitments as effectively as they should, meaning this is even more likely to cause failure than it used to.

AdamISZ commented 3 years ago

Could you provide some background as to why each of the attempts fail? I had the same issue, but it was related to the transaction fee being much lower than expected. (cf. #898)

Indeed, due to this and the previous report #898 I tried tumbler myself on mainnet, and I even deliberately chose to fund with only 1 utxo, and only saw one transaction out of 9 fail to work first time, but note I used a feerate substantially higher than 3 sat/vb, I used max_sweep_fee_change of 2 iirc, I saw once or twice "utxos too new" but that's normal and it just tries again. I also set maker_timeout_sec to 120 instead of 60 but I think that wasn't needed. The one that failed to work first time didn't manage to find 4 makers giving a correct response out of a required number somewhat higher, but it was OK next time.

Lobbelt commented 3 years ago

Could you provide some background as to why each of the attempts fail? I had the same issue, but it was related to the transaction fee being much lower than expected. (cf. #898)

Indeed, due to this and the previous report #898 I tried tumbler myself on mainnet, and I even deliberately chose to fund with only 1 utxo, and only saw one transaction out of 9 fail to work first time, but note I used a feerate substantially higher than 3 sat/vb, I used max_sweep_fee_change of 2 iirc, I saw once or twice "utxos too new" but that's normal and it just tries again. I also set maker_timeout_sec to 120 instead of 60 but I think that wasn't needed. The one that failed to work first time didn't manage to find 4 makers giving a correct response out of a required number somewhat higher, but it was OK next time.

Confirmed for me too - since changing max_sweep_fee_change to 2, the tumbler runs perfectly (only occasionally aborting due to other 'casual' issues).

AdamISZ commented 3 years ago

Without more details than descriptions like "I tried again and it didn't work", I can't really help. Would need a lot more detail, really, but that can be very difficult while maintaining privacy. So:

All 3 tests have failed with an error "Utxos that passed age and size limits, but have been used too many times" part way through the tumble process. It also states that these UTXOs cannot be used again.

This part is clear enough in itself, it's true that once you have utxos that have already passed that limit, there is no point trying again. You need new utxos, and especially new utxos in the mixdepth (so at least 1 (see my comment above - it only really needs 1) that you're spending from.

On the third test I started out with a new wallet but it still failed.

I can't help with that because no info.

MWICA commented 3 years ago

@AdamISZ @Lobbelt When I say I tried it 3 times, I did not try it 3 times after it failed. I tried it on 3 different tests with new UTXO on 3 different tumbling attempts. This 3rd attempt is on a newly generated JoinMarket wallet. I do not know if it is related to sweep fee in this case, I see nothing in the logs indicating any issues with the sweep fee I'm using. In this case, I do not mind giving up whatever information is needed to diagnose this since this was a wallet created for this test. The logs did not indicate what caused those UTXOs to be used too many times unless I am missing something, I can post log file if that is helpful. Thank you

MWICA commented 3 years ago

@AdamISZ @Lobbelt what information can I provide or look for to help determine the cause of this? Or if you have suggestions on what I should try I'll hear. Thanks

Lobbelt commented 3 years ago

Each of the 3 times should have had 3 attempts at tumbling with that UTXO abort because of some reason. What are those reasons - is one reason recurring a lot?

MWICA commented 3 years ago

@AdamISZ @Lobbelt I attached the full log file if any of you are curious to see it. Filtering for INFO might show more relevant info. Is it possible that makers just stop responding, or timeouts? I do see some things referencing "Stall detected". I also see an error of "Unacceptable feerate for sweep, giving up" but only see that once.

MWICA commented 3 years ago

I did a 4th test, with a completely newly generated wallet and loaded with UTXOs from a fresh source. This too failed with the error message ""Utxos that passed age and size limits, but have been used too many times". For this 4th test I had maker_timeout_sec set to to 120, max_sweep_fee_change set to 2, and tx_fee set to 3 (3 block confirmation target, the fees were not set low).

I am not sure if it is related or not, but in the logs of the new fail, I do see "Taker not continuing after receipt of orderbook" and "Stall detected."

The question is, does anyone actually use or test joinmarket? By that I mean the tumbler script, not the sendpayment script which is not suitable for mixing per the documentation i joinmarket. The joinmarket documentation itself says that the tumbler is what should be used. Does joinmarket work for anyone, if so what are you doing?

AdamISZ commented 3 years ago

The question is, does anyone actually use or test joinmarket? By that I mean the tumbler script, not the sendpayment script which is not suitable for mixing per the documentation i joinmarket.

Did you not see my report in this very thread? https://github.com/JoinMarket-Org/joinmarket-clientserver/issues/901#issuecomment-857483172 . @Lobbelt also reported using it successfully in the next comment.

It is of course not the only time I've used it, although naturally I use it only infrequently.

I also am testing tumbler all the time on regtest, but that's for checking general functionality without it being fully realistic (since many configuration details from mainnet are very hard to replicate on regtest).

AdamISZ commented 3 years ago

For this 4th test I had maker_timeout_sec set to to 120, max_sweep_fee_change set to 2, and tx_fee set to 3 (3 block confirmation target, the fees were not set low).

Yes, clearly you did read it :) But we are still in the dark as to what is causing a specific problem here. Let me look at that log file. Please don't be frustrated that I don't give you real time support, unfortunately few if any people are going to step in and do detailed analysis, and my current life situation involves me travelling around a lot so I can't respond to everything immediately.

AdamISZ commented 3 years ago

@MWICA I now have the log file, two points: (a) delete it from the public but (b) did you edit it in any way for privacy? because if you did I need to know (it's generally not something you want public for obvious reasons).

AdamISZ commented 3 years ago

Additional point: seeing "stall detected" is not really abnormal, and I think it's reasonable to expect it here and there. I think I saw it 2/3 times in my tumbler run from a few days ago. One cause is when more than a few counterparties give you a bad response or no response, and you fall below minimum_makers (4) and so your bot refuses to build the coinjoin. But there are others.

AdamISZ commented 3 years ago

So reading the log, here is an interesting sequence of events for the final stage of the first transaction (nothing private to you here):

2021-06-08 12:32:38,269 [MainThread  ] [DEBUG]  nick = J5_ID1 sent all sigs, removing from nonrespondant list
2021-06-08 12:42:54,213 [MainThread  ] [DEBUG]  Nick: J5_ID2 has left.
2021-06-08 12:49:34,989 [MainThread  ] [INFO ]  STALL MONITOR:
2021-06-08 12:49:34,993 [MainThread  ] [INFO ]  Stall detected. Retrying transaction if possible ...
2021-06-08 12:49:34,996 [MainThread  ] [INFO ]  Schedule entry: [0, 0, N, 'INTERNAL', t1, 16, 0] failed after timeout, trying again
2021-06-08 12:49:34,999 [MainThread  ] [INFO ]  Invalid transaction; possible mempool conflict.

The transaction you built had 8 equal outputs, so 7 not-you counterparties. There were 7 "..has sent all sigs" messages, meaning you got a full set of signatures. But note the times: you have 'sent all sigs' finishing at 12:32 then the stall monitor wakes up at 12:49 (a bit less than 20 minutes, this is expected), and finding no transaction has occurred. What should hvae happened is at least an attempt to broadcast the transaction, shortly after 12:32 (within a few seconds, normally, after your bot signs it). Can you check if you have a random-peer or not-self method set for tx_broadcast? It might explain why no tx broadcast happened.

MWICA commented 3 years ago

Please don't be frustrated that I don't give you real time support, unfortunately few if any people are going to step in and do detailed analysis, and my current life situation involves me travelling around a lot so I can't respond to everything immediately.

I am patient. You dont even owe anyone any support, you are doing everything for free. I understand that this is an open source project and that everyone contributes as they can. I am appreciative that you even took the time to read my posts and reply to me. I am hoping to get this solved, not just for me but also in case if this is an issue that may effect others as well.

MWICA commented 3 years ago

@MWICA I now have the log file, two points: (a) delete it from the public but (b) did you edit it in any way for privacy? because if you did I need to know (it's generally not something you want public for obvious reasons).

a) I can delete it if you want me to, but I dont mind keeping it posted. It was a wallet I created only for testing purposes and have not used for anything else, and I do not even have that wallet anymore. Any UTXO from it I have, I am mixing with other ways in the time being. b) I did not edit it in anyway,

edit: I deleted the log file as requested but I still have a copy if it is needed by someone.

MWICA commented 3 years ago

Can you check if you have a random-peer or not-self method set for tx_broadcast?

In the config file, tx_broadcast is set to the default of tx_broadcast = self

MWICA commented 3 years ago

I am going to search through my bitcoin node debug log to see if I can find anything about it trying to broadcast transactions. @AdamISZ do you think I should try another test with a different setting for tx_broadcast? Thank you

MWICA commented 3 years ago

The new default is random-peer with fallback to self if no broadcast made by other peer.

Interesting. How long has that been the default? It looks like the version I have was cloned from github from around May 20-23 Edit: I see that became default like June 5th I think.

Not sure if any recent updates have changed anything that might resolve the issues I'm having, though I can update later on today so if I test again joinmarket will be up to date with latest

PulpCattel commented 3 years ago

The transaction you built had 8 equal outputs, so 7 not-you counterparties. There were 7 "..has sent all sigs" messages, meaning you got a full set of signatures.

Shouldn't we find all makers have sent their signatures and INFO:Transaction is valid, signing.. after the last sent all sigs? Why aren't there? It seems to me we didn't reach the broadcast phase at all.

The last 3 CoinJoin attempt failed for:

  1. Not enough counterparties, so the makers your bot chose, for various reasons, weren't eligible or they didn't respond.

  2. Transaction fee for sweep: xxx too far from expected, which means you were overpaying, which is understandable since only 4 out of 6 chosen makers responded, and therefore the bot aborted (this would have been solved by setting a very high max_sweep_fee_change).

  3. Out of 5 chosen makers 4 didn't respond.

These were 3 tries, after those your commitment is no longer valid.

AdamISZ commented 3 years ago

Not enough counterparties, so the makers your bot chose, for various reasons, weren't eligible or they didn't respond.

Ah yes, very good catch (and indeed my first thought was clearly a wrong guess anyway, since tx_broadcast was apparently default). So this one is still to be addressed (i.e. what exactly went wrong in the 1st transaction).

I think there's still more to be said here though, sorry have to be AFK.

AdamISZ commented 3 years ago

I continue to find the first transaction curious/intriguing. Here's the data I see:

starts by requesting 9 parties (so if full, would be a 10 party coinjoin). Fee estimate: 31 inputs and 19 outputs, using block target 3 (tx_fees = 3). 31 is 4 taker utxos plus 3x9 estimate from makers; similarly for outputs. Nothing even slightly strange so far. Requests offers, chooses 9 as intended. The fee rates offered by these parties chosen suggests nothing unusual to me; maybe you have stuck with the default (randomized) allowed fee range (see max_cj_fee_abs/rel). A PoDLE is chosen normally and correctly and the 9 makers respond with their proposed input sets. 2 of those 9 have sent you inputs totalling significantly less than the coinjoin amount, resulting in negative change of -6955694 and -1054912. The code obviously rejects this. I want to check if there is some scenario that causes maker bots to do this unintentionally, but, that's a side story. For now we just see that those 2 are rejected and we proceed with the remaining 7.

So a transaction is built for those 7 (8 equal outs incl. taker), and sent to them. They return sigs. Notice "sent all sigs" is seen 7 times, which means we should at that point drop into this code:

https://github.com/JoinMarket-Org/joinmarket-clientserver/blob/29fe25282c47f5f15ded6145af4c013437f9d342/jmclient/jmclient/taker.py#L689-L701

Since we specifically don't see the log INFO message on line 698, as pointed out by @PulpCattel , we have to conclude that tx_signed evaluated to False (the other possibility is an assertion fail, which would crash, but that is not consistent with the fact that the tumbler kept running). That seems to suggest that we accepted an input as signed, but it was not signed, or signed differently somehow. I think this needs more investigation.

undeath commented 3 years ago

As @AdamISZ pointed out the first transaction is weird because of the negative change outputs. This does indeed cause problems.

After receiving and verifying ioauth responses, inputs are added early to self.utxos in a very huge loop with multiple continue aborts but only in the first error case inputs that failed further verification are actually removed again from self.utxos.

see here: https://github.com/JoinMarket-Org/joinmarket-clientserver/blob/29fe25282c47f5f15ded6145af4c013437f9d342/jmclient/jmclient/taker.py#L395-L398

self.utxos is later used to construct self.latest_tx, still including the inputs that failed verification: https://github.com/JoinMarket-Org/joinmarket-clientserver/blob/29fe25282c47f5f15ded6145af4c013437f9d342/jmclient/jmclient/taker.py#L537 https://github.com/JoinMarket-Org/joinmarket-clientserver/blob/29fe25282c47f5f15ded6145af4c013437f9d342/jmclient/jmclient/taker.py#L551-L552

Later, self.latest_tx is used to check if all required signatures have been received, which of course fails and eventually times out. https://github.com/JoinMarket-Org/joinmarket-clientserver/blob/29fe25282c47f5f15ded6145af4c013437f9d342/jmclient/jmclient/taker.py#L690-L694

I'll try to clean this up a bit.

AdamISZ commented 3 years ago

@undeath thanks. I was going to spend a bit of time reading code today, but I think you got to the point way quicker than I would have!

It's pretty surprising, but I think this issue has existed even since the earlier version of the project, e.g. see this late-2016 version of original Joinmarket codebase:

https://github.com/JoinMarket-Org/joinmarket/blob/72537e24464e76aac55f3d690a49797139c49d52/joinmarket/taker.py#L149

Note that in that version, recv_txio was called directly every time an ioauth was received from a counterparty (in contrast with current codebase where jmdaemon waits to send the data from all makers at the end; I think it's not really functionally different but obviously that context is needed for analysis). Just as in the current codebase, self.utxos[nick] is updated at the start (see line 155 in the above), and there are several places where return occurs after the update, without deleting that cp's utxos. I could be wrong but it certainly seems that way.

Fwiw, while there is more than one way to address this, it might make sense to add the utxos on a per-nick basis, given the list of nicks that passed all the verification, rather than a "remove when bad" paradigm. But either way, as you see is better.

I still need to investigate (a) if there is an obvious snafu causing negative change, because while it could be a malicious actor, I'd count that far less likely (especially considering it completely fails), than a bug or perhaps some configuration edge case. Also the non-zero change of the sweep, which may be connected.

PulpCattel commented 3 years ago

Do yourself a favor and save you the time, my observations show are not malicious but incompetent maker bots. Which advertised a higher MaxSize, of them larger in total of their UTXOS could even be available at the time. Therefore, the totalinput was too low, lower than cjamount and the calculation gave negative values. Ignore it.

Where are the observations, or even better proofs? There could be various other reasons, I don't understand why we shouldn't look deeper, it's a common enough thing that surely deserve some investigation.

That being said, it could (if there is a bug) be naive makers that spend from the wallet while YG is running, leaving the wallet with less than advertised max size. Or maybe multiple bots running with the same wallet, so when one does a CoinJoin, the other bots max size may become suddenly wrong, etc...

That's it. The reason for try#1 failed.

Just to be clear, the first try that leads to the invalid commitment does not fail for that, the first transaction in the log fails for that, but that transaction does not matter as it is followed by a successful one.

PulpCattel commented 3 years ago

The log showed, the both makers responded with an utxo/utxos which input amount that was <= cjamount. just result negative change. It's fault of the maker runnning the 2 bots.

This should be prevented by the maker bot, it filters mixdepths with balance bigger or equal than the required amount. It should not fill an order if it cannot find valid UTXOs.

AdamISZ commented 3 years ago

@bisqubutor as i understood, @undeath is going to look at PR-ing a fix bearing in mind there are a few places where this error crops up in that for loop (or, well, certainly no harm in you patching as you wish, you certainly have the right basic idea I think).

AdamISZ commented 3 years ago

(Edit: oh, I see, @PulpCattel already answered this point. Sorry I got confused about who was who in the conversation. Anyway, I will leave this comment in case it helps someone.)

Do yourself a favor and save you the time, my observations show, are not malicious but incompetent maker bots. Which advertised a higher MaxSize, of them larger in total of their UTXOS could even be available at the time. Therefore, the totalinput was too low, lower than cjamount and the calculation gave negative values. Ignore it.

But wait, I don't understand the hypothesis here. Unless code is edited, a maker will not fill an offer (order) if it doesn't have enough in any mixdepth:

https://github.com/JoinMarket-Org/joinmarket-clientserver/blob/027e46852dc1b9b3ee83d4d2c703600d10119a4a/jmclient/jmclient/yieldgenerator.py#L115-L143

(I mean clearly it wasn't intentional that makers would choose an arbitrary maxsize and then advertise and attempt to fill based on that, it's supposed to be calculated from mixdepth balances).

AdamISZ commented 3 years ago

That being said, it could (if there is a bug) be naive makers that spend from the wallet while YG is running, leaving the wallet with less than advertised max size.

That's an interesting thought but I think cannot quite be the reason here. (1) is trivial - there is a wallet lock for spending, but a person could get round that by just deleting the lock file, and I could easily imagine someone choosing to do that (because they don't want to restart the bot, say). But (2) - while, when a new spend-out transaction occurs, the yieldgen bot does not automatically update its offer, nevertheless when it receives a !fill from a taker, the above code executes (oid_to_order()), and that queries self.get_available_mixdepths() which calls get_balance_by_mixdepth() at that time, which is a real time query of the wallet; then the mixdepths without enough are filtered out. So at the point it tries to fill the offer, it should fail if the balances are all too small. In this case it just wouldn't send out !ioauth and the taker would continue without it.

(Edit: adding this as a parenthetical. After #359 the wallet is "reactive" and updates itself whenever there's a spend in or out, in real time. The thing this doesn't yet do is update the maker bot's offers at the time that spend in/out happens. It only updates the public offer after a transaction when the offer is recalculated (much like the above). Somewhat overdue PR: the yieldgen has a callback hook that updates its offer at the time of a transaction (should be easy with WalletService.register_callbacks, at least fairly easy). But it's more just a minor inefficiency really. The bot can be restarted.)

PulpCattel commented 3 years ago

Yes exactly, that's why I added (if there is a bug), because the maker bot should prevent that but maybe isn't? That part of the code (i.e. oid_to_order() and filtered_mix_balance=) seems pretty straightforward though, IDK.

It seems to me the negative change case and the already spent or unconfirmed UTXO case are similar in this (i.e., they are either a bug somewhere or a tweaked bot), right? A maker filters for at least 1 confirmation, so it should never fill an offer with an unconfirmed UTXO (yet alone already spent, which seems even more weird).

AdamISZ commented 3 years ago

Yes, I think we're on the same page there. Sorry for ignoring your parenthetical, clearly that adds up.

so it should never fill an offer with an unconfirmed UTXO

Yes the maker only chooses minconfs=1, but : the unconfirmed status of a utxo is, after all, not a state in perfect global synchrony so it's possible the taker sees it as unconf when the maker sees it as 1 conf. But not sure if that's relevant to this case.

PulpCattel commented 3 years ago

Yes the maker only chooses minconfs=1, but : the unconfirmed status of a utxo is, after all, not a state in perfect global synchrony so it's possible the taker sees it as unconf when the maker sees it as 1 conf.

Oh right, I didn't think about that, but wait doesn't that imply the taker (and specifically the taker) has an out of sync BItcoin Core? The Bitcoin network and the UTXO set should be quite nicely in sync otherwise. The case where, during a CoinJoin negotiation, a maker receives a block faster than the taker seems incredibly rare.

But not sure if that's relevant to this case.

Well, this appears multiple times in OP's log and it's one of the reason his tumbler got stuck.

AdamISZ commented 3 years ago

I have seen 20-30 sec delay between a block arriving at one node and at another (in the past. I don't vouch for this happening frequently at all. I remember someone pointing out to me that this happens, and being surprised .. maybe it was Luke-Jr. Then i saw it myself.). Certainly not claiming it's a likely cause ... the timing would have to be unlucky ... but it just struck me as a possibility.

PulpCattel commented 3 years ago

But not only it has to happen, it has to be the exact block that confirms the maker UTXO.

AdamISZ commented 3 years ago

Yep; "the timing would have to be unlucky".

PulpCattel commented 3 years ago

Unlucky kinda like being struck by lightning :D

But yeah, I'm stressing this a little bit because I'm really curious myself.

AdamISZ commented 3 years ago

Here is my summary of what I read in the log. First, a key:

  1. (F) (C U) tx had utxos not from valid makers, this is the bug discussed above and a fix is in #909
  2. (S) (C U)
  3. (F) (C N) - waiting for 5 confirmations (this is normal)
  4. As 3.
  5. (F) (C U) - exactly the same scenario as 1.
  6. (F) (C U) IRC connection was lost at the end of the negotiation while gathering signatures. We then retry with the same 5 makers as we got sigs from already, in 7:
  7. (S) (C U)
  8. As 3.
  9. (F) (C U) Not enough counterparties responded; 4 of 7 sent back ioauths in time. It looks like 2/4 had invalid utxos.
  10. (F) (C U) sweep fee too far from expected. See #908 for a partial fix here.
  11. (F) (C U) A bit like 9 but more extreme: of 5 counterparties requested, only ~4~ 1 responded.
  12. (F) (C C) commitments can no longer be sourced, shut down.

I think the following is important: at 12. , the wallet had 2 utxos, but it gave up because only 1 of them was in the same mixdepth. At least, this is my understanding, and it means the code running was post- the merge of #840 if I got that right.

PulpCattel commented 3 years ago
  1. (F) (C U) A bit like 9 but more extreme: of 5 counterparties requested, only 4 responded.

Do you mean only 1 responded? Or I completely misread the log, I don't have the log anymore to check but could definitely be, in case I'll edit my previous answer.

At least, this is my understanding, and it means the code running was post- the merge of #840 if I got that right.

OP said in this comment he cloned from GitHub around end of May, which is indeed after the merge of #840.

AdamISZ commented 3 years ago

Yes, only 1, edited.

MWICA commented 3 years ago

For the 3 reasons mentioned by PulpCattel, reason number 2 "Transaction fee for sweep: xxx too far from expected," I fixed this by setting as max_fee_sweep to 2 for my 4th test, when estimated fees are low the max_fee_sweep should be higher than the default 80% 0.8

For the other reasons you all think that the tumble failed, I still have the logs from my 4th test. What can I look for in those logs to see if this same thing occurred on test 4?

MWICA commented 3 years ago

With version 0.8.3 I am able to use joinmarket without issues.