Open PulpCattel opened 1 year ago
Just dropping a note on what I've been seeing for the last few days since #1436 merge: though we back off (with that code ofc, most users are on 0.9.8), we still seem to be basically permanently shut off from two of the three directory-nodes. More work required, even though that patch did have the desired effect, and bots can actually run normally without log spam.
Re: "more work", I think a good place to focus on is how the directory node onion services are configured. Perhaps somewhere buried there's a setting that can allow more connections? Or at least, figure out what the connection limits are (both in timing of requests, and total number, say)?
Just focusing about Tor, this could be a start https://community.torproject.org/onion-services/advanced/dos/
For more crude details we might have to dig Tor source code and spec at some point.
Was reading docs for a related project, and noticed this: https://tunnelsats.github.io/tunnelsats/FAQ.html#tuning-tor ... perhaps worth investigating? I believe someone mentioned LongLivedPorts on this repo a while ago. Edit: yes, see https://github.com/JoinMarket-Org/joinmarket-clientserver/commit/fbcb9fd377f805fdbc59cbad8e1faaa7de9eb495 - hence 5222.
Interesting point from man tor
: ConnLimit
defaults to 1000. Second, I observed earlier today that about 1000 !orderbook
messages were propagated by my dir node. Third, the way we use tor is that a separate instance of the tor process is run for each directory node.
Though we see only perhaps a couple of hundred makers (at best), we could very easily be supporting 1000 takers+makers+obwatchers. I'll look into changing this. (though, i still don't actually understand why it would be 'connect then disconnect' and not just 'refuse', if indeed this is in any way related).
Edit: hmm, on a second read, unfortunately that appears to be a minimum not a maximum. So probably irrelevant but may as well leave the note here.
In case anyone else is investigating this like me, here's a point that might be very important: how to see detailed logs of the tor instance in which your directory node is running:
Note that we spawn a new tor process for it, and that process can be set to log to a particular file by editing path-to-jmcs/jmvenv/lib/python3.xx/site-packages/txtorcon/controller.py
line 314 by extending this list:
config_args = ['-f', '/dev/null/non-existant-on-purpose', '--ignore-missing-torrc']
to something like:
config_args = ['-f', '/dev/null/non-existant-on-purpose', '--ignore-missing-torrc', 'Log', 'info file yourfilename.txt']
(read man tor
, but notice that while on command line you need double quotes around the multi word value for the Log
option, here, you don't need th edouble quotes). Another thing is that the location of the log file created is in /tmp/tortmpxxxxx
where the random xxxxx
can be seen from ps aux | grep tor
or similar.
Log levels here are debug, info, warn, notice or similar as per docs/man page. I think info is already pretty verbose, debug looks overwhelming. I'm going to be tail -f-ing this to help me see what transpires with different setups.
(You can also control the output of journalctl
with options in the .service
file, like StandardOutput:journal+console
but it doesn't help you with the tor log, hence the above).
Please let me know if you have other info about logging tor, here.
An addendum: when tail
-ing that output, it was still overwhelming (which itself may be interesting!) for my mainnet dnode, so you may find notice
more realistic than info
(as, in fact, the man page suggests). Let's see.
Seeing this pretty often still when launching bots. Curious why sometimes they will reconnect for very short periods of time.
2023-03-xxxxxxxxxxxxxxx [WARNING] Failure to send message to directory: g3hv4uynnmynqqq2mchf3fcm3yd46kfzmcdogejuckgwknwyq5ya6iad.onion:5222.
2023-03-xxxxxxxxxxxxxxx [WARNING] Failure to send message to directory: 3kxw6lf5vf6y26emzwgibzhrzhmhqiw6ekrek3nqfjjmhwznb2moonad.onion:5222.
2023-03-xxxxxxxxxxxxxxx [WARNING] Failure to send message to directory: bqlpq6ak24mwvuixixitift4yu42nxchlilrcqwk2ugn45tdclg42qid.onion:5222.
So, I decided to should start my own directory node to dig into why. Tor is quite a bit more sophisticated than I am, so it was a lot of trial and error to even get going. Now that it's working, I guess it's time to try to break it.
If anyone wants to connect, here is the list of directory_nodes I am currently using for reference. This includes the newest from the stopgap list of directory nodes, #1445, to date of this comment, as well as the new node from https://github.com/JoinMarket-Org/joinmarket-clientserver/issues/1456.
directory_nodes = plq5jw5hqke6werrc5duvgwbuczrg4mphlqsqbzmdfwxwkm2ncdzxeqd.onion:5222,g3hv4uynnmynqqq2mchf3fcm3yd46kfzmcdogejuckgwknwyq5ya6iad.onion:5222,3kxw6lf5vf6y26emzwgibzhrzhmhqiw6ekrek3nqfjjmhwznb2moonad.onion:5222,bqlpq6ak24mwvuixixitift4yu42nxchlilrcqwk2ugn45tdclg42qid.onion:5222,ylegp63psfqh3zk2huckf2xth6dxvh2z364ykjfmvsoze6tkfjceq7qd.onion:5222
From what I can see, g3hv4uynnmynqqq2mchf3fcm3yd46kfzmcdogejuckgwknwyq5ya6iad
is "down" too now, same pattern as the other directories. RIP.
It really seems like there's a magic threshold that we reach and things stop working.
2023-03-27 xxxxxxxxxxxx [WARNING] Failure to send message to directory: g3hv4uynnmynqqq2mchf3fcm3yd46kfzmcdogejuckgwknwyq5ya6iad.onion:5222.
2023-03-27 xxxxxxxxxxxx [WARNING] Failure to send message to directory: 3kxw6lf5vf6y26emzwgibzhrzhmhqiw6ekrek3nqfjjmhwznb2moonad.onion:5222.
2023-03-27 xxxxxxxxxxxx [WARNING] Failure to send message to directory: bqlpq6ak24mwvuixixitift4yu42nxchlilrcqwk2ugn45tdclg42qid.onion:5222.
Seeing the same thing here still. I've been monitoring the tor logs for the directory node I setup, but nothing really unusual has presented itself yet, and it still hasn't been hit by the "magic threshold" as far as I can tell. Please report if you have issues with "wkd3kd73ze62sqhlj2ebwe6fxqvshw5sya6nkvrgcweegt7ljhuspaid.onion"* and I can reference back to logs more directly possibly.
edit: The "wkd3" onion referenced above has been replaced by plq5jw5hqke6werrc5duvgwbuczrg4mphlqsqbzmdfwxwkm2ncdzxeqd.onion due to unrelated server downtime.
EDIT: as per https://github.com/JoinMarket-Org/joinmarket-clientserver/pull/1436#issuecomment-1399480855, the spam issue should be mostly solved. Renamed this issue to focus on the "disconnect" part. Hopefully the changes in #1436 will improve this already.
EDIT:
g3hv4uynnmynqqq2mchf3fcm3yd46kfzmcdogejuckgwknwyq5ya6iad
is constantly disconnecting too now.As reported by many different users and contributors, the following two directory nodes are disconnecting in a repetitive/constant pattern:
3kxw6lf5vf6y26emzwgibzhrzhmhqiw6ekrek3nqfjjmhwznb2moonad.onion:5222
bqlpq6ak24mwvuixixitift4yu42nxchlilrcqwk2ugn45tdclg42qid.onion:5222
The log is simply a constant repetition of:
It started last week from what I can gather from users report, but I can't confirm it wasn't happening earlier too. At this point, it seems plausible to me this is a bug in our code logic.
The simplest explanation that comes to my mind is that we are not handling correctly some connectivity issue edge case.
Curiously, the newly added directory node
g3hv4uynnmynqqq2mchf3fcm3yd46kfzmcdogejuckgwknwyq5ya6iad.onion:5222
is unaffected. This node is not yet in a release, so it might simply be because of less network traffic (EDIT: this directory is now experiencing exactly the same problem).FWIW, signet directories also appear to be working.
This issue might resolve itself, but IMHO it's good to look into it to prevent it from happening again.