Closed hosiawak closed 3 years ago
With a proxy, a good amount of code in the clboss gets involved (pulls in code in Net/
directory), so a small snippet of code would not be so small. Maybe I can add a dev binary that performs the connection check directly though.
I commented out the tor proxy line and the tor announce addr and the status is now reporting online.
I also noticed that when clboss was running on tor proxy (using the config above) it didn't open or rebalance any channels. It went through of ChannelFinderByPopularity/ChannelFinderByDistance: Dijkstra progress and XXX-channeld-chan#46: Peer connection lost/Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died
over and over but didn't manage to do any actual work.
It looks like a possible bug with the tor proxy. I still have the logs if you need them.
Running for a few hours with the proxy turned off and low fees. This thing is incredible to say the least. Opened ~ 20 channels, most of them are already relatively balanced, rebalancing doesn't seem to cost an arm and a leg either. Maybe I wasn't using the right tools on LND but before I never managed to maintain a node with the same ease, speed and low cost (like a boss ) :) Thank you 🥇
Glad you liked it. FWIW the node somebody lent me, that also has a Tor proxy, is also getting "not online" a good amount of the time, maybe I have a bug somewhere, hmm.
I just pushed a new dev tool, dev-proxy-connect
, on master. On my system, with a Tor proxy:
zmnscpxj@developer:~/dvpt/clboss$ time ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
Not connected: Network is down (100)
real 0m0.006s
user 0m0.000s
sys 0m0.005s
zmnscpxj@developer:~/dvpt/clboss$ time ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
Connected.
real 0m0.495s
user 0m0.006s
sys 0m0.000s
zmnscpxj@developer:~/dvpt/clboss$ time ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
Connected.
real 0m0.372s
user 0m0.006s
sys 0m0.000s
zmnscpxj@developer:~/dvpt/clboss$ time ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
Not connected: Network is down (100)
real 0m0.007s
user 0m0.007s
sys 0m0.000s
zmnscpxj@developer:~/dvpt/clboss$ time ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
Connected.
real 0m0.328s
user 0m0.001s
sys 0m0.006s
zmnscpxj@developer:~/dvpt/clboss$ time ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
Connected.
real 0m1.483s
user 0m0.003s
sys 0m0.003s
Notice how the "not connected" is very fast, in 0.006s, whereas "connected" judgment is a few hundred milliseconds.
The 100
is the errno
, and looking it up in the Linux errno
file shows it is ENETDOWN
.
I then traced the command over and over until I could get something strange, and came upon this part of my code: https://github.com/ZmnSCPxj/clboss/blob/971beadaa1ff2248c33e589e567fa3259123fbd8/Net/ProxyConnector.cpp#L85-L97
See the part where it transforms the SOCKS5 0x01
error to ENETDOWN
? I added a printf
there, which showed that the "Not connected: Networks is down (100)" passed through that branch. This to_error
function translates the SOCKS5 error to the nearest equivalent errno
.
The SOCKS5 protocol shows this: https://tools.ietf.org/html/rfc1928
o REP Reply field:
o X'00' succeeded
o X'01' general SOCKS server failure
o X'02' connection not allowed by ruleset
o X'03' Network unreachable
o X'04' Host unreachable
o X'05' Connection refused
o X'06' TTL expired
o X'07' Command not supported
o X'08' Address type not supported
o X'09' to X'FF' unassigned
So it looks to me that for no good reason, Tor returns 0x01 at random times, leading us to conclude that we are not connected to the Internet.
Duckduckgo on "tor general socks server failure" leads to this: https://tor.stackexchange.com/questions/8668/proxy-client-unable-to-connect-general-socks-server-failure which seems to suggest that there is some failure in the ISP level.
In any case, I tried again until I could get a "not connected", then checked my syslogs:
zmnscpxj@developer:~/dvpt/clboss$ date; ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
Fri 13 Nov 2020 11:42:34 AM PST
Not connected: Network is down (100)
And syslog:
Nov 13 11:42:35 pop-os Tor[1043]: Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up.
Unfortunately I do not have detailed Tor logs, so I have to go enable them and restart my Tor service.
Enabling Tor logs does not show me anything more detailed than "Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up." anyway. Hmm.
Looks like my workarounds here are:
always-use-proxy
, do not use the proxy in the Internet connection monitoring.In the proxy connector, if we get a SOCKS5 0x01 error, try try again.
Turns out not to help much (>_<#)
I tried modifying it so that it makes up to 3 attempts. In the below tries
is the number of failures before we returned:
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 0
Connected.
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 0
Connected.
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 1
Connected.
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 3
Not connected: Network is down (100)
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 0
Connected.
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 0
Connected.
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 0
Connected.
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 0
Connected.
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 3
Not connected: Network is down (100)
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 0
Connected.
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 0
Connected.
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 0
Connected.
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 0
Connected.
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 3
Not connected: Network is down (100)
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 0
Connected.
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 0
Connected.
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 0
Connected.
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 3
Not connected: Network is down (100)
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 0
Connected.
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 0
Connected.
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 0
Connected.
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 3
Not connected: Network is down (100)
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 0
Connected.
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 0
Connected.
zmnscpxj@developer:~/dvpt/clboss$ ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 0
Connected.
Out of several dozen tries, many of the failures ended up repeating pointlessly anyway. Most of the above "Connected" succeeded on the first try, and there still exist "Not connected" that persisted even after 3 tries in quick succession. Hmmm.
I tried adding a 100-millisecond delay between each try, but it looks like Tor is bursty when it fails.... sigh.
zmnscpxj@developer:~/dvpt/clboss$ time ./dev-proxy-connect 127.0.0.1 9050 www.google.com 443
tries: 3
Not connected: Network is down (100)
real 0m0.312s
user 0m0.000s
sys 0m0.009s
Hi @hosiawak, I did the workaround where, if you set proxy
but not always-use-proxy
, CLBOSS will not use the proxy and instead connect directly. Can you please try checking this if it works for you with a proxy
setting? clboss-ca27f303cfa9fc18e51d72d6164d2942da9a5760.tar.gz
My own node with both a local Tor proxy and always-use-proxy
seems to have no problem connecting to the servers via the proxy. The other node is also now okay with its InternetConnectionMonitor
, mostly. Maybe just a recent attack on Tor?
5 minutes after restarting c-lightningd:
"internet": {
"connection": "offline",
"checking_connectivity": true
},
wut.
Do you have debug-level logs log-level=debug:clboss
?
No, just realized my code was wrong and it was considering "always-use-proxy" if the key existed in the listconfigs
instead of checking its true
-ness. The key always exists in the listconfigs
, LOL.
@hosiawak sorry for the trouble, here is the updated tarball: clboss-91bbbf93720784fd56c1633fa401e2b497fd83be.tar.gz, could you please check?
@ZmnSCPxj Now, with "proxy" set but "always-use-proxy" commented out it's online. With both set it's offline.
Is the connection status used internally by clboss (to initiate channel openings for example) or just for clboss-status
output ?
It is used in particular to prevent reducing the connect_rate
metric when we are offline --- if we believe we are offline, we do not record the connect_rate
for any peers. We do not attempt to probe channel candidates if they are online, since we think we are offline and cannot probe anything. There are probably other processes I should suppress if we are offline as well, such as active probing.
In any case, on the system where you have issues, can you check how often this succeeds?
torify wget https://www.google.com/index.html
@ZmnSCPxj Ran it 10x, it succeeded 10x.
# time torify wget https://www.google.com/index.html
--2020-11-14 02:11:38-- https://www.google.com/index.html
Resolving www.google.com (www.google.com)... 142.250.74.228
Connecting to www.google.com (www.google.com)|142.250.74.228|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html.10’
index.html.10 [ <=> ] 12.57K 14.9KB/s in 0.8s
2020-11-14 02:11:41 (14.9 KB/s) - ‘index.html.10’ saved [12870]
0.027u 0.006s 0:02.55 0.7% 1124+406k 17+10io 0pf+0w
It takes between 3 and 10 seconds. Sometimes Resolving
takes a few seconds. Maybe there's a low timeout somewhere ?
Okay, so I decided to strace
the torify wget https://www.google.com/index.html
command, and here is the bit during the resolving:
write(2, "Resolving www.google.com (www.go"..., 45Resolving www.google.com (www.google.com)... ) = 45
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(9050), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
sendto(3, "\5\1\0", 3, 0, NULL, 0) = 3
recvfrom(3, "\5\0", 2, 0, NULL, NULL) = 2
sendto(3, "\5\360\0\3\16www.google.com\0*", 21, 0, NULL, 0) = 21
recvfrom(3, "\5\0\0\1", 4, 0, NULL, NULL) = 4
recvfrom(3, "\254\331\243\344", 4, 0, NULL, NULL) = 4
close(3) = 0
ioctl(0, TIOCGPGRP, [330878]) = 0
getpgrp() = 330878
write(2, "172.217.163.228", 15172.217.163.228) = 15
In particular, it seems to be a separate SOCKS5 proxy request to resolve the www.google.com
hostname, which it then connects to later here:
write(2, "Connecting to www.google.com (ww"..., 70Connecting to www.google.com (www.google.com)|172.217.163.228|:443... ) = 70
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
getsockopt(3, SOL_SOCKET, SO_TYPE, [1], [4]) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(9050), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
sendto(3, "\5\1\0", 3, 0, NULL, 0) = 3
recvfrom(3, "\5\0", 2, 0, NULL, NULL) = 2
sendto(3, "\5\1\0\3\017172.217.163.228\1\273", 22, 0, NULL, 0) = 22
recvfrom(3, "\5\0\0\1\0\0\0\0\0\0", 10, 0, NULL, NULL) = 10
So let us go through the first ptrace
fragment way above. This sets up the socket to the proxy:
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(9050), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
The torify wget
sends it the standard SOCKS5 opening with the "unauthenticated" authentication method:
sendto(3, "\5\1\0", 3, 0, NULL, 0) = 3
This is the same that is loaded and written by CLBOSS code:
In our CLBOSS code, we write the three bytes 5, 1, and 0, aka "\5\1\0"
in the torify wget
run.
The proxy responds:
recvfrom(3, "\5\0", 2, 0, NULL, NULL) = 2
In the CLBOSS code, that is also what we expect, a 0x05 and a 0x00, aka "\5\0"
in the torify wget
trace:
Then things get interesting here:
sendto(3, "\5\360\0\3\16www.google.com\0*", 21, 0, NULL, 0) = 21
In CLBOSS we use this code:
CLBOSS writes four bytes, but note the second byte. CLBOSS writse 0x05 0x01 0x00 0x03 (for something like "www.google.com" that is judged as type Name
thus the fourth byte is 0x03). The second byte is a command --- with CLBOSS, we just connect directly to the hostname, which is the CONNECT
command or 0x01 in the second byte.
In the torify wget
run this byte is "\360"
, aka 0xF0 == 240. This is not a standard SOCKS5 command --- the SOCKS5 RFC has this to say about those 4 bytes:
o VER protocol version: X'05' o CMD o CONNECT X'01' o BIND X'02' o UDP ASSOCIATE X'03' o RSV RESERVED o ATYP address type of following address o IP V4 address: X'01' o DOMAINNAME: X'03' o IP V6 address: X'04'
There are only 3 valid commands, CONNECT
0x01, BIND
0x02, and UDP ASSOCIATE
0x03, no 0xF0. So it looks like Tor secretly has a separate SOCKS5 command that resolves hostnames, and it might be more reliable than doing the connect
to the hostname directly.
So I went looking for Tor extensions, and here's a document I found: https://github.com/torproject/torspec/blob/master/socks-extensions.txt
And I quote:
2. Name lookup As an extension to SOCKS4A and SOCKS5, Tor implements a new command value, "RESOLVE" [F0]. When Tor receives a "RESOLVE" SOCKS command, it initiates a remote lookup of the hostname provided as the target address in the SOCKS request. The reply is either an error (if the address couldn't be resolved) or a success response. In the case of success, the address is stored in the portion of the SOCKS response reserved for remote IP address.
So it is an extension that is Tor specific, and thus something I would like to avoid. Yes, 99% of the time the SOCKS5 proxy will be Tor, but someone is going to have a legitimate standards-compliant SOCKS5 proxy that is not Tor at some point, so I would prefer not to use a Tor-only command.
Another thing is that while we write the other parts needed by the SOCKS5 proxy, we write them in separate write
calls, whereas torify wget
writes the 4-byte request followed by the length-preceded hostname in a single write
call. Kind of a long shot but maybe Tor is sensitive to some weird timing thing on the SOCKS5 socket and wants all the bytes in a single write
call? Nah, not buying it.
@hosiawak given that torify wget
uses a Tor-specific extension to the SOCKS5 proxy (that I would like to avoid If I can), could you instead install curl
? curl
can use the standard SOCKS5 protocol without Tor extensions (I think...). Can you please test this command a dozen times on your system?
curl --socks5 127.0.0.1:9050 https://www.google.com/index.html
And see if it succeeds each time? And if strace
or an equivalent is somehow available on FreeBSD could you get me a strace
of a successful run so I can compare what it does to CLBOSS?
I attached truss
(FreeBSD equivalent of strace) to a newly started clboss process and looked for the first occurence of "9050". Here's the relevant bit where it checks duckduckgo.com etc. Is it of any use ?
The output of running the curl command from above in truss in here - it doesn't use the tor proxy for the DNS query but uses the LAN router 192.168.1.1:53
Hmmm. Looking more closely at curl --help
command, it turns out there are two socks5
options:
--socks5 <host[:port]> SOCKS5 proxy on given host + port
...
--socks5-hostname <host[:port]> SOCKS5 proxy, pass host name to proxy
With --socks5
curl
does the resolution separately on the local DNS setup, with --socks5-hostname
curl
does what CLBOSS does, which is to pass the hostname unresolved to the proxy and have the proxy resolve it.
Can you please try with --socks5-hostname
instead?
curl --socks5-hostname 127.0.0.1:9050 https://www.google.com/index.html
In general when C-Lightning sees always-use-proxy
it tries to put even DNS lookups into the proxy (i.e. it uses the CONNECT
command with type 0x03 for hostname), though there may be code paths still where it uses the local DNS setup. And arguably if you want privacy you want to use Tor and always-use-proxy
.
However, the fact that curl
uses the shorter and more natural name socks5
for "resolve locally first, then use SOCKS5 on the IP address" versus the longer socks5-hostname
for "send the hostname to the SOCKS5 proxy" may imply that curl
devs know there are problems with SOCKS5 hostname resolution in the CONNECT
command, and 99% of SOCKS5 use is Tor.
@ZmnSCPxj curl --socks5-hostname ...
succeeds consistently (it takes ~ 10 seconds the first time it's run and then 2-3 seconds for subsequent runs.
Here's the gist of running it on www.duckduckgo.com
(because google.com request got cached I think).
Thank you. Hmmm. I guess I have to check the differences in traces
Here is something that seems different:
sendto(6,"\^E\^B\0\^A",4,MSG_NOSIGNAL,NULL,0) = 4 (0x4)
Basically the "\^E\^B\0\^A" is the bytes 0x05, 0x02, 0x00, 0x01. This is a
BIND` command, which I could not really understand the purpose of in the RFC1928. The truss trace then shows the proxy as responding with 0x05 0x00. Maybe I am missing that? Lemme try to code this up.
No, wait, the previous comment is wrong. That 0x05 0x02 0x00 0x01 is the authentication request. CLBOSS uses 0x05 0x01 0x00, which is basically requesting for authentication type 0 (i.e. unauthenticated). curl
is requesting authentication types 0 or 1 (0x05 0x02 0x00 0x01 --> 0x05 == SOCKS5, 0x02 == two authentication methods following, 0x00 == no authentication, 0x01 == GSSAPI). Then the server responds:
recvfrom(6,"\^E\0",2,0,NULL,0x0) = 2 (0x2)
That is 0x05 0x00: 0x05 == SOCKS5, 0x00 = I selected authentication type "no authentication".
Hmmmm.
Maybe I need to set the socket options and flags specially? Hmmm.
So I decided to strace
the dev-proxy-connect
binary. And the really weird thing is that inside the strace
, the command consistently fails on my system. Outside the strace
sometimes it fails and sometimes it passes.
So another major difference is that in CLBOSS we send the 0x05 0x01 0x03 in one write
call, then we send the length-prefixed hostname, then we send the port number, all in separate write
calls. Both curl
and torify wget
consistently send the entire command-hostname-port sequence in a single write
or sendto
call.
Maybe Tor has a latent bug related to this, where the hostname is not yet properly loaded before Tor goes and pokes its resolver, and that is greatly exacerbated when CLBOSS delays sending the hostname compared to the command. Tor might be looking up a garbage hostname! And nobody can notice this because Tor logs never store the hostname, it just writes [Scrubbed]
, so if the hostname is not yet loaded at the time Tor is looking it up, it could be looking up an incomplete or unloaded garbage hostname, and that is never logged because it is [Scrubbed]
. That would be big WTF if true.
The reason why the command consistently fails when run in an strace
is because the strace
adds enough delay between the command sending and the hostname sending to consistently trigger the bug.
Lemme try if sending the entire sequence in a single large write
call works better....
IT WORKS WHEN I SEND THE ENTIRE SEQUENCE IN A SINGLE LARGE write
CALL WTF. Previously dev-proxy-connect
would inconsistently succeed and sometimes fail. Now it consistently succeeds. WTF. I think I need to go file an issue at Tor.
@hosiawak can you give details on OS kernel version and Tor version of your system? I intend to file a bug report to Tor, and it seems likely they might want to know your details as well.
@ZmnSCPxj FreeBSD 11.2-RELEASE-p15 amd64, Tor version 0.4.4.5.
Thanks! I believe this is resolved so closing issue as well.
My fresh install of FreeBSD clboss-status reports internet connection being constantly offline.
I checked
Boss/Mod/InternetConnectionMonitor.cpp
and it seems to be checking some popular servers on port 443 periodically. I can connect to these servers manually.Here's my
lightningd-bitcoin.conf
:Unfortunately I don't know C++ and
libev
enough to investigate this issue further but if you could maybe provide a small snippet of code I can run on this box that might help. Thank you.