Open dan-lind opened 2 months ago
Hi @dan-lind thanks for raising an issue on the Cloud SQL Proxy!
The Cloud SQL Proxy forwards local traffic to port 3307
of your Cloud SQL instance. Because of this you will need to update your VPC network firewall rules potentially, they should allow TCP egress connection on port 3307
of your Cloud SQL instance private IP address. Can you see if this is maybe the cause of your error?
From our docs:
While the Cloud SQL Auth Proxy can listen on any port, it creates outgoing or egress connections to your Cloud SQL instance IP address on port
3307
. If your client machine has an outbound firewall policy, make sure it allows outgoing connections to port3307
on your Cloud SQL instance's IP.
Let me know as this is a common cause for the issue you are seeing 😄
Am I reading your config correctly that you're connecting to 34 instances?
Hi @dan-lind thanks for raising an issue on the Cloud SQL Proxy!
The Cloud SQL Proxy forwards local traffic to port
3307
of your Cloud SQL instance. Because of this you will need to update your VPC network firewall rules potentially, they should allow TCP egress connection on port3307
of your Cloud SQL instance private IP address. Can you see if this is maybe the cause of your error?From our docs:
While the Cloud SQL Auth Proxy can listen on any port, it creates outgoing or egress connections to your Cloud SQL instance IP address on port
3307
. If your client machine has an outbound firewall policy, make sure it allows outgoing connections to port3307
on your Cloud SQL instance's IP.Let me know as this is a common cause for the issue you are seeing 😄
Hi! Like I wrote in "Additional details", connecting using the MySQL CloudSQL proxy in Datagrip works without any issues. Another tcpdump on that traffic shows that this connection also happens on 3307.
tcpdump: data link type PKTAP
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type PKTAP (Apple DLT_PKTAP), snapshot length 524288 bytes
18:44:01.730197 IP 172.22.164.170.52213 > redacted.3307: Flags [S], seq 308421121, win 65535, options [mss 1360,nop,wscale 6,nop,nop,TS val 1740779703 ecr 0,sackOK,eol], length 0
18:44:01.776055 IP redacted.3307 > 172.22.164.170.52213: Flags [S.], seq 2473669167, ack 308421122, win 32384, options [mss 1332,sackOK,TS val 3117211483 ecr 1740779703,nop,wscale 7], length 0
18:44:01.776115 IP 172.22.164.170.52213 > redacted.3307: Flags [.], ack 1, win 2062, options [nop,nop,TS val 1740779749 ecr 3117211483], length 0
18:44:01.777291 IP 172.22.164.170.52213 > redacted.3307: Flags [P.], seq 1:441, ack 1, win 2062, options [nop,nop,TS val 1740779750 ecr 3117211483], length 440
18:44:01.834037 IP redacted.3307 > 172.22.164.170.52213: Flags [.], ack 441, win 250, options [nop,nop,TS val 3117211536 ecr 1740779750], length 0
18:44:01.834064 IP redacted.3307 > 172.22.164.170.52213: Flags [.], seq 1:1321, ack 441, win 250, options [nop,nop,TS val 3117211538 ecr 1740779750], length 1320
18:44:01.834065 IP redacted.3307 > 172.22.164.170.52213: Flags [P.], seq 1321:1763, ack 441, win 250, options [nop,nop,TS val 3117211538 ecr 1740779750], length 442
18:44:01.834087 IP 172.22.164.170.52213 > redacted.3307: Flags [.], ack 1321, win 2041, options [nop,nop,TS val 1740779807 ecr 3117211538], length 0
18:44:01.834101 IP 172.22.164.170.52213 > redacted.3307: Flags [.], ack 1763, win 2034, options [nop,nop,TS val 1740779807 ecr 3117211538], length 0
18:44:01.835368 IP 172.22.164.170.52213 > redacted.3307: Flags [P.], seq 441:447, ack 1763, win 2034, options [nop,nop,TS val 1740779808 ecr 3117211538], length 6
18:44:01.838264 IP 172.22.164.170.52213 > redacted.3307: Flags [P.], seq 447:1422, ack 1763, win 2048, options [nop,nop,TS val 1740779811 ecr 3117211538], length 975
18:44:01.840919 IP 172.22.164.170.52213 > redacted.3307: Flags [P.], seq 1422:1724, ack 1763, win 2048, options [nop,nop,TS val 1740779814 ecr 3117211538], length 302
18:44:01.841028 IP 172.22.164.170.52213 > redacted.3307: Flags [P.], seq 1724:1798, ack 1763, win 2048, options [nop,nop,TS val 1740779814 ecr 3117211538], length 74
18:44:01.894001 IP redacted.3307 > 172.22.164.170.52213: Flags [.], ack 1724, win 249, options [nop,nop,TS val 3117211600 ecr 1740779808], length 0
18:44:01.894003 IP redacted.3307 > 172.22.164.170.52213: Flags [P.], seq 1763:2836, ack 1724, win 249, options [nop,nop,TS val 3117211601 ecr 1740779808], length 1073
Am I reading your config correctly that you're connecting to 34 instances?
This config has 34 entries, yes. However I've tried reducing it down to a single entry as well, it doesn't change anything.
This happens only when I setup the proxy using the terminal. If I use Intellij/Datagrip, and use the built in MySQL CloudSql proxy driver, I can connect without issues.
That's very interesting, because the built-in driver (which uses the Java Connector) does effectively exactly the same thing.
Have you tried connecting through the Proxy using a CLI tool, or something other than Datagrip to determine if Datagrip is involved?
Yes I have. I've tried "dumbing things down" as much as possible, for example to rule out any issues with the unix sockets.
$ cloud-sql-proxy --private-ip --port 3306 redacted-prod:europe-west1:redacted-live-156c5af9
2024/09/12 19:53:52 Authorizing with Application Default Credentials
2024/09/12 19:53:52 [redacted-prod:europe-west1:redacted-156c5af9] Listening on 127.0.0.1:3306
2024/09/12 19:53:52 The proxy has started successfully and is ready for new connections!
then using the mysql client (v8.4)
$ mysql -u redacted -p --host 127.0.0.1 --port 3306
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 2
yields the same result
2024/09/12 19:54:13 [redacted-prod:europe-west1:redacted-156c5af9] Accepted connection from 127.0.0.1:53534
2024/09/12 19:54:13 [redacted-prod:europe-west1:redacted-156c5af9] failed to connect to instance: Dial error: handshake failed (connection name = "redacted-prod:europe-west1:redacted-156c5af9"): read tcp 172.19.50.151:53535->redacted:3307: read: connection reset by peer
tcpdump looks similar
$ sudo tcpdump --interface any -nn port 3307
tcpdump: data link type PKTAP
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type PKTAP (Apple DLT_PKTAP), snapshot length 524288 bytes
19:54:13.730494 IP 172.19.50.151.53535 > redacted.3307: Flags [S], seq 1648980533, win 65535, options [mss 1360,nop,wscale 6,nop,nop,TS val 2312443875 ecr 0,sackOK,eol], length 0
19:54:13.796034 IP redacted.3307 > 172.19.50.151.53535: Flags [S.], seq 1731921490, ack 1648980534, win 32384, options [mss 1332,sackOK,TS val 1671508660 ecr 2312443875,nop,wscale 7], length 0
19:54:13.796091 IP 172.19.50.151.53535 > redacted.3307: Flags [.], ack 1, win 2062, options [nop,nop,TS val 2312443941 ecr 1671508660], length 0
19:54:13.796688 IP 172.19.50.151.53535 > redacted.3307: Flags [.], seq 1:1321, ack 1, win 2062, options [nop,nop,TS val 2312443941 ecr 1671508660], length 1320
19:54:13.796695 IP 172.19.50.151.53535 > redacted.3307: Flags [P.], seq 1321:1486, ack 1, win 2062, options [nop,nop,TS val 2312443941 ecr 1671508660], length 165
19:54:13.843445 IP redacted.3307 > 172.19.50.151.53535: Flags [R.], seq 1, ack 1321, win 2062, length 0
Thanks for the report. In that case I think we need to look into this -- that's wholly unexpected and something we definitely don't see in our automated testing.
Out of curiosity do you see the same behavior across instances?
Yes, same across instances
tors 12 sep. 2024 kl. 21:49 skrev Eno Compton @.***>:
Thanks for the report. In that case I think we need to look into this -- that's wholly unexpected and something we definitely don't see in our automated testing.
Out of curiosity do you see the same behavior across instances?
— Reply to this email directly, view it on GitHub https://github.com/GoogleCloudPlatform/cloud-sql-proxy/issues/2296#issuecomment-2347113732, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADJ2OD2XA5EVYLJ5AGVT6DZWHV5NAVCNFSM6AAAAABODAKRY6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNBXGEYTGNZTGI . You are receiving this because you were mentioned.Message ID: @.***>
Hi @dan-lind this looks like an issue on the server side.
Do you happen to have a support contract with Google? If so you could open a case and they will have access to check your server logs.
Let me know, otherwise I can try and figure out a way to take a look deeper for you.
We do yes, I've reached out to them referring to this issue.
We do yes, I've reached out to them referring to this issue.
Perfect, that is great to hear.
I'm experiencing the same issue when trying to connect through the proxy. I receive a "connection reset by peer" error during the handshake process. Here’s a snippet from my logs:
failed to connect to instance: Dial error: handshake failed (connection name = "[redacted]-prod:europe-west1:[redacted]-live-6aab2946"): read tcp [ip]:[port]->[redacted]:3307: read: connection reset by peer
This issue occurs consistently (direct access & via python's connectors), I have tried reinitializing from my main device & my alt but nothing seems to work. Other users (on same project, with the same permissions) are not facing this issue even though they are following the same connection procedure & running the same OS with the latest updates.
UPDATE: Downgrading to v2.11.3 seems to have solved the issue for now as a work around, will try to identify the version where the issue started
UPDATE 2: After some git bisecting the commit that started the issue was fd6bd49242c884508f641c754eb5cec5d28ac665
with the bump to go 1.23, reverting the version solves the issue
UPDATE: Downgrading to v2.11.3 seems to have solved the issue for now as a work around, will try to identify the version where the issue started
UPDATE 2: After some git bisecting the commit that started the issue was
fd6bd49242c884508f641c754eb5cec5d28ac665
with the bump to go 1.23, reverting the version solves the issue
@TokieSan Thanks for the detailed analysis and extra data points! I would recommend pinning to an older version for now, in the meantime I will investigate this further to see if there is maybe something on the server side that is causing issues with the latest Go 1.23 TLS changes.
Confirming that downgrading to v2.11.3 solve the issue for me as well!
Confirming that downgrading to v2.11.3 solve the issue for me as well!
Thanks for confirming @dan-lind , appreciate it.
I am just curious if you were able to successfully open up a support case for this issue? If so I can escalate it to get folks to look at the server side and see if we can pinpoint the issue.
I am just curious if you were able to successfully open up a support case for this issue? If so I can escalate it to get folks to look at the server side and see if we can pinpoint the issue.
Yes, I have a support case open!
EDIT: @jackwotherspoon I don't know how this usually works, but I've asked in my support case for them to reach out to you.
EDIT: @jackwotherspoon I don't know how this usually works, but I've asked in my support case for them to reach out to you.
That is perfect, thanks
@dan-lind @TokieSan I am taking a look at the server side to see if there are any issues with the use of Go 1.23
In the meantime, I wanted to point out https://github.com/GoogleCloudPlatform/cloud-sql-proxy/issues/2294 which was also an issue for 2.13.0 when connecting to a Private IP Cloud SQL instance via a VPN...
looking at the changelog of go version 1.22..1.23, I got this change: experimental post-quantum key exchange mechanism X25519Kyber768Draft00 is now enabled by default when Config.CurvePreferences is nil. The default can be reverted by adding tlskyber=0 to the GODEBUG environment variable.
disabling it, using GODEBUG=tlskyber=0 cloud-sql-proxy-2.13.0 ... make cloud-sql-proxy works again inside the vpn.
I wonder if one of you could try starting the v2.13.0 Proxy with GODEBUG=tlskyber=0 cloud-sql-proxy ...
and see if maybe this error is also a symptom of the same issue?
I wonder if one of you could try starting the v2.13.0 Proxy with
GODEBUG=tlskyber=0 cloud-sql-proxy ...
and see if maybe this error is also a symptom of the same issue?
Yes! GODEBUG=tlskyber=0
indeed works with 2.13, i.e. the error goes away and I can connect just as when downgrading to v2.11.3.
Yes!
GODEBUG=tlskyber=0
indeed works with 2.13, i.e. the error goes away and I can connect just as when downgrading to v2.11.3.
@dan-lind Great to hear it!
I am going to look at open Go issue's and see if I need to create one as it seems like the new default preferences implemented in https://github.com/golang/go/issues/67061 are not playing nicely with VPNs
I think we have a similar problem with SOCKS proxies too. https://github.com/GoogleCloudPlatform/alloydb-auth-proxy/issues/714
FYI https://github.com/GoogleCloudPlatform/alloydb-auth-proxy/issues/714#issuecomment-2460486141
We should probably make this a feature request, since the root cause of these TLS handshake failures are buggy TLS servers.
Bug Description
When attempting to connect through the proxy, I'm getting connection reset by peer
Starting the proxy is fine
but when I attempt to make the actual connection, I get
In the client (Datagrip 2024.2.1) I see the error
Socket is not connected.
On the CloudSQL side, I can also see a warning log in Logs explorer like1893634 [Warning] [MY-000000] [Server] Failed to connect to mysql: 1893634. Server-side error number: 1159
Stacktrace
Steps to reproduce?
This happens every time I try connecting though the terminal
Environment
cloud-sql-proxy --config-file ~/cloudsql/cloudsql-config.toml
Content of
cloudsql-config.toml
Additional Details
This happens only when I setup the proxy using the terminal. If I use Intellij/Datagrip, and use the built in MySQL CloudSql proxy driver, I can connect without issues.