bluesky-social / social-app

The Bluesky Social application for Web, iOS, and Android
https://bsky.app
MIT License

Custom subdomain handle suddenly stopped validating #4613

Closed. GreenFootballs closed this issue 4 months ago.

GreenFootballs commented 4 months ago

Describe the bug

Two days ago, I signed in to Bluesky on the web and found my profile was showing "Invalid handle." I use a subdomain handle, charles.littlegreenfootballs.com. It had been working for several months, and I have not changed anything in the DNS setup or the .well-known/atproto-did file.

When I use "Change handle" in the app's settings, it actually verifies the subdomain and handle; but my posts all continue to show "invalid handle."

I went through and double-checked everything about the setup, and there's nothing wrong as far as I can tell. Here's that atproto-did file. The server returns it as text/plain with status 200.
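For anyone following along, the check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Bluesky's actual validator: the function name `check_well_known` and the example DID are hypothetical, and the strict content-type test simply mirrors the text/plain detail mentioned above (a real resolver may be more lenient).

```python
from typing import Optional


def check_well_known(status: int, content_type: str, body: str,
                     expected_did: str) -> Optional[str]:
    """Return None if the response looks valid, else a short reason string.

    Hypothetical helper: it inspects an already-fetched response for
    https://<handle>/.well-known/atproto-did and performs no network I/O.
    """
    if status != 200:
        return f"unexpected status {status}"
    # Ignore parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/plain":
        return f"unexpected content type {media_type!r}"
    # The body is expected to be just the DID; surrounding whitespace is tolerated here.
    did = body.strip()
    if not did.startswith("did:"):
        return "body is not a DID"
    if did != expected_did:
        return "DID does not match the account"
    return None
```

Note that a check like this only covers what the server returns once it is reached. It says nothing about whether a particular client can reach the server in the first place.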

Screenshots

(screenshot attached)


GreenFootballs commented 4 months ago

Did more testing and the TXT record method works for me, but the subdomain still shows "invalid handle."

GreenFootballs commented 4 months ago

Some people told me they could still see my subdomain handle. Could this be a federation issue?

haileyok commented 4 months ago

@devinivy

GreenFootballs commented 4 months ago

To cross one other possibility off the list, I changed the subdomain record from A to CNAME - still no subdomain handle.

GreenFootballs commented 4 months ago

Is there some other information I can provide to help you folks look into this? I was planning to offer subdomain handles to trusted members of my site, but I can't do that while this bug is still alive.

ericvolp12 commented 4 months ago

I don't see anything on our end that would be preventing the handle from being used/validated.

Any chance you could go through the handle-change process one more time to charles.littlegreenfootballs.com so I can take a look at what state our identity database ends up in? Hopefully that should give me enough info to figure out what the failure mode is here and get us back on track.

GreenFootballs commented 4 months ago

OK, I'll do that now. Do you want me to leave it at "invalid handle" then, or can I change it back to the TLD?

ericvolp12 commented 4 months ago

Leave it at invalid handle for now, I'll be quick.

GreenFootballs commented 4 months ago

OK, it's invalid. This is what I see:

(screenshot attached)

GreenFootballs commented 4 months ago

When I get one of my posts with the API, the handle shows as "handle.invalid".
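As a rough illustration of that check, the sketch below tests an already-fetched post view for the placeholder handle. It assumes the post is a plain dict with the handle under `author.handle`; the helper name and the sample data are hypothetical.

```python
INVALID_HANDLE = "handle.invalid"  # placeholder shown when a handle fails validation


def author_handle_is_invalid(post: dict) -> bool:
    """Return True if the post view's author carries the placeholder handle."""
    author = post.get("author") or {}
    return author.get("handle") == INVALID_HANDLE


# Hypothetical sample data, shaped like a post view returned by the API.
sample_post = {
    "uri": "at://did:plc:example123/app.bsky.feed.post/abc",
    "author": {"did": "did:plc:example123", "handle": "handle.invalid"},
}
print(author_handle_is_invalid(sample_post))  # True
```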

ericvolp12 commented 4 months ago

Yep, I see where we're detecting it as invalid and am trying to figure out how the bit that does that is failing rn.

GreenFootballs commented 4 months ago

Could it be the length?

GreenFootballs commented 4 months ago

May want to check commits for Friday night/Saturday morning because that's when it suddenly began failing.

ericvolp12 commented 4 months ago

It's not length related or anything, I think it's got something to do with the prod network, digging into it. The same code works fine running in other environments and sees your identity as valid etc.

ericvolp12 commented 4 months ago

So for some reason it looks like we can't curl https://charles.littlegreenfootballs.com/.well-known/atproto-did successfully from within PoP2.

I'm wondering what could possibly be going on there. We use a proxy to get out to the internet from that machine but it's not having any problems talking to other hosts on the internet.

Here's a curl:

* Establish HTTP proxy tunnel to charles.littlegreenfootballs.com:443
> CONNECT charles.littlegreenfootballs.com:443 HTTP/1.1
> Host: charles.littlegreenfootballs.com:443
> User-Agent: curl/7.81.0
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 OK
< Date: Tue, 25 Jun 2024 20:15:52 GMT
< Proxy-Connection: keep-alive
< Server: ATS/9.1.1
<
* Proxy replied 200 to CONNECT request
* CONNECT phase completed!
* ALPN, offering h2
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.0 (OUT), TLS header, Unknown (21):
* TLSv1.3 (OUT), TLS alert, decode error (562):
* error:0A000126:SSL routines::unexpected eof while reading
* Closing connection 0
curl: (35) error:0A000126:SSL routines::unexpected eof while reading

FYI, the machines in both DCs are running the same versions of OpenSSL and curl.
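Reading the transcript above: the proxy accepted the CONNECT request (200), and the failure came afterwards, during the TLS handshake with the origin. A small sketch of that reading, as a hypothetical helper that classifies a verbose curl transcript by simple string matching (a heuristic for illustration, not a general-purpose parser):

```python
def failed_phase(transcript: str) -> str:
    """Classify a verbose curl transcript as 'proxy', 'tls', or 'ok'.

    Heuristic only: it looks for the marker strings that appear in the
    transcript quoted above.
    """
    lines = [line.strip() for line in transcript.splitlines()]
    # The proxy tunnel is considered established once curl reports a 200 reply.
    tunnel_ok = any(line.startswith("* Proxy replied 200") for line in lines)
    if not tunnel_ok:
        return "proxy"
    # OpenSSL errors surface as "error:...:SSL routines::...".
    if any("SSL routines" in line for line in lines):
        return "tls"
    return "ok"


sample = """\
* Proxy replied 200 to CONNECT request
* CONNECT phase completed!
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* error:0A000126:SSL routines::unexpected eof while reading
curl: (35) error:0A000126:SSL routines::unexpected eof while reading
"""
print(failed_phase(sample))  # tls
```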

ericvolp12 commented 4 months ago

Attempting to curl from our router machines (the ones that act as internet entry/egress-points) shows the same issue in PoP2.

I can't even curl or ping littlegreenfootballs.com from our PoP2 datacenter.

Is there any chance you or your hosting provider is blocking Bluesky's IPs in our east-coast DC?

iirc we're not using our BGP prefix IPs in that datacenter yet.

GreenFootballs commented 4 months ago

That's odd - it was working fine for months, something must have changed. What IP should I check?

ericvolp12 commented 4 months ago

Try 38.120.64.66, that's likely what we're trying to connect from in PoP2.

GreenFootballs commented 4 months ago

Not finding that IP in any of the usual places.

ericvolp12 commented 4 months ago

It's routed via Cogent on the east coast. Any chance there's some kind of hosting provider filtering going on? Are you hosting on AWS or DigitalOcean or somewhere else? I can try to spin up something and see if the provider is at fault.

GreenFootballs commented 4 months ago

The provider is Hosting Matters - OS is CentOS 8. Web server is LiteSpeed/6.0.12.

GreenFootballs commented 4 months ago

The TXT record method still works, by the way. Wouldn't that also fail if there was an IP block somewhere?

ericvolp12 commented 4 months ago

Okay, it might be worth submitting a ticket with their helpdesk to see if there's some kind of firewall rule blocking connections.

Re: the TXT record, that method is validated via DNS. We use Cloudflare and Google as DNS query resolvers, so they query your DNS server on our behalf and those lookups never come from our own IPs.
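To make the distinction concrete, here is a rough sketch of the two lookup targets for a handle and of how a TXT value maps to a DID. It assumes the usual atproto conventions (a TXT record at `_atproto.<handle>` containing `did=<did>`, and an HTTPS file at `/.well-known/atproto-did`). The function names are hypothetical and nothing here touches the network.

```python
from typing import Optional


def resolution_targets(handle: str) -> dict:
    """Return the DNS name and HTTPS URL consulted for a handle."""
    normalized = handle.strip().rstrip(".").lower()
    return {
        # Looked up through third-party DNS resolvers.
        "dns_txt_name": f"_atproto.{normalized}",
        # Fetched with a direct HTTPS connection to the handle's host.
        "https_url": f"https://{normalized}/.well-known/atproto-did",
    }


def parse_atproto_txt(value: str) -> Optional[str]:
    """Extract the DID from a TXT value such as 'did=did:plc:...'."""
    text = value.strip().strip('"')
    prefix = "did="
    if not text.startswith(prefix):
        return None
    did = text[len(prefix):]
    return did if did.startswith("did:") else None
```

Because only the HTTPS path needs a direct connection to the handle's web server, a routing or filtering problem between the two hosts can break that path while the TXT path keeps working.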

GreenFootballs commented 4 months ago

OK, I'll put in a ticket asking them to check 38.120.64.66

ericvolp12 commented 4 months ago

Thanks, and sorry this is such a pain. The internet and routing are built on relationships and reputation, and we're still pretty new :)

GreenFootballs commented 4 months ago

OK, ticket's filed, I'll update here when they answer. They're usually pretty quick.

GreenFootballs commented 4 months ago

And thank you for your assistance!

GreenFootballs commented 4 months ago

Possibly relevant?

SSL Library Error: error:0A000126:SSL routines::unexpected eof while reading #22690

GreenFootballs commented 4 months ago

Host says the IP is not being blocked, and they flushed all other blocks.

GreenFootballs commented 4 months ago

OpenSSL on my server is v 1.1.1k, are you on the newer version?

ericvolp12 commented 4 months ago

(screenshot: CleanShot 2024-06-25 at 15 27 36)

I think this might actually be an issue with Cogent routing.

I'm unable to even use Cogent's looking-glass to ping your server from their Washington DC router.

From San Jose tho it works fine: (screenshot: CleanShot 2024-06-25 at 15 28 25)

I think I might need to ring up Cogent.

GreenFootballs commented 4 months ago

My subdomain is working again, thanks. I guess Cogent straightened out whatever was clogging the pipes.

GreenFootballs commented 4 months ago

Closing this issue as solved.