mlissner opened 1 year ago
Well, OK, there's a workaround if you're the one posting the link, but it's still broken for everybody that's not this clever.
You can substitute %2e for the last period in your link. This works:
https://storage.courtlistener.com/recap/gov.uscourts.flsd.648654/gov.uscourts.flsd.648654.3.0%2epdf
Not, um, exactly great, but it's something!
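If you want to apply the workaround to many links, here's a minimal Python sketch that swaps the last period in a URL's path for %2e. The function name is made up for illustration. It assumes the server decodes %2e back to a period. That worked for the CourtListener link above, but it isn't guaranteed for every server.

```python
from urllib.parse import urlsplit, urlunsplit


def encode_last_period(url: str) -> str:
    """Replace the final '.' in the URL *path* with '%2e'.

    The query string and fragment are left untouched. Assumes the server
    treats '%2e' the same as '.', which is server-dependent.
    """
    parts = urlsplit(url)
    idx = parts.path.rfind(".")
    if idx == -1:
        return url  # no period in the path, so there is nothing to encode
    new_path = parts.path[:idx] + "%2e" + parts.path[idx + 1:]
    return urlunsplit(parts._replace(path=new_path))
```

For example, passing the .pdf version of the link above returns the %2epdf link shown there.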
One other thought here: it isn't part of Open Graph, but I've always thought it would be nice to serve Open Graph data via HTTP headers. In fact, I think Facebook must have gotten distracted while building the spec and just never got around to it.
If Bluesky supported this one day, it would make it possible to return detailed information and thumbnails when serving binary content.
(I've been banging this drum for a decade or so.)
Right now, if you put a link ending in .html into the composer (on the web) and ask it to generate a card, you can watch the network panel make a request to https://cardyb.bsky.app/v1/extract. For example, this URL triggers a request in the network panel when you ask for a card:
https://foo.com/foo.html
But these URLs, ending in .jpeg, .png, .pdf, and .xml, do not:
https://foo.com/foo.jpeg
https://foo.com/foo.png
https://foo.com/foo.xml
https://foo.com/foo.pdf
I understand the reasoning: in theory, those file endings tell Bluesky that the response won't contain Open Graph information, since that information only exists in HTML content.
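To make the behavior concrete, here's a guess at the client-side check, written in Python and reconstructed only from the URLs tested above. It is not Bluesky's code. The extension set, the function name, and the name of the url query parameter sent to the extract endpoint are all assumptions.

```python
from typing import Optional
from urllib.parse import quote, urlsplit

# ASSUMPTION: only the extensions observed above to suppress the request.
SKIPPED_EXTENSIONS = {"jpeg", "png", "pdf", "xml"}

# Endpoint seen in the network panel; the "url" parameter name is assumed.
EXTRACT_ENDPOINT = "https://cardyb.bsky.app/v1/extract"


def card_request_url(link: str) -> Optional[str]:
    """Return the extract URL a client would request, or None if it skips the link."""
    path = urlsplit(link).path  # stays percent-encoded, so '%2e' is not a '.'
    last_segment = path.rsplit("/", 1)[-1]
    ext = last_segment.rsplit(".", 1)[-1].lower() if "." in last_segment else ""
    if ext in SKIPPED_EXTENSIONS:
        return None  # decided from the extension alone; the link is never fetched
    return f"{EXTRACT_ENDPOINT}?url={quote(link, safe='')}"
```

Under this model, the %2e workaround gets through because the last path segment no longer ends in ".pdf", so the check sees an unknown extension and makes the request.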
That theory is correct, but the website I run shares millions of PDFs, and we have a neat hack in place to help fight misinformation and give our users better details. When we detect an Open Graph crawler, we redirect it to an HTML page with Open Graph data; if the request isn't from a crawler, we serve the PDF. I know that DocumentCloud uses this trick too.
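Here's a rough Python sketch of that crawler-redirect trick. The user-agent tokens and function names are illustrative assumptions, not CourtListener's or DocumentCloud's actual code. A real deployment would live in the web framework's view or middleware layer.

```python
from typing import Dict, Tuple

# ASSUMPTION: an illustrative list of link-preview crawler user-agent substrings.
CRAWLER_UA_TOKENS = ("facebookexternalhit", "twitterbot", "slackbot", "mastodon")


def is_preview_crawler(user_agent: str) -> bool:
    """True if the User-Agent looks like a link-preview crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_UA_TOKENS)


def respond_to_pdf_request(user_agent: str, html_page_url: str) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request to a PDF URL.

    Crawlers get a redirect to an HTML page carrying Open Graph tags;
    everyone else gets the PDF itself.
    """
    if is_preview_crawler(user_agent):
        return 302, {"Location": html_page_url}
    return 200, {"Content-Type": "application/pdf"}
```

The trick only works if the preview service actually requests the original URL and follows the redirect. An extension check that runs before any request means the server never gets the chance to redirect.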
This works on Twitter, Facebook, Slack, Mastodon, and a bunch of other sites. As far as I know, Bluesky is the only one where it doesn't work.
To Reproduce
1. Paste this link into the web composer: https://foo.com/f.html
2. Open the browser's network panel.
3. Press the button in the composer to get the card.
4. Note that it returns an error (the link doesn't work) and that a request appears in the network panel.
5. Change the URL to https://foo.com/f.pdf
6. Press the button in the composer to get the card.
7. Note that it makes no request and throws no error.
Expected behavior
Bluesky should fetch the URL regardless of its file extension and check whether the response is actually HTML or a PDF. Heck, some horribly misconfigured website might end links with .pdf even when serving HTML. :)
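A minimal Python sketch of that expected behavior: fetch the link, follow redirects, and decide from the response's Content-Type instead of the extension. The function names and the User-Agent string are hypothetical. Note that a server using the crawler-redirect trick would only redirect if it recognized that User-Agent as a preview crawler.

```python
from email.message import Message
from urllib.request import Request, urlopen

HTML_TYPES = {"text/html", "application/xhtml+xml"}


def is_html_content_type(header_value: str) -> bool:
    """True if a Content-Type header value names an HTML media type."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() lowercases the type, drops parameters such as
    # charset, and falls back to text/plain for malformed values.
    return msg.get_content_type() in HTML_TYPES


def should_build_card(url: str, timeout: float = 10.0) -> bool:
    """Fetch the URL and decide from the final response, not the extension."""
    # HYPOTHETICAL User-Agent; a real fetcher would use its own identifier.
    req = Request(url, headers={"User-Agent": "example-card-fetcher/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        # urlopen follows redirects, so a .pdf URL that redirects crawlers to
        # an HTML page shows up here as an HTML response. The body is not read.
        return resp.headers.get_content_type() in HTML_TYPES
```

This also handles the misconfigured case: a link ending in .pdf that serves text/html gets a card, and a real PDF without a redirect is skipped after one request.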
Additional context
This bug is a bit of a bummer, because one of the things that drove me to Bluesky is that Twitter removed headlines from its link cards. This bug means that links from my website don't get cards or headlines here either. Darn!
I took a look around the code but couldn't find where this check happens. If somebody sends a pointer, we've got technical folks and volunteers who could help with this.