bluesky-social / atproto

Social networking technology created by Bluesky

add embed information to `external` #2009

Open haileyok opened 9 months ago

haileyok commented 9 months ago

Is your feature request related to a problem? Please describe.

Currently, embeds are handled by the client parsing the uri in external. The client has to parse the uri in each external embed, and each client has to implement its own parsing logic.

Describe the solution you'd like

When an external embed is added to a post, the site metadata is requested by the client and added to the record. We could add perhaps embedUri or similar to the external record along with other optional data such as height and width.
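To make the proposal concrete, here is a minimal sketch of what the extended record could look like. The `uri`, `title`, and `description` fields reflect my understanding of the current external embed; `embedUri`, `embedWidth`, `embedHeight`, the `FetchedMetadata` shape, and the `withEmbedInfo` helper are all hypothetical names used only for illustration, not part of any existing lexicon or API.

```typescript
// Existing fields of the external embed (thumb omitted for brevity).
interface ExternalEmbed {
  uri: string;
  title: string;
  description: string;
  // Hypothetical additions discussed in this issue:
  embedUri?: string;
  embedWidth?: number;
  embedHeight?: number;
}

// Hypothetical shape of what a metadata fetch might return.
interface FetchedMetadata {
  embedUri?: string;
  width?: number;
  height?: number;
}

// True only for finite, positive whole numbers.
function isPositiveInt(n: number | undefined): n is number {
  return typeof n === "number" && isFinite(n) && n > 0 && Math.floor(n) === n;
}

// Copy embed info onto the record only when an embed URI is present;
// width and height stay optional and are dropped if they look invalid.
function withEmbedInfo(external: ExternalEmbed, meta: FetchedMetadata): ExternalEmbed {
  if (!meta.embedUri) return external;
  const result: ExternalEmbed = { ...external, embedUri: meta.embedUri };
  if (isPositiveInt(meta.width)) result.embedWidth = meta.width;
  if (isPositiveInt(meta.height)) result.embedHeight = meta.height;
  return result;
}
```

A client that does not care about embeds can ignore the extra fields entirely, since all of them are optional.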

See Mary's comment below on how that data can be retrieved from og/twitter tags. Much easier to implement than dealing with oembed!

Adding support to cardyb (not sure what repo if any that is in) to include this in the metadata fetch response would allow the client to handle adding it to the record. Each client can choose whether or not to handle specific embeds by type of embeds or not handle them at all.

Additionally, as noted in this issue, adding alt text to external embeds (even if not visible to the client itself) could be useful in this regard. If we take again the example of Giphy, we can add the gif's alt text (which can be retrieved from the api) to that field. Alt text doesn't need to be present, or even available, for any of this to work, but when it is present it could be added to the embedded post's accessibility labels.

Describe alternatives you've considered

The current implementation for determining if/when to embed is okay, but results in less of a standard across clients. As for fetching oembed info, this could be done by the client itself without changing cardyb, but since the idea seems to be to proxy that information, it would defeat that purpose (I'm assuming that is the purpose based on the fact that images are proxied instead of being direct links)

Additional context

Not fully up to speed on the spec, so this might not be as simple as I imagine it, so apologies if I am oversimplifying!

If it is as simple as this, I'd be fine with opening the PR myself, but of course don't want to do so 1. if I have no idea what I'm doing or 2. it goes against your plans :)

mary-ext commented 9 months ago

My idea was more about relying on twitter:player tags as a way for apps to know that an external link contains an iframe (AV player) that can be embedded right on the app itself

Here's an example web page from Twitter showing how twitter:player is supposed to work:

https://github.com/twitterdev/cards-player-samples/blob/f556934d382f5fb8945017cfb294b95ebf53ae16/player/page.html#L7-L15

We can extract these meta tags and add a player object on app.bsky.embed.external#external that would contain the iframe URL and a requested width/height of the iframe (which is optional)
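As a rough sketch of that extraction step, the following pulls `twitter:player`, `twitter:player:width`, and `twitter:player:height` out of a page's HTML and builds a player object. The `PlayerInfo` shape and the function names are assumptions for illustration only; the regex-based parsing is a simplification, and a real scraper would more likely use a proper HTML parser.

```typescript
// Hypothetical shape of the proposed `player` object.
interface PlayerInfo {
  uri: string;
  width?: number;
  height?: number;
}

// Collect <meta name|property="..." content="..."> pairs (first occurrence wins).
function readMetaTags(html: string): Record<string, string | undefined> {
  const tags: Record<string, string | undefined> = {};
  for (const tag of html.match(/<meta\b[^>]*>/gi) ?? []) {
    const attrRe = /([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
    const attrs: Record<string, string | undefined> = {};
    let m: RegExpExecArray | null;
    while ((m = attrRe.exec(tag)) !== null) {
      attrs[(m[1] ?? "").toLowerCase()] = m[2] ?? m[3] ?? "";
    }
    const key = attrs["name"] ?? attrs["property"];
    const content = attrs["content"];
    if (key !== undefined && content !== undefined && tags[key] === undefined) {
      tags[key] = content;
    }
  }
  return tags;
}

// Parse a positive integer, returning undefined for anything else.
function toPositiveInt(value: string | undefined): number | undefined {
  if (value === undefined || !/^\d+$/.test(value)) return undefined;
  const n = parseInt(value, 10);
  return n > 0 ? n : undefined;
}

// Build the player object, or return undefined if there is no usable https player URL.
function extractPlayer(html: string): PlayerInfo | undefined {
  const tags = readMetaTags(html);
  const uri = tags["twitter:player"];
  if (!uri) return undefined;
  let parsed: URL;
  try {
    parsed = new URL(uri);
  } catch {
    return undefined;
  }
  if (parsed.protocol !== "https:") return undefined;
  const player: PlayerInfo = { uri };
  const width = toPositiveInt(tags["twitter:player:width"]);
  const height = toPositiveInt(tags["twitter:player:height"]);
  if (width !== undefined) player.width = width;
  if (height !== undefined) player.height = height;
  return player;
}
```

Width and height are treated as optional here, matching the idea that they are only a requested size.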

By building out a generic iframe solution like this, we can also have link scrapers, like Bluesky's current cardyb scraper, add additional players that the site itself might not provide.

There are risks associated with this, however. For one, malicious actors could very easily slip in a different player/link URL. But given that apps are complying with the EU's requirement to show an alert before any external players are rendered, I feel like this might be less of an issue (so long as domains are shown in the UI).

We might have to check if major sites currently supporting these player tags are directing to a different domain, because if not, then it might be worthwhile to add a warning there too.

haileyok commented 9 months ago

Oh, I didn't even realize that was a tag! I thought what you had been linking was just an example of how Twitter did it. That's really useful and probably simplifies the process even further, since I'm assuming most if not all of the services that properly implement oembed are also adding those tags. It also doesn't rely on the meta tool needing a list of known oembed sources, just on clients themselves implementing strict checking.

Ex giphy:

[Image: Screenshot 2024-01-01 at 7 14 17 PM]

Also see: https://github.com/bluesky-social/social-app/issues/2314#issuecomment-1873585443

pfrazee commented 9 months ago

This does open up trust questions. There's already some question over whether we ought to be trusting the fields in external now, or whether we ought to be creating the card from the primary source on read (and caching it). Accepting an arbitrary embed-player URI is slightly worse on that front. To wit, at one point I had considered letting any arbitrary .gif or .mp4 etc. get a webview player, but I felt unsure enough about it that I decided we should stick to well-known sources.

I'm unsure if this is being overly careful, but I figure we can discuss a bit more before we get into it.

mary-ext commented 9 months ago

Yeah, I'm a bit wary that we'd be heavily relying on first-run notice being shown to the user to get some sense of control from the arbitrary URL field.

It's unclear how much control we have in reality, given the user tendency to proceed without reading through the notice.

haileyok commented 9 months ago

Definitely agree about arbitrary players. There would need to be a defined list of acceptable ones, and verification against that list before embedding.

I think my biggest concern here is trusting other clients to do the right thing. Even if the developers of your client are not intentionally malicious, there's always the possibility of messing up or just not being strict enough with the checks. Of course I also think that making it harder to do embeds raises that risk. It's relatively trivial to check hostnames properly, but when you're doing more parsing there's more room for error.
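For illustration, a strict hostname check against an allowlist might look like the sketch below. The `ALLOWED_PLAYER_HOSTS` list and the `isAllowedPlayerUrl` name are hypothetical; the point is to compare the parsed hostname exactly (or as a true subdomain) rather than doing a substring match on the raw URL.

```typescript
// Hypothetical allowlist; a real client would maintain its own.
const ALLOWED_PLAYER_HOSTS: readonly string[] = ["giphy.com"];

function isAllowedPlayerUrl(
  raw: string,
  allowed: readonly string[] = ALLOWED_PLAYER_HOSTS,
): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  // Accept the domain itself or a real subdomain ("media.giphy.com"),
  // but not lookalikes such as "evilgiphy.com" or "giphy.com.evil.example".
  return allowed.some((domain) => {
    const suffix = "." + domain;
    return host === domain || host.slice(-suffix.length) === suffix;
  });
}
```

Checks like `raw.includes("giphy.com")` are the kind of loose matching that would let a lookalike domain through, which is why the comparison runs on the parsed hostname.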

haileyok commented 9 months ago

> Yeah, I'm a bit wary that we'd be heavily relying on first-run notice being shown to the user to get some sense of control from the arbitrary URL field.
>
> It's unclear how much control we have in reality, given the user tendency to proceed without reading through the notice.

Yeah I'm personally against any option that doesn't include an internal list of acceptable providers. Just offering a little heads up isn't nearly enough to prevent issues. I would have to imagine this is the case across other similar platforms too.