bluesky-social / atproto

Social networking technology created by Bluesky

Improper mention facet extraction (problem with byte offsets) #2823

Open mfn opened 1 week ago

mfn commented 1 week ago

Describe the bug

As a consumer of the Bluesky API I came across a post with mention facets where my code wasn't highlighting the mention facets correctly.

After some analysis I realized that the information I received via the API must be wrong. I then located the post itself on Bluesky and saw that it is rendered incorrectly there too, exactly as my code produced it.

To Reproduce

I do not have a reproducer. I did not write the post; I just tripped over it when consuming the API.

Here's the post https://bsky.app/profile/diegodeabreu.bsky.social/post/3l4jzk4mkzd2r

Screenshot: image

As can be seen, the mention highlights are wrong (this is consistent with what the API returns, though).

Expected behavior

Mention byte offsets should be correct.

Details

I don't have any of these details; it's the official server.

Additional context

Here is the payload the API returned when I requested that post:

API response

```
{
  "posts": [
    {
      "labels": [],
      "uri": "at://did:plc:pxbptk7tzl3szbgaxxg36rru/app.bsky.feed.post/3l4jzk4mkzd2r",
      "quoteCount": 0,
      "indexedAt": "2024-09-19T21:47:55.296Z",
      "replyCount": 1,
      "repostCount": 1,
      "likeCount": 9,
      "viewer": {
        "threadMuted": false,
        "embeddingDisabled": false
      },
      "cid": "bafyreidhsova7ysf42s3jvr2byz3nzzo57ivh27vpkhzd2qvidpuare27u",
      "author": {
        "displayName": "Diego Feijó de Abreu ",
        "avatar": "https://cdn.bsky.app/img/avatar/plain/did:plc:pxbptk7tzl3szbgaxxg36rru/bafkreicmprenqtyhepzpsuv5uor4bdlol3fliiadhxeej6uy6vojfjb44e@jpeg",
        "associated": {
          "chat": {
            "allowIncoming": "all"
          }
        },
        "viewer": {
          "muted": false,
          "blockedBy": false
        },
        "labels": [],
        "createdAt": "2023-04-17T19:34:49.429Z",
        "did": "did:plc:pxbptk7tzl3szbgaxxg36rru",
        "handle": "diegodeabreu.bsky.social"
      },
      "record": {
        "$type": "app.bsky.feed.post",
        "createdAt": "2024-09-19T21:47:55.296624Z",
        "facets": [
          {
            "features": [
              {
                "$type": "app.bsky.richtext.facet#mention",
                "did": "did:plc:nvfposmpmhegtyvhbs75s3pw"
              }
            ],
            "index": {
              "byteStart": 20,
              "byteEnd": 37
            }
          },
          {
            "features": [
              {
                "did": "did:plc:s6j27rxb3ic2rxw73ixgqv2p",
                "$type": "app.bsky.richtext.facet#mention"
              }
            ],
            "index": {
              "byteEnd": 81,
              "byteStart": 60
            }
          },
          {
            "features": [
              {
                "did": "did:plc:uewxgchsjy4kmtu7dcxa77us",
                "$type": "app.bsky.richtext.facet#mention"
              }
            ],
            "index": {
              "byteEnd": 118,
              "byteStart": 104
            }
          },
          {
            "index": {
              "byteEnd": 161,
              "byteStart": 141
            },
            "features": [
              {
                "$type": "app.bsky.richtext.facet#mention",
                "did": "did:plc:mf5dzzqkp7fnmby6blfeljwj"
              }
            ]
          },
          {
            "features": [
              {
                "did": "did:plc:e62gb2ushvtvjvqcbrxeaw2n",
                "$type": "app.bsky.richtext.facet#mention"
              }
            ],
            "index": {
              "byteStart": 184,
              "byteEnd": 208
            }
          },
          {
            "features": [
              {
                "$type": "app.bsky.richtext.facet#mention",
                "did": "did:plc:e72cwu7fen37hzzzhwy6mkxp"
              }
            ],
            "index": {
              "byteEnd": 257,
              "byteStart": 231
            }
          }
        ],
        "reply": {
          "parent": {
            "cid": "bafyreid53re4egytyqqmvvcq4plfm4n5qeeb7th7cunmkfg2v5z5cttmae",
            "uri": "at://did:plc:pxbptk7tzl3szbgaxxg36rru/app.bsky.feed.post/3l4jzk2qju32g"
          },
          "root": {
            "cid": "bafyreid53re4egytyqqmvvcq4plfm4n5qeeb7th7cunmkfg2v5z5cttmae",
            "uri": "at://did:plc:pxbptk7tzl3szbgaxxg36rru/app.bsky.feed.post/3l4jzk2qju32g"
          }
        },
        "text": "Posição: 1º podcast @jamellebouie.net \nPosição: 2º podcast @kenwhite.bsky.social \nPosição: 3º podcast @bloomberg.com \nPosição: 4º podcast @junlper.bsky.social \nPosição: 5º podcast @chrislhayes.bsky.social \nPosição: 6º podcast @hausofdecline.bsky.social"
      }
    }
  ]
}
```

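To make the mismatch concrete, here is a minimal sketch in Python that applies the first facet's index to the first line of the post text. It assumes facet indices are offsets into the UTF-8 encoding of the text with an exclusive end; `slice_by_bytes` is a hypothetical helper, not part of any SDK. With the text as shown in the payload, bytes 20 to 37 land three bytes early, because "ç", "ã" and "º" each take two bytes in UTF-8. The value 20 happens to be the character index of the "@", which suggests (but does not prove) that the posting client counted characters instead of bytes.

```python
# First line of the post text from the payload:
#   "Posição: 1º podcast @jamellebouie.net "
# Non-ASCII characters are written as escapes so the byte counts are unambiguous.
text = "Posi\u00e7\u00e3o: 1\u00ba podcast @jamellebouie.net "


def slice_by_bytes(text: str, byte_start: int, byte_end: int) -> str:
    """Return the part of `text` covered by UTF-8 byte offsets [byte_start, byte_end)."""
    data = text.encode("utf-8")
    # errors="replace" keeps the sketch from raising if an offset splits a character.
    return data[byte_start:byte_end].decode("utf-8", errors="replace")


# What the facet in the payload actually covers when read as byte offsets.
print(repr(slice_by_bytes(text, 20, 37)))  # 'st @jamellebouie.'

# The same numbers read as character (code point) offsets hit the mention exactly.
print(repr(text[20:37]))                   # '@jamellebouie.net'

# Byte offsets that would cover the mention.
print(repr(slice_by_bytes(text, 23, 40)))  # '@jamellebouie.net'
```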
bnewbold commented 3 days ago

It looks to me like the posts you have linked to (by @diegodeabreu.bsky.social) are test posts by a developer who is learning how to implement the facet system. In the bsky post Lexicons, it is up to the client developer creating the posts to correctly generate facets (or use an SDK/helper which implements this for them).
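For illustration, a client-side sketch in Python of generating mention facets with UTF-8 byte offsets might look like the following. Everything here is an assumption made for the example: `MENTION_RE` is a simplified handle pattern, `mention_facets` is a hypothetical helper, and the handle-to-DID mapping is passed in as a plain dict instead of being resolved over the network. The DID used in the usage lines is the one from the first facet of the payload above. This is not the official SDK implementation.

```python
import re

# Simplified, hypothetical handle pattern: "@" followed by a domain-like name.
# Real handle validation is stricter than this.
MENTION_RE = re.compile(r"(?<![\w@])@([a-zA-Z0-9][a-zA-Z0-9.-]*\.[a-zA-Z]{2,})")


def mention_facets(text: str, handle_to_did: dict) -> list:
    """Build mention facets whose index uses UTF-8 byte offsets (end exclusive)."""
    facets = []
    for match in MENTION_RE.finditer(text):
        did = handle_to_did.get(match.group(1))
        if did is None:
            continue  # unknown handle: emit no facet rather than a wrong one
        # Convert the character position of the match into a byte position.
        byte_start = len(text[: match.start()].encode("utf-8"))
        byte_end = byte_start + len(match.group(0).encode("utf-8"))
        facets.append(
            {
                "index": {"byteStart": byte_start, "byteEnd": byte_end},
                "features": [
                    {"$type": "app.bsky.richtext.facet#mention", "did": did}
                ],
            }
        )
    return facets


# Usage with the first line of the post ("Posição: 1º podcast @jamellebouie.net ").
text = "Posi\u00e7\u00e3o: 1\u00ba podcast @jamellebouie.net "
dids = {"jamellebouie.net": "did:plc:nvfposmpmhegtyvhbs75s3pw"}
facets = mention_facets(text, dids)
print(facets[0]["index"])  # {'byteStart': 23, 'byteEnd': 40}
```

The key step is encoding the prefix of the text before measuring its length; using the regex match positions directly would reproduce the character-offset values seen in the reported post.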

If that is correct, this issue should be filed with/reported to that developer, right?

mfn commented 3 days ago

Interesting approach to leave this up to the client and leave the possible interpretation of broken data to everyone else.

I've nothing more to add; I'm not the creator, just a consumer of the API.

Seems to me this is all by design and there's nothing actionable.

If so, I guess this issue can be closed?