LemmyNet / lemmy

🐀 A link aggregator and forum for the fediverse
https://join-lemmy.org
GNU Affero General Public License v3.0

[Bug]: couldnt_find_object remote post #4526

Closed aeharding closed 2 months ago

aeharding commented 3 months ago

Summary

When trying to call resolve_object from lemmy.world with the following URL: https://lemmy.world/post/12992114

I get {"error":"couldnt_find_object"}

Steps to Reproduce

  1. Click https://lemmy.world/api/v3/resolve_object?q=https%3A%2F%2Flemmy.world%2Fpost%2F12992114
  2. Observe error
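
For reference, the request in step 1 can be built programmatically. This is only a minimal sketch: the function name `buildResolveObjectUrl` and its parameters are illustrative and not part of any Lemmy client library.

```typescript
// Build the resolve_object request URL for a given instance.
// `instanceHost` and `objectUrl` are illustrative parameter names.
function buildResolveObjectUrl(instanceHost: string, objectUrl: string): string {
  const url = new URL(`https://${instanceHost}/api/v3/resolve_object`);
  // searchParams percent-encodes the query value (":" becomes %3A, "/" becomes %2F).
  url.searchParams.set("q", objectUrl);
  return url.toString();
}
```

Calling it with `lemmy.world` and the post URL from the summary should yield the link in step 1.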

I thought this would be resolved by https://github.com/LemmyNet/lemmy/pull/4073, but maybe that addresses something else?

Technical Details

None

Version

0.19.3

Lemmy Instance URL

lemmy.world

ollytom commented 3 months ago

Are you authenticated to lemmy.world when making that request?

aeharding commented 3 months ago

> Are you authenticated to lemmy.world when making that request?

Yes

Nutomic commented 3 months ago

You need to fetch using the fedilink which is https://slrpnk.net/post/7531569

aeharding commented 3 months ago

@Nutomic Unfortunately I don't have the fedilink in this scenario. I am adding deep linking to Voyager, so a user taps the share button in a browser for an arbitrary post, and Voyager gets the URL that the browser shares from the browser share sheet - nothing else.

The current workaround I am using for Voyager is to parse the URL with a regex, extract the post ID, and then request that post from the instance in the URL. The response gives me the ap_id.

However this is not ideal because:

  1. It relies on a known URL format for regex parsing. If the format changes, like in #875, it will break.
  2. It has to make a request to the instance the URL points to. This is not ideal because the client connects directly (exposing IP address and user agent) to that arbitrary instance.

Hopefully that helps explain my roadblocks, let me know if I can clarify anything else.
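
The workaround described above could look roughly like the sketch below. It is not Voyager's actual code: the names `parsePostUrl` and `fetchApId` are hypothetical, the `/post/<numeric id>` path format is an assumption (the same one drawback 1 warns about), and the response shape of `GET /api/v3/post?id=<id>` is assumed to be `{ post_view: { post: { ap_id } } }`.

```typescript
interface ParsedPostUrl {
  host: string;
  postId: number;
}

// Assumes the "/post/<numeric id>" path format; returns null for anything else.
function parsePostUrl(raw: string): ParsedPostUrl | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null; // not a valid absolute URL
  }
  const match = /^\/post\/(\d+)\/?$/.exec(url.pathname);
  if (!match) return null;
  return { host: url.host, postId: Number(match[1]) };
}

// Second step: ask the instance that served the URL for the post and read ap_id.
// Note that this connects directly to that instance (drawback 2 above).
async function fetchApId(parsed: ParsedPostUrl): Promise<string> {
  const res = await fetch(`https://${parsed.host}/api/v3/post?id=${parsed.postId}`);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  // Assumed response shape; adjust if the API differs.
  const body = (await res.json()) as { post_view: { post: { ap_id: string } } };
  return body.post_view.post.ap_id;
}
```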

Nutomic commented 3 months ago

Actually, the latest Lemmy version includes a feature so that the link generates a redirect to the proper fedilink. For some reason it's not working on lemmy.world, but you can try it on lemmy.ml:

$ curl -H "Accept: application/activity+json" https://lemmy.ml/post/13061738 -v
HTTP/2 308
location: https://lemmy.zip/post/11490342

I hope this helps you, and lemmy.world admins can probably fix it on their instance somehow. If it doesn't help, I need some more details about what exactly you want to do, like an issue link in your repo.
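
The same check as the curl command can be sketched in TypeScript. This assumes a non-browser runtime such as Node.js 18+, where `fetch` with `redirect: "manual"` exposes the redirect status and headers; browsers behave differently. The names `redirectTarget` and `fedilinkViaRedirect` are illustrative.

```typescript
// Pure helper: return the redirect target if the response is a 3xx with a Location header.
function redirectTarget(status: number, location: string | null, requestUrl: string): string | null {
  if (status < 300 || status >= 400 || !location) return null;
  // Location may be relative, so resolve it against the request URL.
  return new URL(location, requestUrl).toString();
}

// Request the post URL as ActivityPub JSON without following the redirect.
// Assumption: the runtime lets us read the 3xx response (true for Node.js, not for browsers).
async function fedilinkViaRedirect(postUrl: string): Promise<string | null> {
  const res = await fetch(postUrl, {
    headers: { Accept: "application/activity+json" },
    redirect: "manual",
  });
  return redirectTarget(res.status, res.headers.get("location"), postUrl);
}
```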

aeharding commented 3 months ago

@Nutomic I looked a bit more, and I think this problem only arises if you try to resolve_object for a remote post URL that is on the same instance that you're trying to resolve on. For example:

Given instance a.com, b.com and c.com:

  1. Assume remote POST_URL = https://a.com/post/id (with fedilink = https://c.com/post/id)
  2. try to resolve_object of POST_URL on a.com. For example, https://a.com/api/v3/resolve_object?q=https://a.com/post/id
  3. it will fail
  4. try to resolve_object of POST_URL on b.com. For example, https://b.com/api/v3/resolve_object?q=https://a.com/post/id
  5. it will succeed
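
The condition in the steps above can be written as a small predicate. This is a sketch of the observed pattern only, not of Lemmy's code; the function and parameter names are made up for illustration.

```typescript
// True when a query matches the failing pattern: the URL points at the resolving
// instance itself, while the post's fedilink (its origin) lives on another instance.
function isSameInstanceRemoteLookup(resolvingHost: string, queryUrl: string, fedilink: string): boolean {
  const queryHost = new URL(queryUrl).host;
  const originHost = new URL(fedilink).host;
  return queryHost === resolvingHost && originHost !== resolvingHost;
}
```

With POST_URL = https://a.com/post/id and fedilink = https://c.com/post/id, the predicate is true when resolving on a.com (step 2) and false when resolving on b.com (step 4).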

Some concrete examples:


For the following URL https://lemmy.ml/post/13045952 (fedilink = https://lemmy.ca/post/17208090)


For the following URL https://lemmy.world/post/12992114 (fedilink = https://slrpnk.net/post/7531569)


Note: It is important that the URL attempted to resolve is remote. Resolving the fedilink on its own instance works. Example:

TL;DR: Resolving a remote post by its local URL, on the same instance that URL belongs to, fails.

Nutomic commented 3 months ago

Right, in the case of a local URL it will search directly in the database post.ap_id column and not make any network request. I don't see a good way to fix this in Lemmy because it's actually handled in the federation library.
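
To illustrate why that lookup misses, here is a toy model of the behavior described above. It is not the federation library's code: `resolveObjectModel`, the `Lookup` type, and the in-memory set standing in for the post.ap_id column are all assumptions made for the example.

```typescript
type Lookup =
  | { kind: "found"; apId: string }
  | { kind: "not_found" }
  | { kind: "needs_fetch"; url: string };

// Toy model: `knownApIds` stands in for the post.ap_id column of the local database.
function resolveObjectModel(localHost: string, knownApIds: Set<string>, query: string): Lookup {
  if (knownApIds.has(query)) return { kind: "found", apId: query };
  if (new URL(query).host === localHost) {
    // Local-looking URL: database lookup only, no network request is made.
    return { kind: "not_found" };
  }
  // URL on another host: the object would be fetched over the network.
  return { kind: "needs_fetch", url: query };
}
```

In this model a remote post is stored under its fedilink, so querying it by its local URL (for example https://lemmy.world/post/12992114) never matches an ap_id and is never fetched.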

Why don't you use my previous suggestion and follow the redirect instead? I.e. run curl -H "Accept: application/activity+json" https://ds9.lemmy.ml/post/9635 and read the fedilink from the location header.

aeharding commented 3 months ago

Hey, sorry for the delay!

> Why don't you use my previous suggestion and follow the redirect instead? I.e. run curl -H "Accept: application/activity+json" https://ds9.lemmy.ml/post/9635 and read the fedilink from the location header.

The main goal for me is to avoid connecting to arbitrary servers for user privacy reasons.

While your suggestion may work well for servers and native clients, it behaves differently on the web: web clients are forced to follow the 308 response and make another request to the destination (fedilink) server. Unfortunately, a web client cannot stop at the redirect and still read the location header (fetch with redirect: "manual" returns an opaque response whose headers are not readable). (For light reading, see https://github.com/whatwg/fetch/issues/763#issuecomment-466631598 and https://github.com/whatwg/fetch/issues/601)

So your suggestion does technically work for the web (client will follow redirects and then you can get the final response.url), but it doesn't address this privacy concern for web clients since redirect cannot be aborted.
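
For a web client, the approach just described can be sketched as follows. It is a minimal illustration with hypothetical function names; note that it necessarily contacts the fedilink server, which is exactly the privacy concern raised here.

```typescript
// Pure helper: after a followed redirect, the final URL is the fedilink.
function fedilinkFromFollowedResponse(res: { redirected: boolean; url: string }): string | null {
  return res.redirected ? res.url : null;
}

// Browser-style request: redirects are followed automatically (the default mode),
// so a second request reaches the fedilink server before we can read anything.
async function fedilinkByFollowing(postUrl: string): Promise<string | null> {
  const res = await fetch(postUrl, {
    headers: { Accept: "application/activity+json" },
  });
  return fedilinkFromFollowedResponse(res);
}
```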

It is also fragile for web clients because they must follow the redirect: the fedilink server must be up and running, and configured properly (for example, to allow CORS).

(I found out that lemmy.world's CORS is misconfigured for this request, so I will need to follow up with them. It's possible that other instances are misconfigured as well, so this request can be a bit fragile on web clients.)