diaspora / diaspora

A privacy-aware, distributed, open source social network.
https://diasporafoundation.org/
GNU Affero General Public License v3.0

Parse post content to make links local #7474

Open Flaburgan opened 7 years ago

Flaburgan commented 7 years ago

As a way to help users deal with federation, I was wondering if it would be a good idea to automatically parse every post and comment locally and rewrite links so that they point to the local copy of the post, allowing the user to interact with it easily.

For example, if I post a message containing a link to https://diaspora-fr.org/posts/diasporafr_id, when framasphere.org displays it to its users, it will convert it to https://framasphere.org/posts/guid or even local_framasphere_id.

What do you think?

SuperTux88 commented 7 years ago

That should be done when sending a post and it should be converted to a generic format like diaspora://post/guid (that should be specified in the diaspora federation protocol), because https://diaspora-fr.org/posts/diasporafr_id is a diaspora-specific url, and that wouldn't work with posts from friendica. When receiving a post with a link to a diaspora:// url we can even fetch the post if it is unknown.

Flaburgan commented 7 years ago

I was not too sure about modifying the original content. The idea was exactly to detect diaspora-specific (and why not friendica-specific) URLs by checking whether the host is a known pod. I guess it would be cleaner to do it inside the protocol, but that is far more complex than a simple regex that works locally.

SuperTux88 commented 7 years ago

You would modify the post before sending, similar to mentions. With your solution you would need many software-specific and version-specific regexes, and you would need to resolve the ID to the GUID every time you display the post (so you need many requests to the remote pod, it would only work when the pod is up, and you would rely on unofficial APIs like /posts/:id.json to resolve the ID to a GUID).

So yes, we should fix that, but we shouldn't add a hacky solution; it should be a clean, specified solution that works for the whole protocol. When doing it while writing the post, you only need to match your own domain and the paths of your own software (so /posts/id and /posts/guid); you don't need to parse all URLs, check whether they belong to a valid pod, and work out from which unofficial API you might be able to guess the GUID. With a specified solution you wouldn't need to guess anything, and everything you need (the GUID) is already included in the post in the database. And it is really easy to handle custom protocols with markdown-it.
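A minimal sketch of the sender-side rewrite described above, written in Python purely for illustration (it is not diaspora's code). The host name, the `GUID_BY_LOCAL_ID` dictionary standing in for the pod's database, the function name, and the rule that anything of 16 or more hex characters is treated as a GUID are all assumptions.

```python
import re

# Hypothetical values for illustration only.
LOCAL_HOST = "diaspora-fr.org"
# Stand-in for the pod's own database: local ID -> GUID.
GUID_BY_LOCAL_ID = {"1234": "17faf230675101350d995254001bd39e"}

# Match only this pod's own post URLs: /posts/<id> or /posts/<guid>.
LOCAL_POST_URL = re.compile(
    r"https?://" + re.escape(LOCAL_HOST) + r"/posts/([0-9a-f]+)\b"
)


def to_federation_links(text):
    """Replace links to this pod's own posts with diaspora://post/<guid>."""

    def replace(match):
        ref = match.group(1)
        if ref in GUID_BY_LOCAL_ID:
            guid = GUID_BY_LOCAL_ID[ref]  # local ID, resolved without any network call
        elif len(ref) >= 16:
            guid = ref  # assumed to already be a GUID
        else:
            return match.group(0)  # unknown local ID: leave the link untouched
        return "diaspora://post/" + guid

    return LOCAL_POST_URL.sub(replace, text)
```

Links to other hosts never match the pattern, so they stay ordinary external links and nothing has to be guessed about remote pods.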

Flaburgan commented 7 years ago

So you mean the GUID of a received post isn't in the local database? It is displayed when adding .json to the URL.

cmrd-senya commented 7 years ago

You normally link other posts which may be unknown to your pod, so you have to fetch them.

On 23.06.2017 at 12:18, Fla wrote:

So you mean the GUID of a received post isn't in the local database? It is displayed when adding .json to the URL.


SuperTux88 commented 7 years ago

The GUID of a local post (with a local ID) is in the DB. But as said, you would rely on the unofficial .json "API" of the remote server, because if you want to do it at display time, you need to get the GUID for a remote ID from a remote server. That's why it's better when the sending server already sends you the GUID (it's also easier for the sending server, because it already knows the GUID and doesn't need to guess any unofficial APIs). When receiving, you can then fetch the post if you don't know it (fetching is an official API already in the protocol, and it needs the GUID).

Flaburgan commented 7 years ago

Hm, so the other way to do it would be for the sender to parse the content before sending it and already replace the local ID with the GUID. No need for the receiver to call the sender then. That's partially what you're suggesting to do by integrating it in the federation protocol.

Which makes me wonder: what if I insert a link to a post on another pod (framasphere) in a post written from diaspora-fr? Then d-fr needs to call framasphere to get the GUID before sending the post.

For example, if I add a link to https://framasphere.org/posts/1234 in a post written on d-fr, then d-fr doesn't know the GUID of this post because it only has the local id of framasphere. So it has to request it.

SuperTux88 commented 7 years ago

You work on your home pod, so only linking to a post on your pod would work that way, everything else would include guessing again. When linking to another pod that would be the same as linking to a blogpost, it's a simple external link. We need this issue to make internal links work via federation.

We would need another functionality for "open this post on my home pod" (so you can get your local link), but that's a completely different topic. For linking to other posts that's probably only a small edge case, because most people work mostly on their home pod. An "open this post on my home pod" would be useful when you find a link to a diaspora post in a blog post for example, but as said, that's out of scope here.

When the sender pod creates a diaspora://post/guid link, that would even work for linking private posts. You can't fetch anything for a private post (that's why manually calling .json to get the GUID wouldn't work, and that's another big reason I want to automate this and include it in the protocol spec). So when the receiver of the post with the link can see the private post, the link would work again.

DeusFigendi commented 7 years ago

I like Benjamin's approach. I just thought…

I was not too sure about modifying the original content.

So I thought it would be a good idea to also include the original link. But Benjamin said: Diaspora is already modifying the content when mentioning. And you can easily reconstruct the original link, because the origin pod is in author.diaspora_id and only those would be converted.

Just this little bit: If a pod receives a diaspora://post/guid link it should check if that guid is in its database and if it isn't the pod should pull that post from the origin-pod. The origin (sending) pod should only re-federate public posts (but the receiving pod cannot know if the post is limited or public). This would also enable users to force re-federation of public posts to pods that didn't get some important content.
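The receiving side described here could look roughly like the Python sketch below. It is an illustration under assumptions, not the protocol's implementation: `known_posts` stands in for the local database, `fetch_post` is a hypothetical callback for the protocol's fetch step that returns `None` when the post cannot be fetched (for example because it is private), and the local URL layout `/posts/<guid>` is taken from the paths mentioned earlier in the thread.

```python
import re

DIASPORA_POST_LINK = re.compile(r"diaspora://post/([0-9a-f]+)")


def localize_links(text, local_host, known_posts, fetch_post):
    """Turn diaspora://post/<guid> links into links on the local pod.

    known_posts: dict mapping GUID -> post data (stand-in for the database).
    fetch_post:  hypothetical callable taking a GUID and returning post data,
                 or None if the post cannot be fetched.
    """

    def replace(match):
        guid = match.group(1)
        if guid not in known_posts:
            fetched = fetch_post(guid)
            if fetched is None:
                return match.group(0)  # cannot resolve: keep the link as received
            known_posts[guid] = fetched  # store the freshly fetched post
        return "https://" + local_host + "/posts/" + guid

    return DIASPORA_POST_LINK.sub(replace, text)
```

Because the lookup uses only the GUID, the receiver never needs the sender's local ID or any software-specific URL pattern.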

I am not sure if it should do so for diaspora://people/guid but having the users display-name and tags would make the search better.

Oh and btw. talking about markdown… a plain link (without markdown) could be moved into the text. Like…

https://pod.geraspora.de/posts/571518

[https://pod.geraspora.de/posts/571518](diaspora://post/fc2c37767f9d0799)

and a people-link could contain both, the handle and the guid.

https://pod.geraspora.de/people/ea0d1350dee29b59

[supertux@nerdpol.ch](diaspora://people/ea0d1350dee29b59)

you could also use mentions here but I think it's a bad idea to set a mention the writing user hasn't intended. On the other hand: it would become intentional if this is the way diaspora works so... the users could use markdown to avoid this transformation:

https://pod.geraspora.de/people/ea0d1350dee29b59

@{supertux@nerdpol.ch}

[SuperTux88](https://pod.geraspora.de/people/ea0d1350dee29b59)

[SuperTux88](diaspora://people/ea0d1350dee29b59)

hmm but it could also become @{SuperTux88 ; supertux@nerdpol.ch} damnit XD

but you get the point eh?

SuperTux88 commented 7 years ago

you can easily reconstruct the original link

Why do you want to reconstruct the original link?

If a pod receives a diaspora://post/guid link it should check if that guid is in its database and if it isn't the pod should pull that post from the origin-pod.

Yes, that's what I already wrote, and with a protocol-specific format we could also fetch posts from friendica, because fetching is already in the protocol specs. (only works for public posts, but if you link a private post the receiver either already received it or couldn't see it anyway)

I am not sure if it should do so for diaspora://people/guid but having the users display-name and tags

For linking people we have already mentions and it's already working (and it's even supported in comments now, and it fetches mentioned people if not already known). So I don't think we need a second way to link people.

And tags are already linked locally.

Oh and btw. talking about markdown… a plain link (without markdown) could be moved into the text.

Yes, that makes sense.

And by the way, linking to profiles with an external link doesn't work, this only shows the login page, so just use mentions for that (again, it's already working and also shows a hovercard, so people can add mentioned people directly to their aspects)

DeusFigendi commented 7 years ago

Why do you want to reconstruct the original link?

Just for political/social reasons. It always feels a little... naughty to meddle with other people's content, changing things they maybe didn't want changed. But if you can say: "yes, I changed your content, but see: one can reconstruct your (original) content, no information got lost," then... well, then it's kinda okay to do so :D And that's also why I wasn't sure about linking profiles and converting this to mentioning. I think just not touching this is the better way to go, so users decide if they wanna mention or not.

Whoever gets her hands on this: please don't touch stuff inside code blocks.
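One way to honour the request about code blocks is to apply the link rewrite only to the text between code spans. The Python sketch below assumes a simplified notion of Markdown code (triple-backtick fences and single-backtick inline spans; indented code blocks are not handled), and the function name is made up for illustration.

```python
import re

# Fenced blocks (```...```) or inline spans (`...`); a simplification of Markdown.
CODE_SPAN = re.compile(r"(```.*?```|`[^`\n]*`)", re.DOTALL)


def transform_outside_code(text, transform):
    """Apply transform() to everything except code spans and fenced blocks."""
    parts = CODE_SPAN.split(text)
    # With one capturing group, re.split returns the code spans at odd indexes.
    return "".join(
        part if index % 2 == 1 else transform(part)
        for index, part in enumerate(parts)
    )
```

Any rewrite function, for instance one that converts local post URLs, can be passed as `transform`.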

denschub commented 7 years ago

In my opinion, we don't change the content at all, we just change internal details.

I don't see how I could consider that as changing contents, it actually makes it even more accessible.

noplanman commented 7 years ago

@SuperTux88:

We would need another functionality for "open this post on my home pod" (so you can get your local link), but that's a completely different topic.

Is it even possible to implement a solution into the core code of diaspora* that works? As far as I'm aware, the only way to do this would be through cookies or local storage, in which the home pod domain name is saved. This would have to be done for every pod out there, because that data can't be accessed from other domains.

This is the exact reason why I opted for a userscript to do this, with a feature called OOMP (open on my pod).

As for the whole local ID vs. GUID discussion, is there a reason why local IDs are used at all? Wouldn't it make sense to only work with the GUIDs?

(or have I missed something in the discussion thus far?)

SuperTux88 commented 7 years ago

Is it even possible to implement a solution into the core code of diaspora* that works?

Sure, but you need to enter your home pod for every other pod where you need this function. But since probably 90% of the links are from the same few pods (friends sending you links via other ways, chat for example), you need to do that for a few pods and can then save that in local storage for that pod. So I don't see why this should be a problem. We can even autocomplete this input, because the other pod has a list of pods it knows. And we can trigger to fetch the post on your home pod, when your home pod doesn't know the post already.

is there a reason why local IDs are used at all

ID is faster than GUID, that's why we use ID for the link at the timestamp (which is probably clicked most of the time) and the perma-link icon links to the GUID now.

noplanman commented 7 years ago

ID is faster than GUID

Faster when opening the post you mean? Or where exactly is the speed difference? I'm busy implementing cross-pod post linking with a userscript, which basically uses the GUIDs for everything. Just curious to see if there's a better way to do this.

And we can trigger to fetch the post on your home pod, when your home pod doesn't know the post already.

Ok, this is a cool thing, to get gaps in the federation closed 👍 Also, I agree that this is a viable solution, especially if it's as easy as "Choose your home pod from this list:", which would be a 1-time deal per pod (for each home pod change, which should be rare).

Flaburgan commented 7 years ago

In my opinion, we don't change the content at all, we just change internal details. Before: Link to a specific post. After: Link to a specific post that works better. I don't see how I could consider that as changing contents, it actually makes it even more accessible.

Well, there is still the (unfortunately famous) use case "Oh, but you don't have all the comments on the post on your pod? Check it out on mine, it has everything! https://...".

And here this solution is doing the exact opposite of what the user is trying to do: send someone to another pod because the current one doesn't have what he is looking for.

But IMO this problem is not a blocker to this solution, because in this use case, what we should do is improve the federation, not limit what we're doing here.

SuperTux88 commented 7 years ago

Linking this with diaspora/diaspora_federation#75. diaspora/diaspora_federation#78 is the first step to add forward compatibility.

spixi commented 7 years ago

This idea sounds nice, but I don't like the URI scheme diaspora:

A more generic scheme like social: would be better imo.

So, social:photo/{{UID}}?height=320px, social:post/{{UID}}?comments=0 and social:profile/{{UID}}#hobbies are ideas for possible URIs.

The URI diaspora://people/guid is malformed, because it indicates that people is the authority responsible for the resource guid. Correctly formed URIs per RFC 3986 include diaspora:people/guid, social://diaspora/people/guid or social-people://diaspora/guid. But I would also prefer to rename people to profile, because a person may have multiple profiles and there may also be profiles of bots, companies and organizations. Getting rid of diaspora also makes it possible to find a profile that has moved to another service like Mastodon, friendica or Hubzilla.

SuperTux88 commented 7 years ago

I have chosen diaspora:// because it links to another entity available within the diaspora-protocol and fetchable with the diaspora-protocol. While it would be nice to have a "social"-protocol, compatible with everybody, there is nothing compatible with all social networks yet, so we can't use social:// for us.

Also, this is primarily for linking to posts (or maybe other entities in the future), linking to profiles is already done with mentions @{user@example.org}, because to fetch a profile you need to know where to webfinger it. And the diaspora:// scheme doesn't replace mentions.

And friendica already implemented the diaspora:// scheme: friendica/friendica#3682

spixi commented 7 years ago

@SuperTux88: I understand your thoughts and if there is already an existing interface, this should be kept for compatibility reasons.

However, I think you are confusing network protocols and URI schemes. SMTP and IPv6 are network protocols, but mailto: and urn: are URI schemes. Although some URI schemes are associated with network protocols, this is not a requirement. Think of a protocol as the way to deliver the mail and of a URI scheme as a way to write an address.

It is important to understand how URI schemes work. The part before the first colon is the scheme identifier, which tells you what kind of resource is referred to. Next comes an optional authority, which names who is responsible for the resource; it begins with a double slash and ends at the next single slash. The third part is the path of the resource, which is often delimited by slashes or colons. A URI may also carry at most one query, which starts with a question mark and is usually delimited by ampersands or semicolons, and at most one fragment, which starts with a hash. The query part may contain information about how the resource shall be retrieved, and the fragment part can be used to access parts of a resource, like an anchor in a hypertext page.

So, scheme:path?query#fragment (e.g. mailto:alice@example.com?subject=Hallo&body=Welt or urn:isbn:9783141007008) and scheme://authority/path?query#fragment (e.g. http://www.example.com:80/index.html#anchor or ftp://anonymous@[::1]/public/example.txt) are valid URIs, but authority://path or scheme://path are not.

However, the authority may be the empty string, which explains why file:///c:/users/alice/example.txt requires three slashes.
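The component split described above can be checked with Python's standard `urllib.parse.urlsplit`, which reports the authority as `netloc`. This is only an illustration of generic URI parsing; the helper name `describe` is made up, and nothing here is specific to how diaspora handles these links.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Return (scheme, authority, path, query) for a URI."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path, parts.query


for uri in (
    "diaspora://people/guid",
    "diaspora:people/guid",
    "social://diaspora/people/guid",
    "mailto:alice@example.com?subject=Hallo&body=Welt",
    "file:///c:/users/alice/example.txt",
):
    print(uri, "->", describe(uri))
```

With the double slash, `people` is parsed as the authority and only `/guid` remains as the path; without it, the authority is empty and the path is `people/guid`.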

cmrd-senya commented 7 years ago

I'm not sure that we have resources for officially supporting other implementations. For that we need to create a well specified standard (like ActivityPub) based on our protocol. For now our protocol is supported by others, but it's their initiative. If we ever want to create some official cross-network protocol the first thing to do will be changing protocol name to something project-neutral. The URI schemes and other things must be renamed also. But at least for now there is nobody who is willing to do that job of creating a multi-network project based on the diaspora protocol. Look at ActivityPub: it takes lots of work. So the best we can do with the current resources is to develop the diaspora protocol for diaspora and help other projects to support it if they want to. But it's not the time to rename the diaspora protocol or the URI schemes to something different. At the very least there must be a person who is willing to work on protocol standardization in order to do that.

spixi commented 7 years ago

Issue diaspora_federation#75 was posted 14 days ago. Although some work has already been done, it is not yet too late for an RFC 3986 compatible and platform-independent solution. I suggest at least dropping the //, since diaspora:post/17faf230675101350d995254001bd39e is an absolutely well-formed URI, whereas diaspora://post/17faf230675101350d995254001bd39e is not. (However, for example, diaspora://nerdpol.ch/post/17faf230675101350d995254001bd39e would be fine again.)

spixi commented 7 years ago

Just note the different behaviour in relative references:

If a resource with the URI diaspora://photo/1 refers to /post/2 or ../../post/2, this is expanded to diaspora://photo/post/2, because relative references cannot escape the authority.

However, a reference from diaspora:photo/1 to /post/2 or ../../post/2 correctly points to diaspora:post/2.
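The first case, where relative references cannot escape the authority, can be reproduced with Python's standard `urllib.parse.urljoin`. As an assumption for this sketch, `http` stands in for `diaspora`, because `urljoin` only resolves relative references for schemes it knows; the host name `photo` plays the role of the authority.

```python
from urllib.parse import urljoin

# http is a stand-in scheme; "photo" is parsed as the authority (host).
base = "http://photo/1"

for reference in ("/post/2", "../../post/2"):
    # Both references stay under the authority "photo".
    print(reference, "->", urljoin(base, reference))
```

Both references resolve to a path under the same authority, which mirrors the diaspora://photo/post/2 result described above.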

SuperTux88 commented 7 years ago

I'm not sure that we have resources for officially supporting other implementations.

That's actually what I already do a lot: I help everybody who implements the protocol whenever I can to answer questions and help with changes. And it's also the reason why I created the documentation and also one of the reasons we did this blog post.

If we ever want to create some official cross-network protocol the first thing to do will be changing protocol name to something project-neutral.

I don't think that's "the first thing to do", I don't even think it's needed. The name of the protocol doesn't matter and it already works for many different software projects even when it's named "diaspora protocol". And I think renaming it would only create more confusion.

The URI schemes and other things must be renamed also.

No, because it isn't limited to diaspora. It is used as part of the protocol, and links to another resource within this protocol (that's why it is named the same as the protocol). When used inside the protocol it will never be shown to other users, for example friendica users will only see a local friendica URL to the linked post.

(edit: just to make that clear, I neither have the time nor do I plan to make the diaspora protocol an official spec or something like that. But the protocol already works well for us and works well for others, and I try to help there where I can. And the name of the protocol just isn't a problem, it works independent of the name, it's only a name.)

Just note the different behaviour in relative references

We neither need nor support relative links with the diaspora:// scheme anyway.

jonpatterns commented 6 years ago

1) Convert only post urls from same pod

I don't know if this is done already. If you post a direct URL to a post that is on the same pod (as you are posting from), then it could be converted to pod-independent code. Their pod already has the details: poster and post information.

(When posting it could ask if they want it converted)

That takes the main problem away: people go to their post, copy the URL and paste it into their comment.

2) pod independent link inside markup

If the pod independent link is posted inside the markup code for code it is still converted to a url (though non-linking).

Examples at-

diaspora://jonpatterns@joindiaspora.com/post/dae9f3c047f90136b3b64061862b8e7b

DeusFigendi commented 6 years ago

If the pod independent link is posted inside the markup code for code it is still converted to a url (though non-linking).

There's an issue for this: #7703