go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License
44.42k stars 5.43k forks

Gitea2Gitea Federation State #18240

Open 6543 opened 2 years ago

6543 commented 2 years ago

Designs:

Functions:

Auth:


ForgeFed federation task list by @Ta180m

ghost commented 2 years ago

If you're interested in working on federation, a lot of discussion happens in the forge federation Matrix room and Gitea federation Matrix room. Here is a list of useful links and a detailed federation task list with lots of tasks that you can help with.

Forgefriends (a forge federation project based on Gitea) has made some progress on Gitea federation, so you can also help with their Community Action tasks.

KN4CK3R commented 2 years ago

Added Webfinger support in #19462

leecalvink commented 2 years ago

This sounds like an amazing idea!

melvincarvalho commented 2 years ago

Great work on federation:

I'm trying to figure out how to use webfinger

Is there an example of this live?

I've installed the latest Gitea from main, but I can't seem to work out what string to put in /.well-known/webfinger?resource=, or maybe my email is private in the system, I don't know.

I get Not Found whatever I type. It would be really nice to see a working example.

KN4CK3R commented 2 years ago

Did you enable federation in your ini?

melvincarvalho commented 2 years ago

That worked, thanks!

For the record, the steps are:

Add to app.ini

[federation]
ENABLED = TRUE

WebFinger URIs

/.well-known/webfinger?resource=acct:<user>@<host>
/.well-known/webfinger?resource=mailto:<email>

ActivityPub URI

/api/v1/activitypub/user/<user>
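
The URI patterns above can be turned into full request URLs with a small helper. This is only a sketch: the host `gitea.example` and the user `alice` are made-up placeholders, HTTPS is assumed, and the paths are exactly the ones listed in this comment.

```python
from urllib.parse import quote, urlunsplit


def webfinger_url(host: str, resource: str) -> str:
    """Build the WebFinger lookup URL for `resource` on `host`."""
    # Percent-encode the resource URI so ':' and '@' are sent as query data.
    query = "resource=" + quote(resource, safe="")
    return urlunsplit(("https", host, "/.well-known/webfinger", query, ""))


def activitypub_user_url(host: str, user: str) -> str:
    """Build the ActivityPub user URL, using the path given in this thread."""
    path = "/api/v1/activitypub/user/" + quote(user, safe="")
    return urlunsplit(("https", host, path, "", ""))


if __name__ == "__main__":
    # Hypothetical host and account, for illustration only.
    print(webfinger_url("gitea.example", "acct:alice@gitea.example"))
    print(activitypub_user_url("gitea.example", "alice"))
```

Both lookups return Not Found unless federation is enabled in app.ini as described above.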

6543 commented 2 years ago

Well, I don't think Gitea's mission is to discover federated stuff ... - I created https://codeberg.org/thefederationinfo/the-federation.info/issues/290 to track this idea for the-federation.info ...

But everybody can create their own index, as it's an open standard.

lunny commented 2 years ago

Well, I don't think Gitea's mission is to discover federated stuff ... - I created https://codeberg.org/thefederationinfo/the-federation.info/issues/290 to track this idea for the-federation.info ...

But everybody can create their own index, as it's an open standard.

Maybe we can have a decentralized index page so that you can get other instances public repositories information in the current instance.

raucao commented 2 years ago

Compare to Mastodon's "Federated timeline" and "Local timeline" for example. You could easily have an index page (I guess just extending the "explore" page) that shows remote repos that someone from the local instance is following, since that information is stored in the local database anyway.

mscherer commented 2 years ago

But Gitea is a forge where people can federate, not a social media client to explore federation.

There is maybe some value in providing a UX similar to Mastodon and the like (where one follows projects and users rather than collaborating on source code), but maybe this job should be done by a regular ActivityPub client and not by the web interface of Gitea, as it may make it too complicated and unclear (as you can't be good at 2 different tasks).

3nprob commented 2 years ago

I see it this way: What @tepozoa is talking about reflects how the technical implementation should look (assuming instance-discovery/gossip are desired features), and a UI/UX most of which would primarily be of interest to niche users (as @mscherer points out) but also very much to instance admins/"moderators". For that reason I think it can make sense to consider that as the first web-UI view, despite not being particularly well-suited for the more common user stories. It will ensure federation can happen in a self-controlled manner from the start without confused admins filing issues and throwing around gists of shell concoctions in the Matrix room ;)

I think it'd be putting the cart before the horse to start automagically cross-federating or gossiping without exposing the plumbing, especially considering many Gitea administrators likely primarily rely on the web interface and don't have an established flow for managing their instance via CLI or directly via API calls. With this, federating Gitea can be more of a friendly and fun experience than a leap into the dark.

I guess all the actions and CRUD don't need to be triggerable from these views though. Some things might even be configured only in the static configuration. But they should at least be viewable and browsable, with the option to hide and restrict it for unauthenticated/unprivileged users.

@mscherer has a good point on user expectations that could inform the UX and default configuration. I see that as complementary, not a replacement.


As a user, I think it was a big mistake that most of the ActivityPub implementations and servers have adopted the personal-followed/server-local/global-federated 3-split timelines as the end-all-be-all. It solidifies a mental model of server==social community, which can be the case sometimes, especially at the very early stages of a protocol, but makes no sense as a general default and less so the more adoption you have. It significantly raises the barriers for smaller instances to be discoverable, to the point where I think that Mastodon specifically being designed this way is an effect of misaligned incentives on the part of the maintainer+host. That is what it is; my point is it should not be used as a role model if what you're building towards is cross-instance collaboration.

What if your e-mail client had separate inboxes depending on which domain the email is coming from, with a separate one for everyone on the same server? Even those of you who'd prefer that would hopefully agree that it'd be terrible as the only option. How many would still be on a minority e-mail server if this was the case in Gmail and Outlook?

There are good reasons why people leaning towards either side as a priority should want to keep the backend technical architecture and data structures as decoupled as possible from how that data is exposed to users.

poVoq commented 2 years ago

I think discoverability is one of the big reasons for projects to stay on Github. So extending Gitea's "explore" view to also include federated instances would go a long way in improving discoverability on distributed git forges.

bqv commented 2 years ago

At the very least, I think trying hard not to differentiate at all between "server-local" and "federated" is the right move.

kjhoerr commented 2 years ago

If we're continuing to use Mastodon as an example (for better or worse), they at one point had a separate web application that listed all public instances that were known. This served as both a way for users to find instances that they wanted to make accounts on and for users/admins to find instances to federate with. Being on a particular Mastodon instance has a draw for being in a particular community; as previously noted, they have a whole column dedicated to server-local posts. There are a lot of themed instances as a result.

For Mastodon then, users typically are trying to discover both instances and other users to follow, since they (and their posts) are the primary content for the platform. On Gitea, the equivalent would be organizations and repositories. For Gitea there is no real drive to have an account on a particular instance, since there isn't any inherent feature that is exclusive to instances, other than how they are maintained.

In that sense it does affect the potency of a middleware app for discovering new content, since the only driving force behind discoverability is the repositories. I think it would be more effective if it were embedded in Gitea itself. If that's the case, maybe it's more worthwhile to have a "relay"-type server working on behalf of Gitea fed servers, e.g., Mastodon Relay.

6543 commented 2 years ago

since there isn't any inherent feature that is exclusive to instances

Well, you can extend & configure Gitea extensively .... (CI, custom renderer, ....)

kjhoerr commented 2 years ago

Well, you can extend & configure Gitea extensively .... (CI, custom renderer, ....)

Yeah, I tried to shoehorn that with "how they are maintained", there's a lot of key features that can be enabled in some that aren't in others. I was presuming if Gitea does keep from using features that are specific to instances, because that's more of a social feature.

People can and do choose to join a specific instance based on the community, not the technology running it - and sometimes they choose not to federate with a remote instance because of the community or content. Have a read of the list of "we will not federate with these people" as a live example: https://mastodon.social/about/more

I 100% agree - I think I was assuming more within a system of good actors that as long as organizations can spread across instances that there isn't a social aspect to being on a specific instance, specifically for Gitea. You are right though, in practice the moderation and maintenance is very important.

3nprob commented 2 years ago

Again I draw the parallel to e-mail - and Matrix - instance discoverability and differentiation can be super useful when choosing where to register. Once you are in and actively using an application, you most likely want that to become as transparent as possible and it becomes more natural to think of organisations and repositories (which apart from implementation technicalities shouldn't inherently need to be tied to specific servers and more long-term I hope to see that be decoupled).

Consider Matrix, which in a few years has grown into a relatively healthy network of cross-federated servers with rooms (channels) and spaces (grouping of rooms) spanning mostly seamlessly across them. Once you have an account you roam freely apart from explicitly private rooms and unfederated servers. ACLs (permissions) and federation are orthogonal. Whatever you're looking for and is on the network is usually a search or two away.

Meanwhile, Mastodon/ActivityPub is severely fragmented and can require some serious detective work and multiple account registrations to uncover the islands you're looking for. Federation policies and IP filtering might be the most commonly used moderation techniques. The network has a few oases with miles of desert between.

I think the difference is one of mindset (which is reflected in both UX and default server behavior): Mastodon (and, IMO unfortunately, other compatible servers) focuses on server-level community capture and branding, while Matrix focuses on a globally hyperconnected mesh where servers by default gossip and sync to route updates around broken links in the server-to-server connectivity graph.

I see no inherent reason why Gitea couldn't or shouldn't be closer to the Matrix model regardless of which lower-level protocol is used. Maybe it will even inspire other AP implementations with a different scope.

3nprob commented 2 years ago

People can and do choose to join a specific instance based on the community, not the technology running it - and sometimes they choose not to federate with a remote instance because of the community or content. Have a read of the list of "we will not federate with these people" as a live example: https://mastodon.social/about/more

@tepozoa A large part of my argument above is that it doesn't need to be this way and that it's not in the interest of a healthy network to impose this framing on users or even admins from the application. That this is a cultural thing in the Mastodon-surrounding fediverse which is legacy from how the network was formed and the software was built.

One example of how the software can push towards a stronger graph: Strongly disincentivize allow-list-based federation on the public network ("Either you're internal/private and only federating with others in the consortium, or you're allow-by-default with a blocklist for problematic instances" vs "Carefully handpick the servers you're federating with and populate the allowlist").
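
To make the two options concrete, here is a minimal sketch of such a policy check. Everything in it is hypothetical: `FederationPolicy`, the mode names "open" and "consortium", and the example hosts are illustrations of the idea, not anything Gitea implements.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FederationPolicy:
    # "open": accept every host except those on the blocklist.
    # "consortium": accept only hosts on the allowlist (internal/private setups).
    mode: str = "open"
    blocklist: Set[str] = field(default_factory=set)
    allowlist: Set[str] = field(default_factory=set)

    def accepts(self, host: str) -> bool:
        """Return True if activities from `host` should be accepted."""
        host = host.lower()  # hostnames are case-insensitive
        if self.mode == "open":
            return host not in self.blocklist
        if self.mode == "consortium":
            return host in self.allowlist
        raise ValueError(f"unknown federation mode: {self.mode!r}")


if __name__ == "__main__":
    public = FederationPolicy(blocklist={"spam.example"})
    print(public.accepts("forge.example"))  # unknown hosts are welcome by default
    print(public.accepts("spam.example"))   # explicitly blocked
```

In this sketch the handpicked allowlist is only available in the private mode, which is one way software could nudge public instances towards allow-by-default.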

Another one: Propagate updates from one neighbor instance to other concerned neighbor instances as appropriate (this was talked about above already and this is one way to consider it).

3nprob commented 2 years ago

For those who have not and find this interesting, I think there are key lessons to be learned from the history of IRC (which was globally federated at one point but split due to cultural differences in federation policies. Some similarities with the GAB/Parler drama in AP). Maybe someone else has the link to a great article I read on it that I don't have on hand right now. Wikipedia has some spread over the individual pages of the major networks.

This is a bit of a tangent but related to my point of how what seems like a minor technical detail can have an unforeseen large effect on the topology of a federated network, which in turn can have a huge impact on how users engage with it. And something about how those who don't learn from history are doomed to repeat it.

raucao commented 2 years ago

Regarding the last few comments here, you seem to be misunderstanding Mastodon a bit. By default, nothing is blocked, and there is no allowlist in place at all. The allure of one instance over another is usually to join communities based on shared interests and values, same as in the physical world. This will likely be a bit different for code forges, since most creators of public repos share the basic values of FOSS in some way.

However, the main reason there are moderation features like instance blocks, is for spam and harassment, which is necessary on a federated network, where your local admin is unable to moderate the status of users on remote instances. Not mentioning that shitposting and trolling is just the norm on social networks, but not so much on code forges.

Furthermore, the main reason Matrix is so much more resource-intensive than e.g. XMPP, is that it mirrors so much state across servers, instead of limiting the inter-server traffic to the data exchange that is necessary for transporting only the messages that are actually useful to local users of a server.

If you would want to sync activity for every repo on all federated forges by default, that will easily increase your resource usage 100-fold, for no obvious benefit, both for data storage as well as HTTP requests. Imagine 1 million repos being published on this network, and anyone who wants to activate federation on their Gitea would have to sync metadata and activity for all of them by default. That does not sound sustainable or desirable to me. (GitHub currently hosts 200+ million repos, just to give some perspective. The majority of them likely have less than 10 stars/followers.)

remram44 commented 2 years ago

IRC was never really open. There was a brief period when one hub (eris.berkeley.edu) of the single historical network allowed server connections from everywhere, that was chaos, they stopped within the year. IRC does not have namespacing of usernames nor channel names by server of origin, so open federation means nicks and channels collide, and a rogue server can take over every nick and channel, by design. I don't think there are lessons to be learned there at all.

3nprob commented 2 years ago

@raucao I don't think I'm misunderstanding there. I'm saying that this is how the software is commonly used and deployed today, and the behavior we see, not that the software can't be used differently. The UI/UX of the timelines mentioned above is also part of this.

I'm absolutely not saying Gitea should aim to mimic Matrix either. That kind of state-sync might make sense in chat but I think we all agree it doesn't make sense here (and hasn't been proposed AFAIK?)

It's just that a large portion of the comments above me are of the form "this is how Mastodon does it, so that's how Gitea could/should/can only do it as well", and these comments were an attempt to simultaneously problematize that, lift the perspective, and highlight the importance of separation of concerns (which was my main motivation and thankfully this part seems uncontroversial so far at least).

3nprob commented 2 years ago

However, the main reason there are moderation features like instance blocks, is for spam and harassment, which is necessary on a federated network, where your local admin is unable to moderate the status of users on remote instances.

My point is rather that ACLs should be separated from connectivity. You can drop messages signed by a host (blocking all content originating from it) but still consume messages passed along and signed by other instances. Of course you still need to be able to block IP ranges as part of defense and preventing DoS attacks, but moderation (filtering unwanted content) should have purpose-built tools.
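
One reading of this separation, as a rough sketch: the content filter looks only at who signed a message, never at which server delivered it. The `Message` type, its field names, and the hosts below are all made up for illustration, and signature verification itself is assumed to have happened already.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Message:
    origin_host: str  # instance that authored and signed the content
    relay_host: str   # instance that delivered the message to us
    body: str


def filter_inbox(messages: Iterable[Message], blocked_origins: Set[str]) -> List[Message]:
    """Drop content authored on blocked hosts, whichever path it took to reach us.

    The relay host is deliberately ignored: moderation decides on content
    origin, while connectivity (who we exchange traffic with) is a separate concern.
    """
    return [m for m in messages if m.origin_host not in blocked_origins]


if __name__ == "__main__":
    inbox = [
        Message("good.example", "bad.example", "relayed via a blocked host"),
        Message("bad.example", "good.example", "authored on a blocked host"),
    ]
    for kept in filter_inbox(inbox, {"bad.example"}):
        print(kept.body)
```

IP-level blocking for DoS protection would sit in a different layer and is not modelled here.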

Not mentioning that shitposting and trolling is just the norm on social networks, but not so much on code forges.

The GitTorrent author had a different experience (obv key differences with the gitea approach so not everything translates). Spam and other malicious behavior should not be underestimated. For better or worse it will only be a practical problem after some threshold of adoption and notoriety so there's time to address this gradually as the network is live. But it's more a question of "when" than "if". As long as the cost/effort to resolve the spam is larger than it is to generate it, griefers can use this to their benefit.

What I'm pushing for here is a scenario where a user can spin up their individual instance of, say, 1 to 10 people and start collaborating on repos from various servers all over without having to manually search for or add servers to federation to make that seamless without gaps in threads.

raucao commented 2 years ago

I wrote "not so much", which meant just that. I think it's less of a problem in general with code forges compared to social media, but still always a problem. And I think it needs to be addressed pretty much right from the beginning, and then refined over time.

However, there's a big difference between distributed networks and federated ones, when it comes to spam: the latter requires a server/hostname address, which makes it much easier to block instances that do not moderate their users properly. So it's already less of a problem in this case, as compared to maybe GitTorrent or even Radicle.

3nprob commented 2 years ago

I wrote "not so much", which meant just that. I think it's less of a problem in general with code forges compared to social media, but still always a problem. And I think it needs to be addressed pretty much right from the beginning, and then refined over time.

100% :+1:

However, there's a big difference between distributed networks and federated ones, when it comes to spam: the latter requires a server/hostname address, which makes it much easier to block instances that do not moderate their users properly. So it's already less of a problem in this case, as compared to maybe GitTorrent or even Radicle.

In practice, today, this is correct. This has been bothering me a lot recently, though, and I think it may not be sustainable in the long-term. It relies on a root of trust in CAs as well as ICANN and registrars. At some point that becomes not much different than leaning back and saying "there is no spam; Cloudflare with strict protection takes care of that". Maybe users want to participate over tor, i2p or other overlay networks? Some instances may run multi-stack to bridge. Maybe DIDs play a role here? I completely get that this is a larger and not fully explored problem in general and probably not something that can be expected to be accommodated for properly in initial stages.

Perhaps this is a good time to raise this question: Is censorship-resistance a desired feature? Goal/non-goal? Consider federating in the face of adverse action from the trusted parties (perhaps being forced to due to politics).

How about anonymity? We're already at the point where only the highly motivated or sophisticated can acquire a domain name without leaking their identity or breaking registration terms.

Bit of a rant and tangent; apologies if this is noise but hopefully it's still useful.

ghost commented 2 years ago

As a Gitea federation developer, this is my perspective on the discussion so far:

Discoverability

This is indeed an important problem, but currently I think it's better to prioritize the core ActivityPub implementation before moving on to implementing discoverability features. My experimental Gitea repo does actually support federated search, but it only indexes repos that your instance has interacted with before.
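
As a toy illustration of that limitation (not the code from the experimental repo), an index that only learns about a repository when the instance interacts with it could look like this. `InteractionIndex` and its methods are hypothetical names.

```python
from typing import Dict, List


class InteractionIndex:
    """Search index that only knows repos this instance has interacted with."""

    def __init__(self) -> None:
        self._repos: Dict[str, str] = {}  # repo URL -> description

    def record_interaction(self, repo_url: str, description: str = "") -> None:
        # Called whenever the instance sees a repo, e.g. on a follow or a comment.
        self._repos[repo_url] = description

    def search(self, term: str) -> List[str]:
        """Return matching repo URLs, sorted; case-insensitive substring match."""
        term = term.lower()
        return sorted(
            url
            for url, desc in self._repos.items()
            if term in url.lower() or term in desc.lower()
        )


if __name__ == "__main__":
    index = InteractionIndex()
    index.record_interaction("https://forge.example/alice/parser", "A tiny INI parser")
    print(index.search("ini"))     # found: the instance has seen this repo
    print(index.search("kernel"))  # empty: never interacted with such a repo
```

A repo that no local user has ever touched stays invisible, which is why dedicated discovery projects are still of interest.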

Currently, ForgeFlux is working on federated discoverability with their North Star and Starchart projects. We could also take inspiration from Sepia Search or DHTs.

Spam

Spam is also a huge problem. My Gitea instance (running the experimental code) has already been spammed by a malicious Mastodon instance even though Gitea doesn't officially have federation yet! We will actually be implementing some basic moderation and anti-spam features soon (partially because it's annoying to clean up spam on my Gitea instance).

ewtoombs commented 2 years ago

This morning, I woke up, and thought, "Hei, wouldn't it be cool if gitea were federated?" Then, I find this thread.

Faith in opensource = restored. \o/

ewtoombs commented 2 years ago

Why are you thumbsing that down, @remram44 ?

remram44 commented 2 years ago

Please stop spamming. You could have tweeted that and brought attention to the effort, instead you are triggering notifications for everyone who is following this issue to be aware of progress with contentless comments. Please stop.

ewtoombs commented 2 years ago

This is the first I've ever seen anybody upset about this on github, but I see your point. Sorry about that, @remram44 .

Also, I don't have twitter, but sharing news of this effort with friends on matrix was the first thing I did.

mpeter50 commented 1 year ago

@raucao on this comment of yours, I think you may be misunderstanding how the Matrix protocol does federation.

Furthermore, the main reason Matrix is so much more resource-intensive than e.g. XMPP, is that it mirrors so much state across servers, instead of limiting the inter-server traffic to the data exchange that is necessary for transporting only the messages that are actually useful to local users of a server.

If you would want to sync activity for every repo on all federated forges by default, that will easily increase your resource usage 100-fold,

The second paragraph in the quoted part is true, and what you describe in the first one would indeed be unnecessarily resource-intensive. But I believe that this is not happening. I think that on Matrix, your own homeserver only synchronizes events from rooms which at least 1 user on your homeserver has joined. Another thing, in which I'm less confident (and I'm not sure if this has been a proposal or if Synapse currently works this way), is that your homeserver only synchronizes the events of a room which were requested by one of the users. There are important state events (e.g. membership and permission changes) without which the room state would be incorrect on a synchronizing homeserver, so at least the last one always has to be synchronized, but less important ones like messages and uploaded files can be synchronized on demand.
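
The first rule described here (sync only rooms that at least one local user has joined) can be sketched as follows. This models the description in this comment only, not how Synapse is actually implemented; the function name and the sample IDs are invented.

```python
from typing import Iterable, Set, Tuple


def rooms_to_sync(memberships: Iterable[Tuple[str, str]], local_server: str) -> Set[str]:
    """Return the rooms a homeserver needs to follow.

    `memberships` holds (user_id, room_id) pairs. Matrix user IDs have the
    form '@localpart:server_name', so the server is everything after the
    first ':'.
    """
    rooms = set()
    for user_id, room_id in memberships:
        server = user_id.split(":", 1)[1]
        if server == local_server:
            rooms.add(room_id)
    return rooms


if __name__ == "__main__":
    joined = [
        ("@alice:home.example", "!a:home.example"),
        ("@carol:home.example", "!b:other.example"),
        ("@dave:other.example", "!c:other.example"),  # no local member in this room
    ]
    print(sorted(rooms_to_sync(joined, "home.example")))
```

Applied to forges, the same rule would mean an instance only tracks remote repositories that one of its own users follows or contributes to.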

This may seem irrelevant here, as this is the Gitea project, but I just wanted to point out that a federating Gitea server does not need to unconditionally copy the git repository, issues and everything else from all other Gitea servers when federating over the Matrix protocol. Even more: if a repository needs to be synchronized, maybe not necessarily all of it at once. As far as I know, git already supports partial clones and such, with the capability to download more when needed.
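
For reference, a partial clone can be requested with git's `--filter` option. The sketch below just wraps that command; the repository URL and destination are placeholders, and it assumes the remote server has partial clone support enabled.

```python
import subprocess
from typing import List


def partial_clone_command(repo_url: str, dest: str) -> List[str]:
    """Build a git command for a blobless partial clone."""
    # --filter=blob:none asks the server to omit file contents (blobs);
    # git fetches the missing blobs later on demand, e.g. at checkout.
    return ["git", "clone", "--filter=blob:none", repo_url, dest]


def partial_clone(repo_url: str, dest: str) -> None:
    """Run the partial clone; raises CalledProcessError if git fails."""
    subprocess.run(partial_clone_command(repo_url, dest), check=True)


if __name__ == "__main__":
    # Placeholder URL, printed rather than executed.
    print(" ".join(partial_clone_command("https://forge.example/alice/parser.git", "parser")))
```

Commits and trees are still downloaded up front in this mode; only file contents are deferred.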


@ewtoombs

This is the first I've ever seen anybody upset about this on github, but I see your point. Sorry about that, @remram44 .

Please don't take this as an offense, but from time to time it occurs on larger projects. They are right, there are 15 participants of this issue, possibly even more have subscribed to it (me included). We also are happy that this is in progress, but when we are watching multiple such projects, offtopic and less meaningful comments can become annoying to deal with, and projects accepting/generating them become more time consuming to get up to date with, and in turn interested people will come less to look at the new progress and maybe chip in with their ideas.

sorpaas commented 1 year ago

Since Matrix is mentioned, I'm actually working on exploring a federated git / federated forum solution using the Matrix protocol. What I have now is called morum, a forum (at prototyping stage), and I'm planning to eventually extend it to support PRs as well as a simple git event log, thus allowing it to be used for a federated git solution as well.

The idea is basically as follows, taking advantage of the hierarchical structure of Matrix spaces.

Matrix, for smaller instances, will indeed be more resource-intensive than ActivityPub (for larger instances, it honestly depends, and Matrix might "win" sometimes). But this thing will be much simpler to implement than in ActivityPub, so I think it would be worth a try.

If you would want to sync activity for every repo on all federated forges by default, that will easily increase your resource usage 100-fold,

We actually won't be syncing like that, not even a single full repo / "project". A smaller instance will only sync the particular posts that users in this instance have interacted with, so only a small subset of a repo. You can also, if wanted, use the "guest access" feature of Matrix to only sync rooms that users have commented on. But in this case I think it's overkill and not really needed.

I've also created a Matrix room if you want to discuss more on this or the Morum codebase.

bqv commented 1 year ago

Last post aside, please consider bridging the matrix rooms to irc, for those of us who would rather not use matrix

Giszmo commented 1 year ago

I'd like to draw your attention to a bounty by Jack Dorsey which would be quite in line with Gitea federation support. He did not publish a lot of requirements but he gave a thumbs up to my shot at the issue here.

As I said the bounty isn't exactly buzzing with requirements but you can find it here.

The bounty is in BTC. 10BTC which as of now is worth $226k.

melvincarvalho commented 1 year ago

My feeling is that requirements for the bounty above would be something that ideally bridges git and nostr

There is currently underway a nostr + fediverse bridge

I also have a predicate that can be added to a fediverse profile to link to nostr

It's a slightly different architecture, that of federation vs relays. However, there is no reason imho why they can't play nicely together, and offer users the best of all worlds. I suspect this is also true of the fediverse in general. However, getting changes upstream to be prioritized can be a challenge.

I think identity is the thing that glues it all together, so if it's possible to expand gitea profiles a bit, it may solve quite a few different problems.

Giszmo commented 1 year ago

@melvincarvalho while Jack did mention nostr in his bounty, it's not a requirement and I see for example in #18240 and #14186 that gitea might be on a different track there with ActivityPub. Not sure if there is any code yet.

raucao commented 1 year ago

@Giszmo It says "Nostr-based" both in the title and description. How is that not a requirement?

(I agree that federation support in Gitea is the way to go, and it might be possible to add nostr support in a way that doesn't compromise ActivityPub as the main means of decentralization.)

melvincarvalho commented 1 year ago

@raucao doable, IMHO. The predicate I use above is an OWL InverseFunctionalProperty, which is designed for this kind of thing. You would maybe need to add it to the @context and one line in Gitea profiles (json + html). This is perhaps worth starting an adjacent issue for.

Giszmo commented 1 year ago

@raucao jack said in his nostr post that's also linked above:

Still believe it’s critical we have a credible permissionless alternative to GutHub (ideally based on nostr). One that bitcoin-core and all nostr devs would trust.

Moving my bounty up from 120 million sats to 1 billion sats.

I admit it's not clear if the bounty is exclusively for a nostr solution. If you think you can fix the issues he wants to see fixed, ask him. I'm pretty sure he would support any solution that fixes the problems he sees in GitHub. Sadly we have to guess a bit which problems those are, but I guess discoverability and censorship resistance are the main ones. If you have a serious proposal, send him a DM on nostr. He will probably reply.

PeerRich commented 1 year ago

If you have a serious proposal, send him a DM on nostr. He will probably reply

I think he said he can't read DMs anymore because too many people spam him

PeerRich commented 1 year ago

jack@squareup.com is probably better

pdxjohnny commented 1 year ago

I'd love to help with this, is there anything that's a good first target? Issues?

Giszmo commented 1 year ago

Jack remains very concerned about this issue. If anybody wants to get compensated for working on this, get in touch for example by replying on this thread. https://snort.social/e/note1l2mf4d3c9p4rpwmwmg2xtcf6x3ltpcmxrx6xpyjeryzrvz7jyg5qszu3pt

And the time to at least signal that you intend to claim the bounty soon is now, as he's thinking about different approaches.

Jack's bounty currently is worth $290,000.-

lunny commented 1 year ago

@Giszmo Maybe Jack's requirement is not the same as this issue? This issue is based on ActivityPub and ForgeFed. Of course, I think some of the code could be shared between different implementations.

bqv commented 1 year ago

This feels like it has become offtopic for what was meant to be a tracker for the state of implementation of federation

developedsoftware commented 12 months ago

Is Gitea federation abandoned? Or is this blocked due to external factors? Would love to contribute to this.

lunny commented 12 months ago

Maybe you can have a proposal about your idea and design before coding, so we can discuss it.

raucao commented 12 months ago

@developedsoftware You can contribute here, where most of the federation work is currently happening: https://codeberg.org/forgejo/forgejo/issues/59

IGLOU-EU commented 10 months ago

Hi, is there a bounty or other financial way to help the development of this Gitea feature?

almereyda commented 10 months ago

You could consult https://codeberg.org/forgejo/sustainability/src/branch/main/README.md for ideas about who to contact to fund this development.