blackjack4494 / yt-dlc

media downloader and library for various sites.
The Unlicense

[Question] Consider migration from GitHub #9

Closed FireMasterK closed 3 years ago

FireMasterK commented 3 years ago

Question

Have you considered migrating from GitHub to prevent this entire situation from happening again?

A mirror is now available at https://git.kavin.rocks/kavin/yt-dlc

Tweet regarding the CEO trying to bring back youtube-dl: https://nitter.kavin.rocks/t3rr4dice/status/1320660235363749888

ssokolow commented 3 years ago

If so, I'd suggest a European-hosted alternative, like the Gitea instance at Codeberg.org.

xloem commented 3 years ago

https://git-annex.branchable.com/ uses a solution where issues etc are stored in the same git repository as the source code.

FormerlyChucks commented 3 years ago

just here to say fuck youtube

Redo11 commented 3 years ago

The fact that this is still here is amazing. I really want this source code to be backed up somewhere safe, but be aware that self-hosted options put you at the front of dealing with DMCA notices. The safest way would be to use something so obscure that they won't even bother claiming it.

ssokolow commented 3 years ago

@Redo11 Aside from Codeberg, here are a few more options I've since learned of:

Also, SourceForge is so desperate to stay relevant that they welcome people using SourceForge to set up supplementary services for projects hosted elsewhere, which might help with DMCAs since the code and the non-code hosting would be independent. (For the common case of projects on GitHub, they even have a wizard for creating a SourceForge project to supplement a GitHub repo.)

xloem commented 3 years ago

@ssokolow @Redo11

more things, not in a particular order

Seirdy commented 3 years ago

a simpler alternative would be de-coupling the issue-tracking from the repo. several mirrors could be put in place (github, gitlab, sourcehut, bitbucket, codeberg, notabug.io, repo.or.cz, a bunch of self-hosted instances) so you get a "hydra" effect. Git lets you work with multiple remotes out of the box.

Then the issue becomes figuring out how to keep issue-tracking resilient. The best solution would be mailing lists; since everybody has a copy of everything in their maildir, the mailing list is fully distributed/decentralized just like git. The RIAA is less likely to come for a mail server, and if they do, migration is trivial if you have a maildir or MBOX.

ssokolow commented 3 years ago

@Seirdy Dear God please no. I despise when circumstances force me to join a mailing list... unless you're willing to guarantee the availability of an NNTP bridge.

Without an NNTP interface, I usually consider reporting bugs to mailing list-based developers too onerous and work around them locally while waiting for someone else to solve things.

It's bad enough when a blog/comment/mail-form spammer lands on a message template well-enough formed to get past my "I wouldn't want this from a human either" filters and I don't have time to update them right away.

Seirdy commented 3 years ago

@ssokolow the thing is, most mainstream issue trackers require you to create an account. That in itself is "too onerous" for people who already have an email associated with their git identity. Asking everyone to create a codeberg/gitlab/notabug/sf account probably won't fly.

If you consider emailing a patch too onerous, you could just post a pastebin link to IRC instead. There's no shortage of ways to send a file that don't revolve around putting all your eggs in the same basket that's hosting your repo.

My mailing-list suggestion was predicated on the idea that switching a git remote is extremely trivial, while migrating issues/patches is more tricky. Email is already a proven solution for some of the largest git projects in existence.

If you have another well-tested solution for de-coupling issue tracking from the code forge and distributing it among participants (so everyone gets an offline copy) without requiring contributors to create yet another account, I'm all ears.

ssokolow commented 3 years ago

@Seirdy As long as there are separate mechanisms for contributors and mere bug reporters, and bug reporters don't have to do something like registering a throwaway account to keep the mailing list from cluttering up their inbox, I'll be happy with it.

That said, I do remember running across a couple of experiments in retrofitting Git with Fossil-style "bug tracker data stored in the repo" issue trackers. I wonder if any of those are still active.

UPDATE: Maybe git-bug. It's got a WIP web interface that could be extended to meet needs and support for importing and exporting GitHub, Gitlab, and JIRA issues in more than just a "one shot in the beginning" sort of way.

(Heck, extend that to support import/export with Gitea and run a Gitea instance for the public to report bugs at and it'd probably be perfect. Bugs come in on the Gitea and get imported into the repo and pushed out to the clones. If you're worried about the account, mod Gitea to accept something like GitHub OAuth sign-in as an option.)

Failing that, maybe whip up a basic web frontend for git-issue or bug.

UPDATE: I think Bugs Everywhere was one of the ones I originally remembered.

UPDATE: Turns out I blogged about it back in 2011.

Seirdy commented 3 years ago

bug reporters don't have to do something like registering a throwaway account to keep the mailing list from cluttering up their inbox

Having to create an account somewhere, be it on GitHub or elsewhere, typically requires entering an email address. By default, replies in issues get sent to your email, just like a mailing list. Either way, you're signing up for emails and it's up to you to then control whether or not they clutter up your inbox.

Something like Sourcehut offers a nice web UI for mailing lists to make things easier for people who don't use email.

Learning to use git-bug and building a stable web UI for it sounds like overkill when it's probably easier to just send an email.

Git+email is already decentralized, widely used, and stable. People already know how to email attachments. Being able to use email is a re-usable skill that can then be applied to other mailing-list-based projects, reducing lock-in to companies like MS.

Let's not re-invent the wheel to introduce a new, more complex tool for everyone to learn just to contribute to youtube-dl. If you really want a web UI, Sourcehut already has one that lets you abstract away the mailing list.

ssokolow commented 3 years ago

As long as I'm not being forced to reinvent "just one thread and mentions elsewhere, please" filtering and have a simple unsubscribe link/button that doesn't require me to send an UNSUBSCRIBE e-mail that's formatted just right and/or appears to come from just the right inbound-only alias in my "spam defense via e-mail addresses as revokable API keys" mail system, I'm fine.

xloem commented 3 years ago

UPDATE: I think Bugs Everywhere was one of the ones I originally remembered.

A first glance at this solution appears to make sense here. It's written in python, just like this project, so contributors would be at home. It already has a web interface and an email interface.

I don't know why you'd want to decouple the bugs from the code when the project may get disrupted. Combining them (i.e. storing the issues in the code, not on some service) helps preservation and access.

One concern was learning to use these things. Somebody has to put in the effort of setting it up.

Some other suggestions were for tools that bridge between github, gitlab, jira. These have the advantage of not requiring anyone to run a server to provide easy user access.

BiosPlus commented 3 years ago

I'd recommend notabug.org

Would also recommend looking into setting up a mirror of the primary repo on a self hosted gitea or gitlab instance hidden behind a domain from njal.la

rain-1 commented 3 years ago

a simpler alternative would be de-coupling the issue-tracking from the repo. several mirrors could be put in place (github, gitlab, sourcehut, bitbucket, codeberg, notabug.io, repo.or.cz, a bunch of self-hosted instances) so you get a "hydra" effect. Git lets you work with multiple remotes out of the box.

Then the issue becomes figuring out how to keep issue-tracking resilient. The best solution would be mailing lists; since everybody has a copy of everything in their maildir, the mailing list is fully distributed/decentralized just like git. The RIAA is less likely to come for a mail server, and if they do, migration is trivial if you have a maildir or MBOX.

Seconded. How about https://github.com/MichaelMure/git-bug?

vxbinaca commented 3 years ago

Just do Gitee and call it a day, this magic shit about IPFS and decentralized stuff is overkill. The Chinese don't care.

xloem commented 3 years ago

If anyone can actually set any of these up so others can use them, that's what the community really needs here.

jbruchon commented 3 years ago

Removal of the references to downloading copyrighted material is sufficient. The DMCA takedown's only realistic leg to stand on was the test cases showing that the primary intent of this program was to be used to obtain copyrighted material. Clearly, this is a tool that can be used for good or evil, and archivists like myself use it to back up YouTube channels that could be at risk of being lost in the future as YouTube continues its slow shift towards becoming "cable TV, but online." There is no longer any solid footing that a new DMCA takedown notice can stand on. The program doesn't circumvent copyright protection mechanisms (the JavaScript to assemble media stream URLs and the media streams themselves are all sent by YouTube unencrypted without a browse-wrap license agreement and/or mandatory user registration, and no, SSL is transport layer encryption, not DRM) so it now falls into the same category as a tool like HandBrake: it CAN be used to do things that are possibly a violation of copyright, but that is neither its only purpose nor its primary purpose.

If you move from GitHub, you're moving off of the largest open source software hosting platform on the planet. You will lose searchability and a lot of people simply won't register at the other lesser-known sites to contribute to the code base.

I don't think moving is a good idea.

Seirdy commented 3 years ago

If you want to maintain the discoverability and searchability, just make GitHub one of your remotes. It doesn't have to be the primary remote. This goes back to the "Hydra effect" I mentioned earlier.

The ideal solution would be to have a plethora of remotes but one bug-tracker. Just git remote add GitHub on top of whatever else you want to use.
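A minimal sketch of that step, run in a throwaway repository so nothing real is touched. The remote name `github` and the URL are placeholders chosen for illustration, not the project's actual remotes:

```shell
# Throwaway repo so the commands can be tried safely; the URL below is a placeholder.
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"

# Add GitHub as one more remote next to whatever is already configured.
git remote add github git@github.com:example/project.git

# List every configured remote with its fetch and push URLs.
git remote -v
```

Nothing is contacted over the network until you actually fetch from or push to the new remote.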

I'm not in favor of bundling the bug-tracker with the repo; getting the code should not require getting all the issues and patches, and simple tasks like responding to an issue with instructions to reproduce should not require users to create an account, clone the repo and use git; that would impose a more significant barrier to entry than sending an email.

I feel like a broken record when I say that git is already distributed, federated, and decentralized. A plethora of projects much more well-known than youtube-dl leverage this. We shouldn't require people to learn a brand-new tool and make an account just to contribute to youtube-dl when we can be a good FOSS citizen and encourage people to use a process that can be re-used in other projects.

Much of youtube-dl's functionality leverages ffmpeg, which also uses mailing lists and a GitHub mirror.

322997am commented 3 years ago

I believe the best way is to self-host on a VPS service in a country that ignores the DMCA (Russia for example). The DMCA is a bad law that is probably unconstitutional, but stuff like these takedowns will continue happening as long as Disney and the like exist. I believe that Gitea is feature rich and can be self-hosted. I am willing to help with any translation needed if this is ever decided on, as I am fluent in both Russian and English.

ssokolow commented 3 years ago

@322997am ...but do mirror everything somewhere else or your bus factor is one.

xloem commented 3 years ago

If somebody sets up a mirror, can anybody volunteer to co-maintain or co-administer it?

FireMasterK commented 3 years ago

https://git.kavin.rocks/kavin/yt-dlc

Here's a mirror on my personal git server.

FireMasterK commented 3 years ago

A relevant tweet to this topic: https://twitter.com/t3rr4dice/status/1320660235363749888

Seirdy commented 3 years ago

Just thought I’d share my approach to “hydra hosting” since people here seem interested:

I mirror my repos across Sourcehut, Gitlab, and GitHub. Here’s the relevant snippet of my .git/config of my dotfiles repo:

[remote "origin"]
    url = git@git.sr.ht:~seirdy/dotfiles
    fetch = +refs/heads/*:refs/remotes/origin/*
[remote "gl_mirror"]
    url = git@gitlab.com:Seirdy/dotfiles.git
    fetch = +refs/heads/*:refs/remotes/gl_mirror/*
[remote "gh_mirror"]
    url = git@github.com:Seirdy/dotfiles.git
    fetch = +refs/heads/*:refs/remotes/gh_mirror/*

Pushing to three remotes, one after the other, can be slow. To speed things up, I created an alias to push to all remotes in parallel in my gitconfig:

[alias]
    pushall = !git remote | grep -E 'origin|mirror' | xargs -L1 -P 0 git push --all --follow-tags

This pushes to all remotes that have "origin" or "mirror" in their names, and skips the rest. Now, I can pull from all repos I'm following and push to the ones I have access to.
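To see which remotes a filter like that would select before pushing anything, the same `grep` can be run on its own. This is only a sketch in a scratch repository; the remote names and URLs are made up for illustration:

```shell
# Hypothetical scratch repo; remote names and URLs are invented for this demo.
tmp=$(mktemp -d)
git init -q "$tmp/demo"
cd "$tmp/demo"
git remote add origin git@example.org:me/demo.git
git remote add gl_mirror git@example.com:me/demo.git
git remote add upstream https://example.net/them/demo.git

# Same filter as the alias: keep remote names containing "origin" or "mirror".
selected=$(git remote | grep -E 'origin|mirror')
echo "$selected"
```

Here `upstream` is filtered out, so a push through the alias would never be attempted against it.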

Core developers can post a list of a few upstream remotes that only they can push to. Community members can set up extra remotes for resiliency and for forking/personal development.

This raises an issue: when there are multiple git remotes, where do people file tickets or submit patches?

The best solution is to have one canonical place to do issue tracking, separate from git remotes. I've previously explained why I think a mailing list (with a Sourcehut-style frontend for those not used to mailing lists) would be the best option for this, but a plethora of other solutions exist as well, from GitHub issue trackers to Bugzilla.

Edit: also, it's a good idea to advertise the remotes in the project README. Example.

FireMasterK commented 3 years ago

Why not have a cronjob do the same? This way even actions taken on GitHub (such as merging a PR) could be picked up, and the developers don't need to change their workflow or modify their git configs when pushing commits.
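A rough sketch of what such a job might look like, assuming plain `git clone --mirror` and `git push --mirror`. The function name `mirror_once`, the paths, and the schedule are all hypothetical, and this copies git refs only, not issues or pull request discussions:

```shell
# Sketch of a one-shot mirror step. All names and paths are placeholders.
# Usage: mirror_once <source-url> <destination-url> <local-cache-dir>
mirror_once() {
    src=$1
    dest=$2
    cache=$3

    # First run: create a bare mirror clone that tracks every ref of the source.
    if [ ! -d "$cache" ]; then
        git clone --quiet --mirror "$src" "$cache" || return 1
    fi

    # Refresh the cache from the source, dropping refs that were deleted there.
    git -C "$cache" fetch --quiet --prune origin || return 1

    # Push every ref in the cache to the destination.
    git -C "$cache" push --quiet --mirror "$dest"
}

# Example crontab entry (hypothetical script path), running every 15 minutes:
# */15 * * * * /home/me/bin/mirror-yt-dlc.sh
```

Note that `--mirror` also deletes refs on the destination that no longer exist in the source, so the destination should be a dedicated mirror rather than a repository people push to directly.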

ssokolow commented 3 years ago

@FireMasterK What I've been meaning to set up for my own projects is:

Seirdy commented 3 years ago

@FireMasterK The whole point of this approach is not to depend on one service, be it a git remote or a CI/CD provider, especially a proprietary one like GitHub Actions. These features have been baked into git and successfully used for a long time.

edit: neutralized a rogue comma

ddevault commented 3 years ago

SourceHut admin here. I wrote this up today; it's relevant:

https://sourcehut.org/blog/2020-10-29-how-mailing-lists-prevent-censorship/

ssokolow commented 3 years ago

@ddevault I fully agree, ideologically... I'm just already too overwhelmed with other things in my life to justify a UI/UX regression in the name of principles, and I doubt I'm the only one.

That's why I'm so fixated on "If you're going to use a mailing list, set up an NNTP bridge."

ddevault commented 3 years ago

Many people, myself included, do not believe sourcehut is a UI/UX regression. I hear positive feedback on the UI all the time, and by objective measures, it is the fastest and most accessible forge.

ssokolow commented 3 years ago

@ddevault Does it provide features like:

If so, then it's not what I think when I hear "mailing list" but more "forum software with best-in-class e-mail integration".

ddevault commented 3 years ago

Receiving e-mails only for specific threads

No, but this is planned and prioritized.

Receiving e-mail notification of someone using an @mention?

Not on the mailing lists. Just Cc them. The bugtracker does support ~mentions though. Edit: the bugtracker also supports subscribing to individual threads.

Subscription/unsubscription control via HTTP, rather than an unreliably particular e-mail interface involving sending certain messages from certain addresses?

Yes.

ssokolow commented 3 years ago

No, but this is planned and prioritized.

Yes.

Those are my two biggest issues taken care of then.

Not on the mailing lists. Just Cc them.

Generally, as an anti-spam measure, I try to configure my mail system so that any messages not coming from an authorized sender (in this case, the listserv address) will get redirected to trash without me seeing them.

(When I have time to implement a custom milter, it'll be fully automatic, where outbound mail to someone generates a custom From: address, a browser extension lets me generate one for a form from the context menu, and any attempt by... say... an eBay seller to subscribe my PayPal address to their newsletter will just result in "Bounced: You tried to send a message to my PayPal address but you are not PayPal.")

ddevault commented 3 years ago

Generally, as an anti-spam measure, I try to configure my mail system so that any messages not coming from an authorized sender (in this case, the listserv address) will get redirected to trash without me seeing them.

Our mailing lists (unlike some others, I know, I know, I hate it too) will not break DKIM signatures, which is a better measure of message authenticity than checking the sender.

ssokolow commented 3 years ago

DKIM doesn't prevent a "legitimate" sender from sending unsolicited e-mail to an address that was shared with them without my permission.

Thanks to my manual use of tools like SpamGourmet, all the spam I can remember getting in the last decade has been something where things like SPF and DKIM are set up correctly... the sender just isn't the person I gave that alias to.

It gets annoying having to change things like my "from eBay" address just because eBay leaked my e-mail to a Chinese seller I bought something from once, and China doesn't have the kind of strong penalties for unsolicited commercial mail that we have here in Canada.

Treating e-mail addresses as revokable API keys took me a long way, but I want to get to the point where I can never receive a piece of spam again, unless it's through an intentionally open address.

ddevault commented 3 years ago

Thanks to my manual use of tools like SpamGourmet, all the spam I can remember getting in the last decade has been something where things like SPF and DKIM are set up correctly... the sender just isn't the person I gave that alias to.

You could also subscribe to the mailing list using an alias address and check the rcpt to.

ssokolow commented 3 years ago

You could also subscribe to the mailing list using an alias address

That's a given. My inbox is configured to trash any message that didn't arrive via an alias. (To ensure the revokability of whatever path the messages arrive via.)

and check the rcpt to.

I try to avoid implementing custom fixes on a per-site basis.

ddevault commented 3 years ago

shrug

This is how mailing lists work. Your setup is exceptional, not the norm, and at some point the chickens come home to roost. These emails are legitimate and we've taken many steps to make sure that they are authenticated as such.

ssokolow commented 3 years ago

Hey, as long as people don't expect to be able to CC me without first requesting to be added to my whitelist and have the message arrive, I'm fine with the status quo on that point.

(It's like telemarketing. One reason out of about half a dozen that I don't have a smartphone is that it would preclude my plans to set up a FreeSWITCH PBX with some custom voice menus to serve a CAPTCHA-like role for telemarketers.)

xloem commented 3 years ago

If bug discussion threads were inside git repositories, we could add a commit to move the long conversation between 2 people away from the main thread.

FrickTheRIAA commented 3 years ago

I have put together a guide on how to host yt-dlc offshore using companies that have a track record of being very resilient to DMCA takedowns: https://old.reddit.com/r/youtubedl/comments/jktx5b/how_to_anonymously_host_the_continued_development/

I won't copy-paste the whole guide into this post since 1) it might be better to just link to it off-site and keep it further away from the development of the software, and 2) it's pretty long.

xloem commented 3 years ago

https://git.kavin.rocks/kavin/yt-dlc

Here's a mirror on my personal git server.

@FireMasterK are you available at all to run a cron job or service on your host that would run something like git-bug?

BiosPlus commented 3 years ago

imo this should be marked as closed.

xloem commented 3 years ago

This repo is already being backed up by the maintainer in a separate self-hosted location.

obviously that doesn't address the bus-factor concern, but do you have a reference to where to find that mirror please?

BiosPlus commented 3 years ago

obviously that doesn't address the bus-factor concern, but do you have a reference to where to find that mirror please?

It's being kept private from what I can see; the maintainer is on a Discord that we share and has posted it there. I'll leave it to them to choose whether to divulge the URL.

shakeyourbunny commented 3 years ago

https://git.kavin.rocks/kavin/yt-dlc

Here's a mirror on my personal git server.

@FireMasterK are you available at all to run a cron job or service on your host that would run something like git-bug?

This site uses gitea, which has an inbuilt feature to mirror other repos.

xloem commented 3 years ago

@shakeyourbunny do you know whether gitea mirrors issues like git-bug does, and pull requests? edit: the answer is no, git-bug and gitea mirroring solve differing problems. https://github.com/go-gitea/gitea/issues/1876

Nekun commented 3 years ago

@BiosPlus

The current CEO of GH is working with the YTDL team to have the original repo reinstated (he's present on their IRC).

https://nitter.net/t3rr4dice/status/1320660235363749888

As you can see in the screenshot, one of the conditions for returning the youtube-dl repository to GitHub is deleting the code that bypasses YouTube DRM. So if the youtube-dl team decides to stay on GH, functionality will be greatly reduced: you won't be able to download, or watch in mpv, many music videos anymore.

xloem commented 3 years ago

there also doesn't appear to be a discord channel. there's a gitter bridged to matrix.