Chocobozzz / PeerTube

ActivityPub-federated video streaming platform using P2P directly in your web browser
https://joinpeertube.org/
GNU Affero General Public License v3.0

IPFS to store videos #494

Open alxlg opened 6 years ago

alxlg commented 6 years ago

I think that what limits PeerTube adoption is that instances are perfect for personal/organization use but not to build a free service like YouTube where everyone can upload videos without limits. The issue is that storage has a cost and videos make the necessary storage grow quickly.

IPFS (InterPlanetary File System) could be used to solve the storage issue, because every user can store files themselves, but it doesn't have a way to browse and interact with them. PeerTube, on the other hand, has an awesome UI that everyone can use.

Would it be possible to combine PeerTube and IPFS? Ideally the instance administrator would limit the classic upload for each user but let users add videos by specifying an IPFS address. I guess that when a second user browses a PeerTube instance and wants to watch a video hosted on IPFS, PeerTube provides it by reading from IPFS rather than from its local storage. PeerTube instances would cache IPFS content like any other IPFS node, and admins would monitor the cache's impact on their storage. If a PeerTube user wants to be sure their video stays available, they just have to keep it on IPFS with their own machine. This could have another advantage: if the PeerTube instance a user relied on ever becomes unavailable, they wouldn't need to re-upload their videos to other PeerTube instances if the files are on IPFS: they would just "upload" the IPFS addresses.
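A minimal sketch of that flow (not PeerTube code; the gateway, cache path, and Node 18+ global fetch are my assumptions):

import fs from 'node:fs'
import { pipeline } from 'node:stream/promises'
import { Readable } from 'node:stream'

// Serve a video by CID: use the local cache if we have it, otherwise pull it
// once from the IPFS network through a public gateway and cache it on disk.
async function getVideoByCid (cid: string): Promise<string> {
  const cachePath = `./ipfs-cache/${cid}.mp4`
  if (fs.existsSync(cachePath)) return cachePath

  const res = await fetch(`https://ipfs.io/ipfs/${cid}`) // Node 18+ global fetch
  if (!res.ok || !res.body) throw new Error(`gateway returned ${res.status}`)
  fs.mkdirSync('./ipfs-cache', { recursive: true })
  await pipeline(Readable.fromWeb(res.body as any), fs.createWriteStream(cachePath))
  return cachePath
}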

I would be grateful to anyone who can confirm or refute my assumptions.

alxlg commented 3 years ago

This is a bit off topic, but let me say that the Internet is not anonymous at a lower level of the stack, and in general there is no way to share data publicly on a P2P network without making other nodes aware of what data you are sharing. When we say PeerTube is privacy-friendly, we mean privacy, not anonymity. This means you are not tracked across the Web and nobody is using your browsing history on PeerTube to profile you. This is about the relationship between the user and the tech giants providing Web services.

The problem of being anonymous is entirely about how the Internet infrastructure is managed in the nation you live in. If you don't trust private ISPs, you should politically demand a public Internet service. If you don't trust your government, you need to change it. No technology is going to fix that.

ghost commented 3 years ago

If you don't trust your government, you need to change it.

This is true, but I'll also add that this is the whole point of technical privacy systems like Tor.

georgyo commented 3 years ago

This whole thread seems to be missing a demo, so here is one: complete with P2P, it loads extremely quickly and automatically switches between 5 different bitrates:

https://bafybeiazt45dboknwnwwrekot7eenfr62sr6vmxhrwobr4p3cymfmorx5y.ipfs.dweb.link/

You can get the peers by running this in the console:

// js-ipfs: swarm.peers() resolves to an array of { addr, peer } connection objects
for await (const peer of await node.swarm.peers()) { console.log(peer.addr.toString()) }
nukelr commented 3 years ago

Actually there's even another distributed solution, SIA Skynet. If you look at this example: https://siasky.net/EAC6AsZovYp4aIN-FLj1mFEi43WSrGtF7IBZU1T8BzCGfg it loads faster than 90% of PeerTube instances... (The video in the link is the documentary "The Internet's Own Boy: The Story of Aaron Swartz", licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International: http://creativecommons.org/licenses/by-nc-sa/4.0/ )

georgyo commented 3 years ago

There are a lot of distributed solutions popping up right now, but only IPFS is not tied to a blockchain. You could argue Filecoin, but IPFS does not depend on Filecoin in any way. This makes IPFS valuable for all sorts of applications.

Sia requires the use of a token to download or upload files. Sia's splash page hides this, and the link you sent goes through their public gateway, which they are paying for: 100% of the traffic goes through their webservers. These web portals will eventually need to make money somehow, since bandwidth isn't free. Similarly, the people who own the BitTorrent trademark released BTFS, but it is also just a mechanism to give value to some coin. It is impossible to use these networks directly without jumping through many hoops.

IPFS, like BitTorrent (the protocol), is much closer to true P2P. PeerTube instances could run IPFS nodes without having to figure out how to get coins in and out of them.

Tying PeerTube to a storage token, or any specific coin for that matter, would be a bad move IMHO.

nukelr commented 3 years ago

It's not about tying PeerTube to anything; I think that having more options to choose from is always a good idea. You want to use IPFS? Amazon AWS? Sia? It should be your choice, because in the end someone has to pay for bandwidth and/or storage, so I don't see the point.

alex9099 commented 3 years ago

Perhaps I'm missing something obvious, but if the uploader goes offline for some reason, wouldn't the video be inaccessible? (Sorry if this was already answered.)

delbonis commented 3 years ago

It should be your choice, because in the end someone has to pay for bandwidth and/or storage, so I don't see the point.

@nukelr Yeah the instance operator should be able to configure storage backends however they want.

Perhaps I'm missing something obvious, but if the uploader goes offline for some reason, wouldn't the video be inaccessible? (Sorry if this was already answered.)

@alex9099 If PeerTube blindly ripped out WebTorrent, replaced it with just the in-browser js-ipfs distribution, and then didn't host anything locally, maybe. But that's a ridiculous idea.

manalejandro commented 3 years ago

Perhaps I'm missing something obvious, but if the uploader goes offline for some reason, wouldn't the video be inaccessible? (Sorry if this was already answered.)

That is relative. Even if you remove WebTorrent, the video will not be available only from peers: if the IPFS content is pinned and federated between other instances, it will always be available somewhere on the P2P network.

trymeouteh commented 3 years ago

It would be great for a PeerTube server to be able to talk with IPFS and allow anyone with a PeerTube account on the clear web to watch and connect to a video from IPFS or an Unstoppable Domains site.

manalejandro commented 3 years ago

On the PeerTube server side, each instance would run its own IPFS repository where the pinned videos would be stored. The storage is local, so the videos would have to be read from the IPFS repository; once the hashes of the video blocks are shared between instances, the content is replicated. I have seen how PeerTube's current storage is programmed: it is not modular, so it would have to be adapted if a plugin were used. I would also mention the option of using IPNS for DNS resolution of the instances. You would have two P2P networks: one running on the clients (WebTorrent) and another between instances (IPFS). Yes, it would be unstoppable.

Openmedianetwork commented 3 years ago

It would be good to get an update on mitigating the storage issue with a P2P solution on PeerTube. Are there any paths to this? Is it on the agenda?

kotovalexarian commented 3 years ago

The fact is that we use WebTorrent (BitTorrent/WebRTC) on the client side to watch videos. It provides a handy pool of content seeders and direct browser connections. Watching a video via IPFS would mean entirely replacing that component with an IPFS client in the browser, so it's not just a matter of a different storage/upload mechanism.

I think they can both exist in parallel.

manalejandro commented 3 years ago

I got IPNS working in the player :thinking:

https://manalejandro.com/~/Art%C3%ADculos/reproductor-p2p-con-enlace-ipns

We could choose between the IPFS and P2P player loaders :wink: unstoppable

ghost commented 3 years ago

It's hardly "unstoppable" if it's strictly less reliable than a single server and a single DNS entry. If you really want to avoid "censorship" (whatever that means to you) you actually need multiple reliable paths to serve the content. My experience with ipfs is that it is strictly less reliable than the network you're overlaying it on top of.

For what it's worth I'm not too optimistic about IPFS even after all the time I spent experimenting with video distribution. Ultimately, IPFS doesn't have a great content routing mechanism and it's extremely memory intensive, because it's not magic and all content-addressed storage is extremely memory intensive. Folks get optimistic about IPFS when it works well in small-scale tests, but that's mostly because in small tests you'll be able to keep enough open connections to just gossip your entire working set between hosts that already know about the data you are requesting.

None of it is private or efficient or hard to stop. I'd love to find a more efficient and reliable way to cluster storage together from multiple non-trusted hosts, but I've been thinking there does need to be at least a minimal trust relationship built around quality of service. I like the idea of peertube servers getting a "buddy" or two, with each server in each group capable of providing failover for the others.

manalejandro commented 3 years ago

It's hardly "unstoppable" if it's strictly less reliable than a single server and a single DNS entry. If you really want to avoid "censorship" (whatever that means to you) you actually need multiple reliable paths to serve the content. My experience with ipfs is that it is strictly less reliable than the network you're overlaying it on top of.

For what it's worth I'm not too optimistic about IPFS even after all the time I spent experimenting with video distribution. Ultimately, IPFS doesn't have a great content routing mechanism and it's extremely memory intensive, because it's not magic and all content-addressed storage is extremely memory intensive. Folks get optimistic about IPFS when it works well in small-scale tests, but that's mostly because in small tests you'll be able to keep enough open connections to just gossip your entire working set between hosts that already know about the data you are requesting.

None of it is private or efficient or hard to stop. I'd love to find a more efficient and reliable way to cluster storage together from multiple non-trusted hosts, but I've been thinking there does need to be at least a minimal trust relationship built around quality of service. I like the idea of peertube servers getting a "buddy" or two, with each server in each group capable of providing failover for the others.

Only if you use DNS; with IPNS it is not necessary :smiley: I mean, we have a player that works at half power and could offer a lot more to choose from :play_or_pause_button: IPFS has more advantages than disadvantages.

georgyo commented 3 years ago

I think IPFS is really cool, but I am not so sure it is ready for this use case, at least with native js-ipfs. It is simply not fast or efficient enough.

Consider your demo (or my demo at https://bbb.fu.io): it takes multiple seconds before the video even starts playing. My demo has the video seeded on a dozen IPFS nodes, and the JavaScript is bootstrapped with an IPFS node that has the video on it; it still takes multiple seconds to load. Significantly longer on lower-powered devices, such as mobile.

My demo has multiple bitrates, and the JavaScript cannot load fast enough to play 1080p without pausing to buffer every few seconds. The loader is smart enough to choose a lower bitrate, so a user might not notice. But if the user chooses a higher bitrate, they will not have a good time.

IPFS is getting there, but it is not ready for browser video playback just yet.

manalejandro commented 3 years ago

I think IPFS is really cool, but I am not so sure it is ready for this use case, at least with native js-ipfs. It is simply not fast or efficient enough. […]

It takes a while to connect and download the data, but it works fine on my device :+1:

[screenshot: playback on an Android device]

minecraftchest1 commented 2 years ago

A use case I am thinking of is for users to be able to import videos from the IPFS network. I don't think IPFS should be used for sending video to the client, but it might work well for server-side storage, especially if the PeerTube server caches the video.

manalejandro commented 1 year ago

A use case I am thinking of is for users to be able to import videos from the IPFS network. I don't think IPFS should be used for sending video to the client, but it might work well for server-side storage, especially if the PeerTube server caches the video.

https://ffmpeg.org/ffmpeg-protocols.html#ipfs

ROBERT-MCDOWELL commented 1 year ago

Another idea would be to use IPLD through IPNS; this way your IPFS settings would come from your DNS server, so you wouldn't have to code anything, since PT wouldn't see any URL changes. I'm talking about the case where you have your own IPFS cluster, of course.

hollownights commented 1 year ago

[I’ve made some corrections since first posting this]

I’ve read through the entire issue and I think I’ve come up with a solution for this IPFS usage debacle. Here it is:

1) Each instance should (well, "could" is a better term) host its own local IPFS node and set up a public IPFS gateway to the node;

2) Keep video uploading to an instance as it is (no need for IPFS at this point);

3) Once a video is uploaded and converted into files of different qualities, import each of these files (without duplicating them) into a folder on the local IPFS node and, by doing so, get a public IPFS gateway URL for each file;

4) Make the IPNS name(s) of the folder(s) known to everyone (these are the "big folders", not just the folder holding the files of one video), much like is already being done for Library Genesis and Z-Library (see the links below), and tell people: "Go and host these videos on your own IPFS node". Doing so creates a new kind of redundancy for instances which doesn't rely on maintaining a PeerTube instance or even knowing anything about servers and the command line;

5) Create a torrent for each video file and use the public IPFS gateway URL of each file as a webseed;

6) By properly handling the URLs used as webseeds, a user who has the IPFS desktop software installed will also be able to get the video from the IPFS network and not just through WebTorrent (the companion extension isn't needed for this). If they don't have it installed, no problem: everything continues as usual;

7) If P2P Media Loader is used instead of WebTorrent, it should first try to get the file from the IPFS node. If that doesn't work out, it then gets the file from the "normal" storage;

8) The IPFS URL of the file could also be injected into the src attribute of the video HTML element. For this step to really work users would need the IPFS companion extension, though; otherwise the instance would be relying solely on the public gateway.

In short: each instance should host a local IPFS node, use IPFS gateway URLs as webseeds and/or try to load videos from IPFS first, and make it really easy for people to import the instance's videos into their own IPFS nodes.
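To illustrate steps 3 and 5, here is a sketch with the create-torrent package; the file name, CID, and gateway host are placeholder assumptions, and the public gateway URL of the file is attached to the torrent as a webseed:

import createTorrent from 'create-torrent'
import { writeFileSync } from 'fs'

// The transcoded file lives both on disk and in the local IPFS node;
// its public gateway URL doubles as a webseed (BEP 19) for the torrent.
createTorrent('./video-720p.mp4', {
  name: 'video-720p.mp4',
  urlList: ['https://ipfs.example.org/ipfs/bafy.../video-720p.mp4'] // webseed URL
}, (err, torrent) => {
  if (err) throw err
  writeFileSync('video-720p.torrent', torrent) // .torrent buffer, ready to publish
})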

Plus: look at the IPFS Free Library and Pirate Library Mirror efforts, and at all the work needed to get files from a BitTorrent client into a local IPFS node (without duplicating files or needing to mess with the command line), and help me get BitTorrent clients to make this process really easy. Links below.

IPFS Free Library

Putting 5,998,794 books on IPFS

Help seed Z-Library on IPFS

Make WebTorrent more IPFS-friendly by better handling IPFS URLs used as webseeds

Make Transmission more IPFS-friendly by better handling IPFS URLs used as webseeds

Transmission: After a torrent has finished downloading, make possible to (automatically) import its file/folder to a local IPFS node

Make qBittorrent more IPFS-friendly by better handling IPFS URLs used as webseeds

qBitTorrent: After a torrent has finished downloading, make possible to (automatically) import its file/folder to a local IPFS node

Make Frostwire more IPFS-friendly by better handling IPFS URLs used as webseeds

Frostwire: After a torrent has finished downloading, make possible to (automatically) import its file/folder to a local IPFS node

Make BiglyBT more IPFS-friendly by better handling IPFS URLs used as webseeds

BiglyBT: After a torrent has finished downloading, make possible to (automatically) import its file/folder to a local IPFS node

hollownights commented 1 year ago

@Chocobozzz Do you think it could work as I propose?

xundeenergie commented 1 year ago

I have also thought about IPFS and the fediverse.

But more for images.

Right now every image must be copied to each instance where users should see it: a large amount of redundant space across fediverse instances...

If image uploads went to IPFS storage, clients could retrieve them from IPFS, not only from their own instance...

If you find a way to implement it in this service, would you spend some thoughts on other fedi-services too?

ROBERT-MCDOWELL commented 1 year ago

Maybe add an options section for IPFS, starting with IPFS gateway selection. Beyond that, I think this is more for the server owner to manage than for the PeerTube project.

ghost commented 1 year ago

It would be great if some of the IPFS fans here could take some more time to understand the overheads and the limitations in the protocol.

There's nothing magical or even good about IPFS. It's an overlay network that makes your average content take longer than https would in the worst case. It doesn't store anything for free, and it doesn't store anything automatically. The caching that does happen is fairly ad-hoc and doesn't relate to the actual usage patterns of the content.

If you took PeerTube as it currently exists and simply bolted IPFS onto it, memory and network usage would increase dramatically, and there would be no immediate benefit to users. It would be theoretically possible for other users to pin and re-host content that's addressable over IPFS, but in practice that content would always travel through an existing PeerTube server as a gateway. Each server would need to keep a local copy of content that must be delivered reliably and quickly, like video segments or local video thumbnails.

I can see maybe replacing the existing mechanisms for caching remote server thumbnails, but I really doubt it would be more efficient. There's a really astounding amount of memory overhead for indexing a fully content-addressable storage of any type, and the network protocol relies on a huge amount of spurious communication between nodes. It's not going to realistically allow third parties to contribute hosting resources.

ghost commented 1 year ago

Here's a useful subproblem to think about if someone wants it:

To make any realistic use of P2P for primary data storage (i.e. having an IPFS gateway which doesn't have the content pinned locally), the other layers of the stack will need to account for content taking a highly variable amount of time to retrieve.

This piece would be useful for a lot of different kinds of storage! Even in my current PeerTube setup, without object storage or anything, some content takes a different amount of time to retrieve because of block-level caching. It would help every PeerTube server run more smoothly, and it would open the door to truly latency-variable storage, where some items would be local and some would take an unknown amount of time to retrieve. This would require changes to the backend and UI software, as well as probably the UI interaction itself.
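As a sketch of one building block this might need (a hypothetical helper, not existing PeerTube code; assumes a runtime with global fetch and AbortController): race the slow storage path against a deadline and fall back to the reliable local copy.

// Try the slow (e.g. IPFS-backed) URL first, but give up after a deadline
// and serve the locally cached copy instead.
async function fetchWithDeadline (slowUrl: string, localUrl: string, ms = 3000): Promise<Response> {
  const ctrl = new AbortController()
  const timer = setTimeout(() => ctrl.abort(), ms)
  try {
    return await fetch(slowUrl, { signal: ctrl.signal })
  } catch {
    return fetch(localUrl) // deadline hit or network error: use the reliable copy
  } finally {
    clearTimeout(timer)
  }
}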

ROBERT-MCDOWELL commented 1 year ago

@scanlime there is one point about how IPFS works today that I'm not sure matches what you say. What I know about IPFS is that as long as the local chunk exists (the original chunk on the original server), the replication will exist. If it does not, after a while the content disappears from the network, unless some servers have configured IPFS to be persistent, which results in huge latency or timeouts.

ghost commented 1 year ago

unless some servers have configured IPFS to be persistent

Not sure I understand your point. I'll try to respond anyway.

You can host data permanently if you want; in IPFS terminology this is "pinning". All this does is, as you suggest, fetch the data and prevent it from expiring from a cache it would otherwise expire from.

IPFS relies on gossiping the state of this local cache to nodes you have a direct (TCP/UDP) connection with. If one server happens to still hold a chunk because it was fetched recently, its neighbors will find out about it. If it stays cached long enough, unconnected servers can learn about it through the DHT.

This pinned-ness is a flag that has to be set per chunk. (Chunks are at most about a megabyte, usually half that.) It can be applied recursively to all the chunks necessary for a file or directory, via IPLD links, but the server still fundamentally just gossips about which specific <1 MB pieces of data are in its cache.

This is the only replication you get: either someone explicitly chooses to host the file by pinning it, or you get lucky and someone still has a piece cached. That latter strategy can make IPFS appear pretty fast when you're dealing with data that isn't too large, too uncommon, or too deeply nested. Every time you have to discover a new layer of blocks that you need but don't have, it takes considerable time to broadcast that request to neighbors, search for it in the DHT, make new connections, gossip block lists, and eventually find someone who has a copy of that data.
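In js-ipfs terms, explicit hosting is a recursive pin; a minimal sketch (the CID is a placeholder):

import * as IPFS from 'ipfs-core'

const ipfs = await IPFS.create()
// Recursively pin the root block and, via IPLD links, every block under it,
// so none of them can expire from the local cache.
const pinned = await ipfs.pin.add('bafy...', { recursive: true })
console.log('pinned root:', pinned.toString())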

manalejandro commented 1 year ago

Hello, to clarify the matter a bit: each PeerTube instance would have its own IPFS repository, and the repositories would share information with each other; but I could have a repository that does not accept sharing and keeps only the content I have locally, in which case I could decide which instances to share my pinned blocks with. I was looking at the current implementation and it is not modular, so the PeerTube storage layer would have to be modularized first to integrate this P2P storage.

ROBERT-MCDOWELL commented 1 year ago

@manalejandro your suggestion seems interesting indeed. Why not build a plugin, and when it reaches maturity, integrate it into the PeerTube core? It would be interesting to study how archive.org uses IPFS.

ShadowJonathan commented 1 year ago

PeerTube doesn't even need to host its own IPFS store; it can just stream directly from ipfs.io or dweb.link (both run by Protocol Labs) to the browser.

For extra authenticity / trustlessness, PeerTube can download the video in CAR format (like .tar, but for IPFS content, and self-verifying) and reconstruct it locally to verify its authenticity, or do that to speed up downloads / archive the video via traditional storage means.

In the end, running an active IPFS node, or even caching content locally via IPFS, is only needed if the PeerTube instance is going to fetch remote content over the network rather than via gateways.

The only problem is that this makes the job of pinning the data nebulous, as the video is just a reference to a CID (an IPFS file hash/reference, "Content IDentifier"), and the backing data for that CID can be either alive or not.

For this, pinning can be outsourced via the Pinning Services API, a REST spec that some commercial pinning services follow and that, more interestingly, ipfs-cluster (a self-hostable pinning 'orchestrator') also implements. A PeerTube instance could be paired with an ipfs-cluster (running an IPFS node) and pin locally-saved videos to this cluster (or to those other pinning services) via this API.
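As a rough sketch of that REST spec (the endpoint, token, and CID are placeholders), pinning a CID is a single authenticated POST:

// Ask a Pinning Services API endpoint (ipfs-cluster or a commercial service)
// to pin a CID on our behalf.
const res = await fetch('https://pinning.example.org/pins', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <access-token>',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({ cid: 'bafy...', name: 'my-video-720p' })
})
const status = await res.json() // PinStatus: { requestid, status: 'queued' | 'pinning' | 'pinned' | 'failed', ... }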

All in all, there are plenty of options to make this more speedy.

What I will not recommend is defaulting to adding an IPFS node to every PeerTube instance. This might work if the node is put into dhtclient mode (in other words: it does not participate comprehensively in the DHT exchange and doesn't announce its presence to the network, since exchanging and refreshing DHT records is a major idle load), so that the PeerTube node can at least attempt to find a video on the network without relying on public gateways, but this requires investigation.

What I will also warn against is putting the IPFS client directly in the browser. IPFS runs best when it has been warming up for a while, since it then has a healthy cache of known closest-distance peers on the network and can resolve content relatively quickly. A cold-started IPFS client on a video page will not be able to serve content (quickly) for at least half a minute or so, even longer if the content is behind NAT and finding addresses for that peer is tricky.

A service worker could work, maybe. If one is connected by default to the PeerTube instance's "home IPFS node", and/or to other "known peers that host the content" which federated PeerTube instances could hint at, retrieval could start immediately. Though this would simply turn the IPFS node into a 'workhorse', a fancy CDN, giving it non-distributed load scaling.

Finally, Kubo (aka go-ipfs) isn't the only implementation out there; recently I found out about iroh, a WIP Rust-powered IPFS client, which illustrates that not everything has to be about the Go implementation.

All-in-all, here's the summary of IPFS-in-peertube as I see it;

Pros:

Cons:


Essentially, IPFS is the loud, hyperactive, social-butterfly cousin of torrents, where everyone can download anything they want with a magical key for that data, but that data can be difficult to retrieve, 'inconvenient' to cache, or simply nonexistent, outside of the key-holders' control.

The biggest arguments CIDs have (imo) are portability and self-verification: anyone and everyone can take the hash and 'just download' the content, wherever it exists. Beyond that, it doesn't have much going for it.


No, I am not going to talk about Filecoin.

ghost commented 1 year ago

PeerTube doesn't even need to host its own IPFS store; it can just stream directly from ipfs.io or dweb.link (both run by Protocol Labs) to the browser.

This is a terrible idea; we have a decentralized network now, even if it's mostly just HTTPS. And you'd suggest that folks move to funneling all traffic through Protocol Labs in order to switch to a more decentralization-flavored protocol?

For IPFS to be an improvement at all, it would need to retain our ability to self-host video. Relying on an external HTTP gateway is really a no-go, and the gateways are a major limitation in deploying IPFS for general browser use. As others have noted, there's more opportunity for practical use as a server-to-server synchronization mechanism than for delivering client video. But as you and I have both noted, IPFS is quite resource intensive on the server.

The IPFS + PeerTube experiments I did some time ago focused on the case where you'd bundle an IPFS server (WebRTC gateway) on the same machine as PeerTube, so browsers could get a quick and reliable connection to the content while still allowing IPFS to serve files from elsewhere once the connection has warmed up. I put that project on hold, though, because of the high cost in latency and RAM.

ShadowJonathan commented 1 year ago

And you'd suggest that folks move to funneling all traffic through Protocol Labs in order to switch to a more decentralization-flavored protocol?

Please read the rest of my comment; that's just one of the options I suggested.

I forgot to enumerate them, but here are roughly the options as I see them:

All but the last option still need the data pinned somewhere: on users' computers, at commercial pinning services, or at a sysadmin-run pinning service, and that comes on top of those options.

ShadowJonathan commented 1 year ago

One trick that peertube could probably deploy is the following;

IPFS's high overhead and idle resource usage come mostly from two things: acting as a DHT server, and constantly reproviding.

The first adds a lot of inbound connections and related handling; the second adds a lot of outbound connections and related handling.

Both can be turned off, leaving the client in a relatively low-load idle state, but turning off reproviding makes it impossible to properly discover the content on the network.

However, Kubo has an option to reprovide only the "roots" of pins, i.e. the head block: the first block, the one that points to all the others.
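For reference, both of these knobs live in Kubo's JSON config; a minimal sketch, with field names as in the Kubo docs and everything else left at defaults:

{
  "Routing": { "Type": "dhtclient" },
  "Reprovider": { "Strategy": "roots" }
}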

It's generally not recommended this way, as something like the following could happen;

This makes this option less favourable, even though it has extremely low overhead (it's easier to announce 100 records than 100,000), because of its fallibility. Normally IPFS recovers from this by finding the same node over and over again when DHT-querying the remaining content, and more or less stays connected to it while fetching new blocks.

What Peertube could do in this instance is something like this;

However, this still has a few flaws:

Kubo only lists up to 20 peers; any of these peers might not be a PeerTube-run peer, and might have cached only the root node, not any of the others.

(Other implementations, such as js-ipfs, return an iterable of all found peers that have this content, possibly allowing someone to mostly-exhaustively map every node currently providing it.)

There is no real way of knowing, on the surface, whether a peer is a PeerTube-run IPFS node or not.

However, this could be solved by providing a custom protocol through the IPFS node. Kubo offers an experimental API for this with ipfs p2p, essentially injecting a new protocol into an existing node.

This could be used to simply say "hi" to another node, verifying that it is a PeerTube instance, and/or asking "hey, I saw you had X, do you have it all pinned as well?" for reliability, to know that "this is the peer to connect to". Through this protocol it could even be signalled that the peer would prefer not to be downloaded from, if possible.

Through this it might be possible to have an extremely low-overhead IPFS peer running inside PeerTube, one mostly dedicated to storing and exchanging IPFS content for PeerTube.

manalejandro commented 1 year ago

@manalejandro your suggestion seems interesting indeed. Why not build a plugin, and when it reaches maturity, integrate it into the PeerTube core? It would be interesting to study how archive.org uses IPFS.

Hi, I'm trying to implement the plugin, but I don't think it's the best idea to start a js-ipfs instance on the client side or as a server addon. I would like to do an implementation like the one that uses OBJECT_STORAGE with S3 storage; I think that is the most economical solution in this case. I will use the latest release, and we will discuss when I have something. Regards.

manalejandro commented 1 year ago

I don't have much time to dedicate to this, but I think this is a good start:

import * as IPFS from 'ipfs-core'

// Lazily create a single shared js-ipfs node, configured from PeerTube's CONFIG
let ipfs: IPFS.IPFS

async function getClient () {
  if (ipfs) return ipfs

  const IPFS_STORAGE = CONFIG.IPFS_STORAGE

  ipfs = await IPFS.create({
    repo: IPFS_STORAGE.REPO_PATH || undefined,
    repoAutoMigrate: IPFS_STORAGE.REPO_AUTO_MIGRATE || false,
    peerStoreCacheSize: IPFS_STORAGE.PEER_STORE_CACHE_SIZE || 1024,
    config: {
      Profiles: IPFS_STORAGE.PROFILE || 'server'
    }
  })

  logger.info('Initialized IPFS repo path: %s.', IPFS_STORAGE.REPO_PATH, lTags())

  return ipfs
}
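And a hypothetical usage sketch of the client above (the file path is a placeholder; js-ipfs add() returns the CID of the stored content):

import { createReadStream } from 'fs'

// Store a transcoded file on the local IPFS node and log its content address
const client = await getClient()
const { cid } = await client.add(createReadStream('./video-720p.mp4'))
logger.info('Video stored on IPFS with CID %s.', cid.toString(), lTags())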
hollownights commented 1 year ago

Hi, I'm trying to implement the plugin, but I don't think it's the best idea to start a js-ipfs instance on the client side or as a server addon. I would like to do an implementation like the one that uses OBJECT_STORAGE with S3 storage; I think that is the most economical solution in this case.

@manalejandro Yes, if I got your idea right, that's also how I see it: as a middle ground between totally local storage and totally remote storage (S3). Each instance would have its own local IPFS node (and a public gateway for it), and the locally stored files would then be imported (but not duplicated) into its local IPFS node.

As others have noted, there's more opportunity for practical use as a server-to-server synchronization mechanism than for delivering client video.

@scanlime Yes, that's why I talked about sharing the IPNS names of the folders stored by an instance: right now, there is no easy way to contribute to an instance by re-hosting its videos. If PeerTube weren't about to drop WebTorrent, this could easily be done by creating dumps of torrent files and making them available to users; but WebTorrent support is going to be dropped, so something else must be done so that real decentralization can be achieved.

ghost commented 1 year ago

right now, there is no easy way to contribute to an instance by re-hosting its videos. If PeerTube weren't about to drop WebTorrent, this could easily be done by creating dumps of torrent files and making them available to users; but WebTorrent support is going to be dropped, so something else must be done so that real decentralization can be achieved.

Of the systems we have implemented or discussed here, few actually let a random contributor provide bandwidth assistance to an instance or video they want to support.

We have the redundancy system, which comes the closest. Another PeerTube server can follow, download content, and provide regular HTTPS URLs that will actually help clients download the video faster. This works pretty well, but it does require that the server allow redundancy, and the features for robustness and abuse prevention are currently quite limited.

WebTorrent was never especially useful for providing bandwidth assistance like this, because so few torrent clients were actually compatible with WebTorrent. The bandwidth helper would have to use one of the few torrent clients that support WebRTC transport, and each client would still take some time to discover the available helpers rather than getting a list right away like the redundancy system provides.

The IPFS scheme you're presenting, as I understand it, would let anyone provide long-term storage with no reliability claims attached. So, not a replacement for primary storage but an optional source of additional bandwidth. Problem is, that bandwidth is only directly useful to IPFS peers, not to regular web clients. The helpfulness of this scheme depends pretty much entirely on gateway bandwidth. You'd either need more widespread adoption of the WebRTC bridge so js-ipfs can connect to helpers directly, or you'd need people to also (separately?) volunteer to provide IPFS gateway services. This might work, but it seems like a lot of extra RAM and bandwidth to spend for roughly the same benefit we had in the existing redundancy setup.

hollownights commented 1 year ago

WebTorrent was never especially useful for providing bandwidth assistance like this, because so few torrent clients were actually compatible with WebTorrent. The bandwidth helper would have to use one of the few torrent clients that support WebRTC transport

Yes, it's a shame that so few clients offer support for WebTorrent, but there is a WebTorrent client that can be run on a server, so technically that could still be achieved...

The IPFS scheme you're presenting, as i understand it, would let anyone provide long-term storage with no reliability claims attached. So, not a replacement for primary storage but an optional source of additional bandwidth.

That's right.

Problem is, that bandwidth is only directly useful to IPFS peers, not to regular web clients. The helpfulness of this scheme depends pretty much entirely on gateway bandwidth. You'd either need more widespread adoption of the WebRTC bridge so js-ipfs can connect to helpers directly, or you'd need people to also (separately?) volunteer to provide IPFS gateway services. This might work but it seems like a lot of extra RAM and bandwidth to spend for roughly the same benefit we had in the existing redundancy setup.

I do understand the problems with this scenario, so I pose this question: how could someone who is familiar with using seedboxes help a PeerTube instance? After all, someone who pays for a seedbox service isn't (necessarily) someone who knows how to set up a server, and as such the existing redundancy setup isn't of real use to such a user. How can this gap be bridged so that your P2P-heavy user can contribute in an easy way?

ghost commented 1 year ago

how could someone who is familiar with using seedboxes help a PeerTube instance?

Seems like we need social infrastructure for this, not technical, no? Anyone can install PeerTube and offer redundancy if they have a computer that's reachable over HTTPS. If their computer is not reachable over HTTPS, they aren't going to be able to provide data quickly to clients, so it's of limited help.

hollownights commented 1 year ago

Seems like we need social infrastructure for this, not technical, no?

That's like saying someone should go to trade school to learn how to install solar panels in order to contribute to the grid.

Anyone can install PeerTube and offer redundancy if they have a computer that's reachable over HTTPS.

I agree. Is there an .exe for that?

One line in the command line is one line too many.

ghost commented 1 year ago

I agree. Is there an .exe for that?

I get that a lot of folks would like there to be an approximately zero-step way to contribute bandwidth, but the reality is just more complicated than that and we don't have magic solutions...

If you want to help today, right now, using technology that exists, you need to be reachable over HTTPS OR you need to be able to very reliably do NAT hole punching. Neither of these is compatible with being a completely clueless user who wants to understand nothing about the network.

As I said, this isn't fundamentally technical, it's social. People need to interact socially and trust each other to some extent. If we tried to do video streaming over a fully trustless network, that's not going to be a performance improvement over the status quo.

hollownights commented 1 year ago

I get that a lot of folks would like there to be an approximately zero-step way to contribute bandwidth, but the reality is just more complicated than that and we don't have magic solutions... […]

I ask for an .exe more to create some discussion than to actually get an .exe. I get it, things just are not that simple. But if someone who knows their way around the command line and networks were to start a seedbox service for PeerTube instances today, would they be able to do it, or would they need to host an entire PeerTube instance? Could they run a bunch of "PeerTube sharing-is-caring instances", much like a seedbox runs a bunch of BitTorrent clients, and then let users select which instances they would like to help (much like a user does when they add a bunch of torrents to a seedbox)?

ghost commented 1 year ago

I ask for an .exe more to create some discussion than to actually get an .exe. I get it, things just are not that simple. But if someone who knows their way around the command line and networks were to start a seedbox service for PeerTube instances today, would they be able to do it, or would they need to host an entire PeerTube instance? Could they run a bunch of "PeerTube sharing-is-caring instances", much like a seedbox runs a bunch of BitTorrent clients, and then let users select which instances they would like to help (much like a user does when they add a bunch of torrents to a seedbox)?

This is basically how the redundancy feature already operates. The server can automatically select popular videos to mirror, or an admin can select them. It's done via the PeerTube UI, so you'd either leave that enabled or write something simpler that's special-purpose. But the simplest thing is often whatever you've already got.

alxlg commented 1 year ago

@scanlime I have the impression that you have forgotten what my original proposal was.

ghost commented 1 year ago

@scanlime I have the impression that you have forgotten what my original proposal was.

No, the problem here is that when we try to examine the details of who is providing what over which channels, the magic-ness evaporates and we are stuck talking about servers and clients and recurring administration duties.

As rigelk responded nearly a hundred comments ago, the data needs we have on the client side and the import side are different, so offering uploads over IPFS doesn't help in most use cases. So that's basically a non-starter unless we have the desire and infrastructure to repeat the transcodes in many places.

There are surely places where we could use IPFS, but I'm just trying to bring this conversation down to earth, work through the actual problems we are trying to solve, and understand the limitations of the tech stacks available.

manalejandro commented 1 year ago

@manalejandro Yes, if I got your idea right, that's also how I see it: as a middle ground between totally local storage and totally remote storage (S3). Each instance would have its own local IPFS node (and a public gateway for it), and the locally stored files would then be imported (but not duplicated) into its local IPFS node. […]

Hi, I'm trying to implement the complete backend with IPFS; otherwise it wouldn't make sense, since it would duplicate the data. I think an API like the one used for S3 is the cheapest way to start; I'm not saying it's the only one or the most efficient one. Besides, IPFS information cannot be duplicated, because every block has a unique hash. Regards.

https://github.com/Chocobozzz/PeerTube/blob/develop/shared/models/videos/video-storage.enum.ts

export const enum VideoStorage {
  FILE_SYSTEM,
  OBJECT_STORAGE,
  IPFS_STORAGE
}
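A hypothetical sketch (not actual PeerTube code) of how the backend could then branch on the new value; the file names and URL layouts are made-up placeholders:

// Resolve the URL a client should fetch a video file from, per storage backend
function getVideoFileUrl (video: { storage: VideoStorage, filename: string, cid?: string }): string {
  switch (video.storage) {
    case VideoStorage.FILE_SYSTEM:
      return `/static/webseed/${video.filename}` // local disk, served by the instance
    case VideoStorage.OBJECT_STORAGE:
      return `https://object-storage.example.org/videos/${video.filename}` // S3-style bucket
    case VideoStorage.IPFS_STORAGE:
      return `https://ipfs.example.org/ipfs/${video.cid}` // the instance's IPFS gateway
    default:
      throw new Error('Unknown storage backend')
  }
}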
ROBERT-MCDOWELL commented 1 year ago

The best would be that every server starting a PeerTube instance becomes a node of a PeerTube gateway, or of whatever cluster gateway is specified in the config...

Pantyhose-X commented 1 year ago

Storing videos in IPFS would prevent my videos from being lost to administrators: PeerTube instance admins maliciously deleting videos, closing registration, or shutting down the server.