Chocobozzz / PeerTube

ActivityPub-federated video streaming platform using P2P directly in your web browser
https://joinpeertube.org/
GNU Affero General Public License v3.0

Search on global fediverse #824

Closed ballsystemlord closed 3 years ago

ballsystemlord commented 6 years ago

I can find no search bar, and neither your faq.md nor the online FAQ lists a way to search PeerTube, which leaves me with only the option of manually going to each site and searching it. Is there something I am not seeing, or is there a tool, such as a browser plugin or a command-line tool like surfraw, to search all of PeerTube? Thanks!

Chocobozzz commented 6 years ago

No, you can't for now. I have a few ideas for doing such a thing:

rigelk commented 5 years ago

After some chit-chat about that, we discussed a few possibilities:

ghost commented 5 years ago

One downside of following everything, as an instance administrator, is that there's much less ability to moderate, curate, or reliably categorize content. This can create problems for search quality, but also for community safety or legality.

Systems like the automated flood-fill discovery above would place a somewhat arbitrary moderation burden on every administrator, if I understand the description correctly.

On the other hand, a centralized server centralizes that moderation/curation/abuse-response role, for better or for worse.

Personally, as an administrator, I would not want this responsibility. I would disable a promiscuous federation feature like this, since I wouldn't want to be the admin of both my instance's content and the view of the entire fediverse that my instance provides. Other admins may be more open to taking on that role, but then how will those instances deal with problems of illegal or otherwise undesirable content that threatens the instance or its users?

A central search server or multiple central search servers would avoid putting this extra responsibility on every instance administrator, and that may end up being a more useful service for users who are new to peertube anyway.

Booteille commented 5 years ago

Isn't it possible to have an endpoint listing all the available instances, with each instance categorized by theme? (Themes could be "nature", "general", "politics", etc.) The instance owner could then choose from the list which instances to follow, by theme and/or one by one, as they prefer.
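A minimal sketch of how such a themed list could be used on the admin side. Everything here is hypothetical: the directory layout, the host names, and the `instances_to_follow` helper are illustrations, not an existing PeerTube endpoint or data format.

```python
from typing import Dict, Iterable, List, Set


def instances_to_follow(
    directory: Dict[str, Set[str]],
    wanted_themes: Set[str],
    extra: Iterable[str] = (),
    excluded: Iterable[str] = (),
) -> List[str]:
    """Pick instances by theme, then apply the admin's one-by-one choices."""
    # Select every instance that declares at least one wanted theme.
    chosen = {host for host, themes in directory.items() if themes & wanted_themes}
    chosen |= set(extra)      # instances the admin adds by hand
    chosen -= set(excluded)   # instances the admin refuses, whatever their theme
    return sorted(chosen)


# Hypothetical directory as the listing endpoint might return it:
# instance host -> self-declared themes.
directory = {
    "nature.example": {"nature"},
    "mix.example": {"general", "politics"},
    "news.example": {"politics"},
}
follow_list = instances_to_follow(
    directory, {"politics"}, extra=["nature.example"], excluded=["news.example"]
)
```

The themes are self-declared in this sketch, so it does not answer who checks that an instance's content matches its label.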

ghost commented 5 years ago

@Booteille, who would decide what the categories mean and what content to include or exclude? This is still a moderation job that has to be done; perhaps tags of some kind could help the instance admins and search admins share this work, but the work still exists.

r4dh4l commented 5 years ago

Hi all and thx @ballsystemlord for opening this issue!

I'm quite new to PeerTube but very interested in the concept because, as a home server administrator, I want to support decentralized IT service concepts in general.

While I was promoting PeerTube among my friends, the lack of a "global search" was unfortunately the point at which they walked away. Looking for a "global PeerTube search", I found this issue, whose pro/contra list I find extremely interesting. I don't know which solution would best preserve a strictly decentralized concept, but what I can say for now is this: the explanation video What is PeerTube? (English subtitles) is not enough to explain to users how to use PeerTube, especially when it comes to video search. People who are not used to decentralized concepts simply don't understand the problem with a central search and reject a service with a different usability concept (even though the different concept is for their own good).

Anyway, for now I would suggest this: there should be an explanatory text under every search box of a PeerTube node (or a symbol linking to it) that

  1. explains that there is no global search (with a link to another text explaining why, maybe this issue)
  2. lists all other PeerTube nodes the current one is connected to, to indicate the range of the current search

Edit: Maybe the "there is no global search" explanation should also be placed above any search results, with a text like:

The decentralized concept of PeerTube has no global search, which means the results listed here don't reflect all PeerTube content on the web. The results listed here come from the following PeerTube nodes this node is connected to:

- peertube.one
- peertube.three
- peertube.four
- peertube.z

[Other PeerTube nodes](https://joinpeertube.org/en/#getting-started) are not listed because of the settings chosen by the administrator of this node (for personal or legal reasons).

I understand that a sustainable solution to this issue needs a lot of time, but until then PeerTube as software needs to meet people where they currently are (and they expect a global search). Explaining to users why there is no global search would be the best interim solution: it doesn't make anything worse, only better. I'm very sad to say it, but in the current state of PeerTube's usability, most people won't accept it as long as they are not better educated about decentralized concepts (which unfortunately has to be done by PeerTube; What is PeerTube? (English subtitles) is just a (great) start for the needed "elucidation").

ghost commented 5 years ago

We could use Yacy to solve this issue. Yacy is a FOSS, decentralized search engine, so we can host instances of Yacy ourselves and define an operator for peertube videos. For example, if someone searches Yacy for:

video:peertube cats

then it should output only peertube links about "cats". We could then integrate this into peertube's website and pass search results from Yacy into peertube's UI seamlessly, so that the user doesn't have to type any special commands, only the keywords they need results for.

Note: It's worth mentioning that for this to work properly, most likely we'll need to crawl peertube websites ourselves (it's very simple to do), just as shown in this video.

elevenpassin commented 5 years ago

@Zig-03 If it's possible to plug into Yacy, we can run Yacy right alongside Peertube. Yacy will index each and every node, instance, channel, and user as members of an instance explore the fediverse. When searching, we can plug into Yacy for search results (we don't have to do everything on our own! We can stand on the shoulders of other free software).

@scanlime As far as I can tell, an instance owner shouldn't have to moderate content being hosted on other instances. Our instance will not host the content permanently (unless you manually specify to seed it), so I don't think we have to worry about any content-related issues.

rigelk commented 5 years ago

@buoyantair @Zig-03 while doing something with Yacy outside of the PeerTube codebase is certainly fine, I don't see us requiring Yacy on instances just to bring them more global search results. It is yet another external tool and it doesn't simplify the deployment of PeerTube at all.

elevenpassin commented 5 years ago

@rigelk Why don't we explicitly ask the instance owner at installation time? + We can give them a cli tool to install new plugins (say Global search in this case).

This means that we would have the current instance-follow-specific search by default, and anyone looking for global search could just install and enable the plugin. It would also mean that we don't have to integrate Yacy into our code base; we would just interface with it (we send the Yacy server search strings and it gives us back search results to display in our main Peertube client).

ghost commented 5 years ago

Do we really want instances to be searching among servers they aren’t normally following?

For instances that want “everything”, they’ll be following as many servers as possible anyway. Many instances though don’t actually want all the content, they’re trying to be more focused.

Do you as an administrator want the ability to include search results for videos that aren’t otherwise available?

Do you as a user want to force all servers to include global search results?

I’m not sure what the intended result is here, and it might be worth making sure that the technical capability you’re envisioning will be useful and enabled by admins.

elevenpassin commented 5 years ago

I agree with @scanlime. An instance is either an introvert or an extrovert. Instances which are extroverted will attempt to make a connection with every new instance they discover. Instances which are introverted will limit their connections to a close-knit set of instances.

ballsystemlord commented 5 years ago

You guys are ignoring one REALLY big thing with respect to global search engines like Yacy. Evil instances of peertube.

Let me elaborate. If I'm running Google (Heaven forbid!), I can tell my search engine not to go to certain websites, to profile which websites users visit, and to perform a fuzzy search of my web database and rule out websites containing certain keywords that should not be used together (like "C event oriented multi-threaded programming made easy", with "easy" being the operative keyword. :) ). This allows me to train my search engine, Google, to avoid sending users to sites that are malicious and/or click bait. Peertube is decentralized like the web. Unlike the web, we don't want to follow users (AFAIK), we have a hard time finding enough information in the peertube instances' video descriptions (and entering an accurate description takes time), and we don't even verify whether an instance is still hosting videos or has turned into a JavaScript-powered operation that mines bitcoin in your browser: https://thehackernews.com/2017/11/cryptocurrency-mining-javascript.html Here's an instance that is telling my browser to run some JS from ajax.cloudflare.com: https://luttube.tk/ . Normal instances don't require this: https://devtube.dev-wiki.de/ . Yes, I did search Yacy's FAQ; there is no mention of how they intend to solve this. Neither does freenet, for that matter, but they have moderators and hand-built indexes (EDIT: They don't allow JavaScript in their webpages either).

We can't expect to accomplish this like Google does. We can't have a global search without a set of moderators (Requiring no JS and telling people's browsers to disable it when viewing peertube sites would set the bar for evil sites much higher though).

I recommend the following (sorry, I don't know much JS, so I can't help much):

- 1: We tell instances to include a user-defined classification. We could use something like the original Usenet system: https://en.wikipedia.org/wiki/Usenet This could be expanded to be more like the common tag systems of blogs.
- 2: A language tag: https://en.wikipedia.org/wiki/IETF_language_tag
- 3.a: A user-entered rating, strictly for telling user agents not to show this result to kids (or to me when I'm trying to troubleshoot a strange computer error).
- 3.b: I recommend: Safe, Iffy, No, like the various current search engines have.
- 4.a: We create a JavaScript-powered client that sends requests asynchronously. All it would have to do is parse and aggregate the results from many sites into one or more local pages.
- 4.b: It would use a set of check boxes for setting preferred languages (which would be relayed to the individual search engines).
- 4.c: It would have a radio button for setting the rating (which would be relayed to the individual search engines).
- 4.d: It would have a set of check boxes for setting which classifications should be searched under.
- 4.e: It would have a black list of instances, which would be created and destroyed by setting cookies in the browser and would instruct the user agent not to search X or Y site.
- 4.f: Just in case people want to search only a set number of sites, we could add that too.
- 4.g: It must sort the results according to the search terms.
- 4.h: It must include a timeout.
- 4.i: It must include a way to limit the number of simultaneous connections.
- 5: Each instance of peertube would have a page that the user could load the JS program from. If users did not trust the instance, they could go to a git repo and download the webpage with the JS program, or something else.
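As a rough illustration of items 4.e, 4.g, 4.h, and 4.i, here is a minimal sketch of the fan-out-and-aggregate logic. It is written in Python rather than browser JavaScript purely to show the control flow. The `fetch` callable is injected and `demo_fetch` is a stand-in that fakes three instances, so nothing here touches the network or claims to match any real PeerTube API; all host names and result fields are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, Dict, FrozenSet, Iterable, List

# A fetcher takes (host, query) and returns that instance's results.
# A real client would query each instance's own search here (assumption).
Fetcher = Callable[[str, str], Awaitable[List[Dict]]]


def score(result: Dict, terms: List[str]) -> int:
    """Count how many search terms appear in the title or description."""
    text = ((result.get("name") or "") + " " + (result.get("description") or "")).lower()
    return sum(1 for term in terms if term in text)


async def federated_search(
    hosts: Iterable[str],
    query: str,
    fetch: Fetcher,
    blocklist: FrozenSet[str] = frozenset(),
    timeout: float = 5.0,
    max_connections: int = 4,
) -> List[Dict]:
    terms = query.lower().split()
    gate = asyncio.Semaphore(max_connections)  # 4.i: cap simultaneous requests

    async def query_one(host: str) -> List[Dict]:
        async with gate:
            try:
                # 4.h: give up on instances that answer too slowly
                results = await asyncio.wait_for(fetch(host, query), timeout)
            except (asyncio.TimeoutError, OSError):
                return []
        return [{**r, "host": host} for r in results]

    allowed = [h for h in hosts if h not in blocklist]  # 4.e: user's black list
    batches = await asyncio.gather(*(query_one(h) for h in allowed))
    merged = [r for batch in batches for r in batch]
    # 4.g: best match first; the sort is stable, so ties keep host order
    return sorted(merged, key=lambda r: score(r, terms), reverse=True)


async def demo_fetch(host: str, query: str) -> List[Dict]:
    """Stand-in for a network call: canned results, one deliberately slow host."""
    if host == "slow.example":
        await asyncio.sleep(1)
    canned = {
        "a.example": [{"name": "Dogs"}, {"name": "Funny cats video"}],
        "evil.example": [{"name": "cats cats cats"}],
        "slow.example": [{"name": "cats"}],
    }
    return canned[host]


results = asyncio.run(
    federated_search(
        ["a.example", "evil.example", "slow.example"],
        "funny cats",
        demo_fetch,
        blocklist=frozenset({"evil.example"}),
        timeout=0.05,
    )
)
```

In this run the black-listed host is never contacted and the slow host is dropped by the timeout, so only results from `a.example` are ranked. The scoring function is deliberately naive; rating, language, and classification filters (items 2 to 4.d) are left out.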

You already maintain a list of peertube instances; this could be decentralized so that each instance has a list and users would not have to request this information from only the main site. Individual people could create lists of sites that are "evil" or "click bait" and allow users to download and "install" them. It would then only be a matter of having individual instances increase the power of their search engines (case sensitive, don't include results with X word, etc.).

Advantages: It would work on all computers that could watch videos, including phones. Censorship would be harder than with a centralized system. It would be much more powerful than the current system. It would hold up much better against abusive instances than a centralized system. We would not have to administer/censor/block/check or link tax anything: https://www.wired.co.uk/article/what-is-article-13-article-11-european-directive-on-copyright-explained-meme-ban If users complain about lots of bad results, we can tell them to block that peertube instance and make peertube vids showing how. No trust model.

Drawbacks: It would be slower than a centralized system, especially on a slow (cell/modem) network. It would use up some B/W, but it's only extra text (no or limited preloading of images/vids), so it should not be too much (this might be further helped by using UDP as per HTTP 3.0: https://www.zdnet.com/article/http-over-quic-to-be-renamed-http3/ ).

silicium14 commented 5 years ago

Hello, I created a prototype of centralized search engine hosted at https://peertube-index.net. The source code is at https://github.com/silicium14/peertube_index.

Aluriak commented 5 years ago

@silicium14 that is a very good idea for a first step.

I bet the final solution will be something like that, with the decentralization given by Yacy, and fair and open recommendation algorithms along the way. Decoupling hosting and search seems to me an obvious improvement.

EvgenijM86 commented 4 years ago

> Hello, I created a prototype of centralized search engine hosted at https://peertube-index.net. The source code is at https://github.com/silicium14/peertube_index.

Thanks. It is better than nothing, but we can already see the problem: the people who host that search engine have already decided to censor search results. Probably not because they wanted to, but to avoid being the sole person responsible for whatever is shared. Maybe something like that should be hosted on the Tor network to be truly uncensored.

ballsystemlord commented 4 years ago

> > Hello, I created a prototype of centralized search engine hosted at https://peertube-index.net. The source code is at https://github.com/silicium14/peertube_index.
>
> Thanks. It is better than nothing, but we can already see the problem: the people who host that search engine have already decided to censor search results. Probably not because they wanted to, but to avoid being the sole person responsible for whatever is shared. Maybe something like that should be hosted on the Tor network to be truly uncensored.

In the US (a "free country"), Tor has the noted drawback that most places that offer internet access for free block access to the sites from which you get Tor and Tails. Many block connections to the network and any other proxies that they're aware of. Some even go so far as to block access to the websites where you can download Linux distros which might have Tor installed. I speak from over 4 years' experience hopping from one internet cafe to another. And that's just the US. Peertube search over Tor is a fine idea, but it leaves a lot of ground uncovered, especially computers where you can't just install the Tor browser or anything else you feel like.

magus777 commented 4 years ago

> Hello, I created a prototype of centralized search engine hosted at https://peertube-index.net. The source code is at https://github.com/silicium14/peertube_index.

This is great. And definitely needed. I'd like to suggest some ideas:

peetss commented 4 years ago

I want to take on the work to evolve this into a YouTube-style interface where videos across all instances can be viewed. I'm glad to see there is recent discussion on this topic. Censorship on YouTube continues to grow, and at an accelerated rate. The time for this is now.

thomask-gh commented 4 years ago

Hi, there's an idea that I don't see having already been discussed and that I think could be relevant for this global search feature: you might get some useful inspiration from the way distributed search engines such as Yacy (for instance) work. As a disclaimer, I don't know much about them nor about their inner workings, but I know they exist and it seems to me that they might be a relevant model for PeerTube. What do you think? 🙂

(sorry if what I'm bringing up is already covered in previous discussions, I honestly didn't take the time to read the detail of all the options mentioned)

ballsystemlord commented 4 years ago

On Wed, 27 May 2020 08:15:39 -0700 Thomas Kuntz notifications@github.com wrote:

> Hi, there's an idea that I don't see having already been discussed and that I think could be relevant for this global search feature: you might get some useful inspiration from the way distributed search engines such as Yacy (for instance) work. As a disclaimer, I don't know much about them nor about their inner workings, but I know they exist and it seems to me that they might be a relevant model for PeerTube. What do you think? 🙂

IIRC, I did think of using Yacy as a base or whole search engine for peertube. I decided against because, as I said earlier, a search engine, even distributed, can be attacked by govs that favor censorship.

thomask-gh commented 4 years ago

> I decided against because, as I said earlier, a search engine, even distributed, can be attacked by govs that favor censorship.

Well, I don't really get your point. What I understand is that you're saying that you don't think we should use Yacy because it's a search engine and that any search engine, even distributed, is vulnerable to censorship and should thus be avoided. But if you follow this logic, that would mean we have no search engine at all, which means no search feature. Maybe when I say "search engine" you think of external services like Bing or Google, but any piece of software that looks for specific content in a larger pool of content is a search engine. That includes the search feature in a blog, in Twitter, on Mastodon or on your local file system for instance. So building the "search on the global fediverse" feature would definitely amount to building a search engine into PeerTube. And, on the Internet, a distributed search engine is as close as you get to being censorship-resistant. :)

So let me clarify: I'm not suggesting that we use Yacy itself, nor any other existing third-party service. I'm suggesting that we build into PeerTube a mechanism similar to the one Yacy (or other distributed search engines) uses, to power a fediverse-wide search feature. That is, a mechanism in which each instance indexes a part of the content on the fediverse (making up a "local index" on each instance) and in which, when a search is performed, requests are sent to other peer instances, searches are performed on those instances' indexes, and results are combined by the instance or user who made the search request (or maybe by a centralized "ranking server"?).

That's just the base idea, and in fact it's somewhat similar to the third option mentioned in this comment
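To make the base idea concrete, here is a minimal sketch, assuming each instance keeps a small inverted index of the titles it knows about and the searching side merges the answers. The `InstanceIndex` class and `combined_search` function are hypothetical names for illustration; peers are queried in-process instead of over the network, and this is not how Yacy or PeerTube actually implement search.

```python
from collections import defaultdict
from typing import Dict, List, Set


class InstanceIndex:
    """Local index held by one instance: word -> set of video URLs."""

    def __init__(self, host: str) -> None:
        self.host = host
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, url: str, title: str) -> None:
        """Index a video under every word of its title."""
        for word in title.lower().split():
            self.postings[word].add(url)

    def search(self, query: str) -> Dict[str, int]:
        """Return url -> number of distinct query words matched locally."""
        hits: Dict[str, int] = defaultdict(int)
        for word in set(query.lower().split()):
            for url in self.postings.get(word, ()):
                hits[url] += 1
        return dict(hits)


def combined_search(peers: List[InstanceIndex], query: str) -> List[str]:
    """Ask every peer, then merge, keeping the best score seen per URL."""
    best: Dict[str, int] = {}
    for peer in peers:
        for url, hits in peer.search(query).items():
            best[url] = max(best.get(url, 0), hits)
    # Highest score first; the URL breaks ties so the order is deterministic.
    return sorted(best, key=lambda url: (-best[url], url))


# Two instances with overlapping partial indexes.
a = InstanceIndex("a.example")
b = InstanceIndex("b.example")
a.add("https://a.example/w/1", "Cats playing piano")
b.add("https://b.example/w/2", "Piano lesson")
b.add("https://a.example/w/1", "Cats playing piano")  # b also indexed a's video
ranking = combined_search([a, b], "cats piano")
```

Taking the maximum score per URL means a video indexed by several peers is listed once. A real design would still have to decide which peers to ask and how far to trust their answers, which is where the moderation concerns raised earlier in this thread come back in.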

onlyjob commented 4 years ago

> IIRC, I did think of using Yacy as a base or whole search engine for peertube. I decided against because, as I said earlier, a search engine, even distributed, can be attacked by govs that favor censorship.

Wrong. Everything can potentially be abused or attacked, but there has got to be something to abuse first. Prioritise local search; make search on the fediverse optional/configurable; let node admins manage white/black lists of nodes to search, etc. A lot could be done to keep multi-node search a useful and valuable feature. Discoverability of information is crucial, therefore fediverse search must be implemented. Not necessarily using YaCy, but by one means or another it should be possible to search the fediverse.

ballsystemlord commented 4 years ago

> > I decided against because, as I said earlier, a search engine, even distributed, can be attacked by govs that favor censorship.
>
> Well, I don't really get your point. What I understand is that you're saying that you don't think we should use Yacy because it's a search engine and that any search engine, even distributed, is vulnerable to censorship and should thus be avoided.

Sorry, my bad. I should have re-read my comment above. The problem with Yacy type instances is that they lend themselves to being deceived. Google can censor websites that abuse search terms, for example I search for "butter" and I get directed to a dating site. Or you could get directed to a site that hosts peertube videos, but has been hacked to make your browser do bitcoin mining, or spectre/meltdown attacks (sitting on the same site for a long time would be ideal for such an attacker), or such in the background. Yacy can't solve that AFAIK.

onlyjob commented 4 years ago

This is not just a problem of malicious actors. The challenge and art of searching online is to cherry-pick valuable information and separate it from the noise. Any search engine has a lot of noise.
But without a search engine, how and where do you even begin to discover the information that you are after? The DHT-based search on the aMule/Kademlia network is amazing, despite all the noise, because it is up to users to filter through the noise, since they are the only ones who know what they are searching for. I think we can all agree that even a bad search is better than nothing.

ghost commented 4 years ago

> The challenge and art of searching online is to cherry-pick valuable information and separate it from the noise. Any search engine has a lot of noise.

Most search engines have some sort of filters (I guess YaCy has them too), and we could use them to cut down the noise. For example, in Google you can enter `site:github.com "activitypub"` and you'll get clean results only from the GitHub website that include the word activitypub.

Ok, that's nice, YaCy has them too! https://wiki.yacy.net/index.php/En:SearchParameters

If we need some custom search parameters that would fit our use case, we could submit a PR on YaCy's GitHub page!

onlyjob commented 4 years ago

IMHO YaCy is great for indexing web pages and RSS feeds, but the federated universe should have its own built-in search based on a DHT, similar to aMule's Kademlia implementation.
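For readers unfamiliar with the idea, here is a toy sketch of keyword search over a DHT using the XOR distance metric that Kademlia is built on. It makes strong simplifying assumptions: every node knows every other node (no routing tables, no iterative lookups, no churn), SHA-1 is used for IDs, and `ToyDHT` is a hypothetical name. It does not describe aMule's actual implementation.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def key_for(text: str) -> int:
    """Map a keyword or a node name to a 160-bit integer ID."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def closest_nodes(node_names: Iterable[str], keyword: str, k: int = 2) -> List[str]:
    """Return the k nodes whose IDs are nearest to the keyword's ID by XOR distance."""
    target = key_for(keyword)
    return sorted(node_names, key=lambda name: key_for(name) ^ target)[:k]


class ToyDHT:
    """Each keyword's postings live on the k nodes closest to that keyword."""

    def __init__(self, node_names: Iterable[str], k: int = 2) -> None:
        self.k = k
        # node name -> (keyword -> set of video URLs) stored on that node
        self.stores: Dict[str, Dict[str, Set[str]]] = {
            name: defaultdict(set) for name in node_names
        }

    def publish(self, url: str, title: str) -> None:
        """Store the URL under each title word, on the nodes responsible for it."""
        for word in set(title.lower().split()):
            for node in closest_nodes(self.stores, word, self.k):
                self.stores[node][word].add(url)

    def lookup(self, keyword: str) -> Set[str]:
        """Ask only the nodes responsible for the keyword, not the whole network."""
        word = keyword.lower()
        found: Set[str] = set()
        for node in closest_nodes(self.stores, word, self.k):
            found |= self.stores[node].get(word, set())
        return found


dht = ToyDHT(["a.example", "b.example", "c.example", "d.example"], k=2)
dht.publish("https://a.example/w/1", "Cats playing piano")
cat_videos = dht.lookup("cats")
```

Because publisher and searcher compute the same keyword ID, both end up at the same small set of nodes without any central index. Filtering out noise and malicious entries is not addressed by this sketch.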

1000i100 commented 4 years ago

For French speakers who want to discuss this: https://framacolibri.org/t/recherche-globale-federee/8155

Chocobozzz commented 3 years ago

Implemented in https://github.com/Chocobozzz/PeerTube/pull/2852

r4dh4l commented 2 years ago

Sorry for misusing this issue, but I don't know where else to ask: I used to use https://peertube-index.net/ for federated search requests, but for some time now the website seems to have been offline. Are there any alternatives?

Booteille commented 2 years ago

Hi. Take a look at https://sepiasearch.org/

r4dh4l commented 2 years ago

> Hi. Take a look at https://sepiasearch.org/

Thank you very much!