FoolCode / FoolFuuka

FoolFuuka is a high-performance, fully customizable imageboard software. It includes a powerful administration system, an extensible plugin engine, and more. (FoolCode Package)
http://foolfuuka.rtfd.org/

Hiroyuki proposes 4chan Search Engine: Use FoolFuuka's Sphinx? #146

Open antonizoon opened 9 years ago

antonizoon commented 9 years ago

Recently, Hiroyuki stated in his Q&A thread that he's thinking about adding a search engine to 4chan. How about we recommend that he use FoolFuuka's Sphinx search engine (free, open source, all data stays on the server)?

FoolFuuka's implementation of the Sphinx open source search server has an interface that is familiar to 4chan users, has been battle-tested on many archiver sites, and is proven to be powerful for sifting through piles of 4chan threads. Most of all, all data stays on the site.

It is 2015. Times have changed. There is no reason to have a third-party contractor (like Hottolink) implement a website's search engine when you can do it yourself for free.

Plan B: Hyperlink to Archives

Propose to let each board have a search hyperlink. This simply links to the corresponding Fuuka archiver as is, nothing changes. Dead thread URLs could also redirect as well.

Advertise this as making search and archive view possible "without any need to strain or overhaul 4chan, and with access to threads 2 years back or more".

Our actual implicit goal is that no actual search handling will be done by 4chan.

If he accepts this (I think he is pretty darn eager to take any suggestion without thinking too deeply, he's usually wasted), then we are fine. If not, he gets a chance to testify with some kind of reason, and the 4chan userbase acts from there.

2channel + Hottolink

Hiroyuki recently stated the following in his Q&A, in response to allegations from 2ch users about data mining:

Oct. 2012: Entered into an exclusive commercial licensing agreement with Tokyo Plus Co., Ltd. and Mirai Kensaku Brazil, LLC, the operators of the 2channel site, for information posted on the 2channel site

In Japanese copyright law at that time, you can't upload contents without permission, even it is search engines. hotlink Inc., provided custom search engines for clients for marketing purpose, So, the company need 2ch permission to make the search engines. And hot link is a public company in Japanese stock market. If they are lying, you can get tons of money by suing them. Go ahead. Get rich. :)

Like this one, https://gnip.com/sources/twitter/ What hotlink., Inc. wants is publicly available text messages. They don't want any personal information.

  • Hiroyuki

Notice that he states that he licensed them (not sold) publicly viewable text metadata, which is the same data a search engine like Google crawls. This way, users could search the large text archives. This is the same data anyone can get in a web scrape or (in the case of 2ch) on the Internet Archive.

If I were an evil data mining corporation, I sure as hell wouldn't need to ask anyone for permission to scrape 4chan or use the Fuuka archives.

woxxy commented 9 years ago

It's no trivial task to implement a performant search engine on top of a pre-existing site, especially one coded like 4chan. I am unsure whether the 4chan servers can take the load of a search engine.

He has two options:

To be honest, you guys should just tell him to redirect 404s to archives and forward search requests to archives as well.
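A minimal sketch of what that redirect logic could look like, assuming a hypothetical archive host (`archive.example.org`) and assuming the archive uses `/<board>/thread/<num>/` and `/<board>/search/text/<query>/` style URLs. The host and both path layouts are illustrative assumptions, not something confirmed in this thread:

```python
import re
from urllib.parse import quote

# Hypothetical archive host; a real deployment would map each board to its archiver.
ARCHIVE_BASE = "https://archive.example.org"

# Assumed shape of a thread path on the live site: /<board>/thread/<number>[/slug]
THREAD_PATH = re.compile(r"^/(?P<board>[a-z0-9]+)/thread/(?P<num>\d+)")


def archive_url_for_dead_thread(path):
    """Return the archive URL for a 404'd thread path, or None if it is not a thread."""
    m = THREAD_PATH.match(path)
    if m is None:
        return None
    return f"{ARCHIVE_BASE}/{m['board']}/thread/{m['num']}/"


def archive_search_url(board, query):
    """Build the archive search URL that a board's search link would point to."""
    return f"{ARCHIVE_BASE}/{board}/search/text/{quote(query, safe='')}/"
```

With something like this, the live site only issues redirects; the search itself runs entirely on the archive.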

antonizoon commented 9 years ago

Better cooperation with the 4chan Archivers sounds like a much better idea actually: I think Hiroyuki will be more amenable to it than Moot was, as 2channel was fully archived.

I don't know when an ordinary user like me will next get a chance to contact him or the 4chan administration, but sometime soon would be the best window of opportunity. Alternatively, maybe the archivers can come together and ask him directly?

fgts commented 9 years ago

I don't think he would be interested in using public (archive) data. It's very clear that he needs data only available to 4chan, such as IP addresses, in order to sell it to Hottolink.

Think about it. If he wanted public data, he wouldn't need to buy 4chan.

antonizoon commented 9 years ago

Truthfully, the real point of this exercise is to get someone trustworthy, without a stake in whatever Hiroyuki is thinking of (such as you guys, the archivers), into Hiroyuki/4chan's administration by offering to implement a search system for them. With the way Hiroyuki seems to accept any suggestion, I don't think this will be hard if the archivers band together.

Then you, the unbiased third-party authority, could get a better look (than any of us can get) into what is really going on under the covers. If something is actually going on, plan ahead to set up a warrant canary (just like TrueCrypt did) to show that you have reasonable cause to believe that something strange is going on.

I myself doubt that Hiroyuki functions on anything other than impulse. However, his thoughtlessness could cause other unrelated weird stuff to happen. So this is a way to, well, observe what is going on.

ghost commented 9 years ago

He's not going to comply with our proposal, and even if he did, he could still gather data from users while relying on the servers provided by the archives. In fact, he could even be doing this right now without our assistance. (For example, he creates a search form on 4chan.org that just makes an API call out to the FoolFuuka search endpoint and, if the thread is still open, presents the result to the user. In that case he still gets private user data (IPs) and does not have to supply his own infrastructure.)
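The flow described above can be sketched roughly as follows. The API URL, the `boards`/`text` query parameters, the `thread_num` field, and the `fetch` callable are all assumptions made for illustration; the point is only that the requester's IP and query pass through the site's own server before the archive is ever contacted:

```python
from urllib.parse import urlencode

# Assumed archive search endpoint; host and path are hypothetical.
ARCHIVE_API = "https://archive.example.org/_/api/chan/search/"


def build_search_request(board, text):
    """Build the URL the site's backend would call on the archive."""
    return ARCHIVE_API + "?" + urlencode({"boards": board, "text": text})


def proxy_search(board, text, client_ip, fetch, live_threads, access_log):
    """Forward a search to the archive and keep only posts from still-open threads.

    fetch is any callable that takes a URL and returns a list of post dicts
    (assumed to carry a "thread_num" key); live_threads is the set of thread
    numbers currently open on the site.
    """
    # The site operator records who searched for what, regardless of who runs the index.
    access_log.append({"ip": client_ip, "board": board, "query": text})
    posts = fetch(build_search_request(board, text))
    return [p for p in posts if p["thread_num"] in live_threads]
```

Note that the archive only ever sees requests coming from the site's backend, while the site keeps the per-user log.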

antonizoon commented 9 years ago

Are you absolutely sure that he would reject it? There is no way to know without trying.

Remember, other people have spoken for this guy for his entire career. The allegations came up in a time when 2channel was rocked by an extremely destructive war of information, the Anti-Matome Blog Movement. No side in this struggle had clean hands, akin to GamerGate. Both participated in historical revisionism, thread doctoring, and yellow journalism on 2channel: especially the Matome Blogs. It's hard to say what is real there.

He deserves to at least speak for himself for once. Then we interpret based on his public response.

The Plan

Propose to let each board have a search hyperlink. This simply links to the corresponding Fuuka archiver as is, nothing changes. Dead thread URLs could also redirect as well.

Advertise this as making search and archive view possible "without any need to strain or overhaul 4chan, and with access to threads 2 years back or more". Our actual implicit goal is that no actual search handling is done by 4chan (don't say that).

If he accepts this (I think he is pretty darn eager to take any suggestion without thinking too deeply), then we are fine. If not, he gets a chance to testify with some kind of reason, and the 4chan userbase acts from there.

Motivation

Before the Q&A, some anons on /qa/ made a thread to prepare questions for him about his legacy on 2channel.

The questions were engineered to nudge him to inadvertently spill the beans. It worked.

He responded to all of them with something very interesting that gave a new perspective on the issue. I'm currently cross-referencing his testimony with the 2channel diaspora on /newsokur/ to check its validity. A major revelation is that Hiroyuki is suing Jim first, which completely contradicts a prevailing idea on 2ch. Some of it still needs to be researched.

So yes, it is worth a shot.

bui commented 9 years ago

Hiroyuki likely has no intention at all to use a third-party search solution. As nice as the thought is, he's here to make money, and the only way he'll do that is by using his company to build a proprietary search engine.

http://razil.jp/ourservice.html

antonizoon commented 9 years ago

Another motivation for making this suggestion is to nudge him into inadvertently telling us his actual plans, so we have early warning if the rumors are true.

It looks like Hiroyuki is Governing under the Influence. He seems very willing to blurt out the answer given the right question, especially as he is constantly inebriated.

He might just drunkenly accept, and then all is well with the world. If he backtracks later, you know what that means. It's a strategy that works Hook, Line, and Sinker.

Remember, if he does publicly reject the idea and tell us why, that's not an act in vain. Isn't that statement also a valuable piece of information?