Open eksopl opened 12 years ago
If you could make an API so that it would work on the archives, and also from 4chan to work with linkified dead quotes. Archivers would need to allow CORS from boards.4chan.org of course.
Sure. What kind of output do you want, JSON?
Something like:
/api/<board>/post/<postnum>/
/api/<board>/thread/<postnum>/
/api/<board>/thread/<postnum>/deleted/
/api/<board>/thread/<postnum>/deleted/
/api/<board>/thread/<postnum>/deleted/ghost/
/api/<board>/thread/<postnum>/all/
?
JSON?
That would be lighter network-wise, yeah.
/api/<board>/post/<postnum>/
/api/<board>/thread/<postnum>/
/api/<board>/thread/<postnum>/deleted/
/api/<board>/thread/<postnum>/deleted/ghost/
/api/<board>/thread/<postnum>/all/
post is enough, since I cannot know which thread a post number comes from.
I personally don't see the point in getting all the deleted posts from a thread. See: accelspam, and deleted posts that were not quoted because they weren't interesting. Deleted posts are usually deleted for good reasons.
Also pinging @woxxy and @oohnoitz so that we can all agree on something universal.
Getting deleted posts from a thread can be hilarious as hell, when you find someone who posted something on accident and then deleted it. But the anti-ghostbump post deletion delay on 4chan made that less likely to happen. There's also that thing when a mod goes on a rampage, too. Far too often I've had the live thread and the archive opened side by side to check for deleted posts.
Of course, I'm used to boards like /jp/ where spam is carried out by posting new threads and not usually by hijacking existing threads, so other boards might be different.
It would be ultra-taxing on the archivers if enabled on something like 4chan X by default, though. I'm just thinking in terms of a generic API, I don't think that feature should be in 4chan X. Quote hovering for dead posts should probably be fine, but I'd like to hear @woxxy, @oohnoitz and @GXTX.
If I implement an API, I'll probably still support all that kind of functionality, for the sake of being used in smaller scripts with a lower profile than 4chan X, perhaps with config options for the server admin to disable certain requests. I was pretty fond of this one, for example.
Uses data from Fuuka archiver to display tripfriend post-counts
Haha, oh woaw.
We already have an api that goes like
/api/chan/thread/board/
I need to add a per-post request in this case. If it's just for on-hover of dead posts, it's no issue in terms of server load. Note that we'll also support a separate domain like archive-sys.foolz.us/api/ to be able to have boards with reserved names like /admin/ (not that we need it, but who knows if someone wants to make a board called /api/). Make sure you can support separate domain on 4chan X.
Make sure you can support separate domain on 4chan X.
What do you mean?
I believe he means that you should support XHRing from archive-sys.foolz for boards whose archive is located at archive.foolz.
See
Archivers would need to allow CORS from boards.4chan.org of course.
Nah, I don't think he's thinking that far ahead, he just means that you'll need some kind of map in the script to keep the html => "archive.foolz.us", json => "archive-sys.foolz.us" mapping, so you don't request the wrong domain. You know, trivial stuff.
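A minimal sketch of the kind of map described above, assuming one entry per archive. The names ARCHIVE_HOSTS and archiveUrl, the "foolz" key, and the plain http scheme are hypothetical; only the two host names come from the thread.

```javascript
// Hypothetical lookup table: which host serves HTML pages and which serves the JSON API.
const ARCHIVE_HOSTS = {
  foolz: { html: 'archive.foolz.us', json: 'archive-sys.foolz.us' },
};

// Build a URL on the right host for the kind of resource being requested.
function archiveUrl(archive, kind, path) {
  const hosts = ARCHIVE_HOSTS[archive];
  if (!hosts || !hosts[kind]) {
    throw new Error('No ' + kind + ' host known for archive ' + archive);
  }
  return 'http://' + hosts[kind] + path;
}
```

Keeping the lookup in one table means a script only has to change data, not code, when an archive moves its API to another domain.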
What's wrong with GM_xmlhttpRequest, by the way?
You know, trivial stuff.
Okay.
What's wrong with GM_xmlhttpRequest, by the way?
Not portable. It only works for Scriptish/GM, maybe Scriptify.
Chrome's native userscript handler supports it. Blank Canvas Script Handler and Tampermonkey also support it.
Only Opera doesn't, and Opera doesn't support CORS either.
I mean, sure, it's non-standard, but everywhere where CORS would work, GM_xmlhttpRequest also does. Other than Safari, maybe? Does Safari even support userscripts?
I'll put the CORS header in the API because it's the right thing to do, but as a practical approach, you don't lose anything by using the GM_xmlhttpRequest method.
Opera doesn't support CORS
Opera 12 will support CORS, Opera Mobile 12 is already out as stable and does support it. http://caniuse.com/cors
Does Safari even support userscripts?
see http://blog.neozeed.net/4chan-x-for-safari http://archive.rebeccablacktech.com/4klaani/g/?task=search&search_text=ninjakit I don't know about GM_xmlhttpRequest support though.
I'll put the CORS header in the API because it's the right thing to do
Good.
but as a practical approach, you don't lose anything by using the GM_xmlhttpRequest method.
The method to use GM_xmlhttpRequest is different than normal XMLHttpRequest, and I don't want to maintain different code for different userscript implementations.
Before I forget about it, this is what the JSON'd object should contain, at least for 4chan X's use:
thread id: int [1]
post id: int [1? we don't necessarily need it if we always have the post id to begin with.]
name: string [1]
trip: string [0,1]
user id: string [0,1]
mail: string [0,1]
time: string? [1] (4chan localized time, 4chan time format)
comment: string [1] (directly as the 4chan's HTML would be? It needs to work with spoilers, moot tags, /tg/ dice rolls, /p/'s exif data (if these are archived), etc...)
img: object [0,1]
  real filename: string [1]
  shorter real filename: string [0,1? maybe construct it on 4chan X's side]
  4chan filename: string [1]
  thumbnail src: string [1] (archived one)
  full src: string or boolean [0,1] (I can construct it from the 4chan filename, but I need to know whether it is archived or not.)
  spoilered: boolean [1]
  dimension: string [1]
  filesize: string [1]
  md5: string [1]
  thumbnail height: int [1]
  thumbnail width: int [1]
Hopefully I didn't miss anything.
I guess we'll stick with returning the objects with the database names.
Here's an example from a thread. Ignore the formatted part; it's internal, so I'll make it opt-in. I'll keep the _processed fields there since some of the processing might be possible only on the server. They might not be useful for you though.
http://archive.foolz.us/api/chan/thread/board/a/num/63272253/format/xml
http://archive.foolz.us/api/chan/thread/board/a/num/63272253/format/json
I'll get you a single-post function soon. It will be /api/chan/post/board/a/num/63272253_12 where the _12 is for ghost posts, and if not added means 0, which means it's not a ghost post.
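The post-number convention above could be handled on the client roughly like this. This is only a sketch: parsePostNum, postPath, and the field names num, subnum, and isGhost are made up for illustration, and the path shape is copied from the URL quoted above.

```javascript
// Split "63272253_12" into the post number and the ghost-post index.
// A missing suffix is treated as 0, i.e. not a ghost post.
function parsePostNum(value) {
  const match = /^(\d+)(?:_(\d+))?$/.exec(String(value));
  if (!match) {
    throw new Error('Unrecognised post number: ' + value);
  }
  const subnum = match[2] === undefined ? 0 : Number(match[2]);
  return { num: Number(match[1]), subnum: subnum, isGhost: subnum !== 0 };
}

// Build the single-post path in the shape quoted in the thread.
function postPath(board, value) {
  const post = parsePostNum(value);
  const suffix = post.isGhost ? '_' + post.subnum : '';
  return '/api/chan/post/board/' + board + '/num/' + post.num + suffix;
}
```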
You will have to deal with BBC tags and parsing backlinks to actual links. Fuuka and FFuuka will surely return the same BBC. https://github.com/eksopl/fuuka/blob/master/Board/Yotsuba.pm#L405-429 or https://github.com/eksopl/asagi/blob/master/src/main/java/net/easymodo/asagi/Yotsuba.java#L122-146 Whichever reads easier.
Here we go, why not just add the function now.
http://archive.foolz.us/api/chan/post/board/a/num/63272253/format/xml http://archive.foolz.us/api/chan/post/board/a/num/63272253/format/json
You will get an error field in case there's any kind of error, and it will contain a human-readable explanation. 404 on every kind of error currently since all it's possible is:
Good good. I'll wait for eksopl's opinion on it though, I wouldn't want to maintain different implementations.
Why is there no imageboard standards working groups yet?
Which origins do I have to allow?
boards.4chan.org.
Dunno if you have to specify the protocols (http and https).
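On the protocol question: a browser's Origin header carries the scheme as well as the host, and Access-Control-Allow-Origin has to match it exactly (or be *), so both variants need listing if both are to be allowed. The sketch below shows that check in JavaScript purely for illustration; ALLOWED_ORIGINS and corsHeaders are hypothetical names, and the archives' actual server code is not shown in this thread.

```javascript
// Origins allowed to read the API. Both protocols are listed because an
// Origin value includes the scheme.
const ALLOWED_ORIGINS = ['http://boards.4chan.org', 'https://boards.4chan.org'];

// Return the headers to add to a response for a given request Origin.
function corsHeaders(requestOrigin) {
  if (ALLOWED_ORIGINS.indexOf(requestOrigin) === -1) {
    return {};
  }
  return {
    'Access-Control-Allow-Origin': requestOrigin,
    // Tell caches that the response varies with the Origin header.
    'Vary': 'Origin',
  };
}
```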
Done.
Does this mean we can see deleted posts and ghost posts fetched by woxxy on 4chan? That's so cool!
No it doesn't. To do that we would need a much more powerful server for the archives. This only fetches single posts when you hover on backlinks.
I'm not entirely sure if there's a point of exposing the doc_id. I can't see any bad implications from that, though.
I'd also prefer returning the time as an int in UTC. Performing a proper EST -> UTC is easy enough to do server-side, but it's not so straightforward to do it on clients (as @woxxy found out when the US changed to DST, wwww), since it involves stuff like tz databases. It is NOT just +5, you will need to use a library of some kind to perform that conversion properly, so it's much easier for the server to do it.
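To illustrate why a fixed +5 is not enough, here is a small sketch that asks the runtime's time zone data for the US Eastern offset at a given instant. It assumes a JavaScript runtime whose Intl.DateTimeFormat supports IANA time zone names; newYorkOffsetMinutes is a hypothetical helper, not part of any archive or of 4chan X.

```javascript
// Offset of US Eastern time from UTC, in minutes, at the given UTC instant.
// The answer depends on the date because of daylight saving time.
function newYorkOffsetMinutes(utcMillis) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone: 'America/New_York',
    hourCycle: 'h23',
    year: 'numeric', month: 'numeric', day: 'numeric',
    hour: 'numeric', minute: 'numeric', second: 'numeric',
  }).formatToParts(new Date(utcMillis));
  const get = (type) => Number(parts.find((p) => p.type === type).value);
  // Reinterpret the New York wall-clock fields as if they were UTC.
  const asIfUtc = Date.UTC(
    get('year'), get('month') - 1, get('day'),
    get('hour'), get('minute'), get('second')
  );
  return Math.round((asIfUtc - utcMillis) / 60000);
}
```

The same call gives a different offset for a winter date than for a summer date, which is the tz-database dependency the comment above refers to.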
I am okay with the other simple fields as they seem to be just taken straight out of the database.
thumbnail_href and image_href are useful and make sense. Personally, I'd prefer something like thumbnail_link and image_link, though. Or to be consistent with fuuka internals, it'd probably be thumb_link and media_link.
remote_image_href and safe_media_hash also make sense. Again, I'd prefer remote_media_link, but it's very much a non-issue.
I can't really support _processed fields on my end, as fuuka does all of its sanitizing on database insert (other than making media_hash URL-safe). I'm okay with defining formatted as how the post's HTML would be generated in the archive site, so XMLHttpRequest backlinking implementations can use this. Requests will need a theme parameter for foolfuuka, though, won't they?
They're changed to _link versions on dev of FFuuka. Not sure when it will go live.
@woxxy Mind filling me with updates on this? Has anything changed? I don't feel like waiting anymore for fuuka's implementation.
I totally forgot about this, but sure, I'll follow FF's specification. If something about it ends up hurting my sensibilities too much, we're coordinated enough that we can both change, so.
By the way, if you ever require fast answers from either the FF guys or me and you don't mind using IRC, #fooldriver at irchighway is probably the fastest way.
All right~
@MayhemYDG The only change made to the API is the renaming of the column/key names. I will list below what some of these columns/keys hold to avoid any confusion. The rest of the columns/keys should be self-explanatory.
preview_orig - this is the 4chan filename
media_orig - this is the 4chan filename
media_filename - this is the filename of the image as uploaded by the user
safe_media_hash - this is the media hash used in many of our links to avoid the need to URL-encode
@oohnoitz
preview_orig - this is the 4chan filename
media_orig - this is the 4chan filename
So I assume preview_orig is the 4chan filename displayed on board pages and media_orig is the actual 4chan filename?
@eksopl May I remind you about issue #23?
preview_orig is the original thumbnail filename on 4chan, media_orig the image filename.
preview_orig = 1678456348s.jpg
media_orig = 1678456348.png
media_filename = smugsion.png
It's called media rather than image because in theory 4chan can host stuff like PDFs.
Also, that one is still pending on writing a proper DB+image migration script for the scheme FF uses now (no duplicated images). Next three weeks I'm going to be basically gone, so ETA end of this month.
If you need to load the thumbnail with the quote preview, you should use the value for thumb_link in the API. Furthermore, media_link will only return a value if it exists on the server.
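A quote preview could pick its image URLs along these lines. This is a sketch only: previewLinks is a hypothetical helper, and it assumes that a missing full-size file shows up as an absent or empty media_link, which the thread does not spell out.

```javascript
// Pick the URLs a quote preview would need from a post object.
// thumb_link and media_link are the key names mentioned in the thread.
function previewLinks(post) {
  return {
    thumbnail: post.thumb_link || null,
    // media_link is only set when the full file exists on the archive server.
    full: post.media_link || null,
  };
}
```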
How do you guys handle >quotes from the raw text into spans? How the dumper converts html into text and stores it doesn't help me on that one.
@MayhemYDG https://github.com/FoOlRulez/FoOlFuuka/blob/master/application/models/post_model.php#L889-891
Basically:
$find = "'(\r?\n|^)(>.*?)(?=$|\r?\n)'i";
$html = '\\1<span class="greentext">\\2</span>\\3';
$comment = preg_replace($find, $html, $comment);
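For readers more at home in JavaScript, a rough port of the pattern above might look like the following. It is a sketch, not FoOlFuuka's code: greentextLines is a hypothetical name, and the unused \\3 from the PHP replacement is dropped because the pattern has only two capturing groups.

```javascript
// Rough JavaScript port of the PHP pattern above: wrap a line in a greentext
// span only when the line itself starts with ">".
function greentextLines(comment) {
  return comment.replace(
    /(\r?\n|^)(>.*?)(?=$|\r?\n)/g,
    '$1<span class="greentext">$2</span>'
  );
}
```

Because the match is anchored to the start of the string or to a line break, a ">" that appears in the middle of a line is left untouched.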
That doesn't look like it would work with hurr[spoiler][/spoiler]>durr.
Do you do it before or after reHTMLing?
I also need a clarification: title is for textContent and title_processed is for innerHTML, right?
Our regex makes the greentext work only if the line starts with a >. But then, I just tried on 4chan. Is that sequence a trick to greentext? If that's the case, I guess we should have it as well at least for the archive boards.
title_processed is title, but with invisible characters removed, RTL characters removed and HTML entities converted (htmlentities) to prevent display tricks and HTML injection.
Both of them (all of the string variables from users) are passed through iconv with the //IGNORE flag to strip characters that aren't UTF-8.
Is that sequence a trick to greentext?
Yes, you newfag.
;_;
The only possible "capcode" values are "N", "A", and "M", right?
There really needs to be a wiki page in one of you guys' repositories for explanations. Especially when you change things around like subject -> title.
N is a normal user, A is Admin, M is mod, G is global mod.
About documentation, I'll write some more later. I've just started covering developers' part.
G is global mod
explain further
G is God. It's a joke. It puts little flowers all around your name in Fuuka, it's pretty much unused.
It's an original Fuuka specification, but with your reaction I guess you've never seen it happen either. From the table creation SQL:
capcode enum('N', 'M', 'A', 'G') NOT NULL DEFAULT 'N',
@oohnoitz did not include G in the search system so my guess is that we don't have a G in the database. I am running a query right now to make sure we don't have something from the past decade, but it will take a while since I am not using SphinxSearch.
You can completely ignore G. If the query returns an entry, I'll notify you.
@eksopl ...what. Ah well. Shutting down the query.
And getting rid of the Global Mod lines in the theme too I guess.
If there are any G entries, they'll be ghost posts only. You can safely treat G as N in FoolFuuka/4chan X. It's more of an easter egg than anything else.
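Treating G as N on the client could be as small as the sketch below. CAPCODE_LABELS, capcodeLabel, and the label strings are hypothetical; only the letter codes come from the table definition quoted above.

```javascript
// Capcode values from the Fuuka table definition quoted above.
const CAPCODE_LABELS = { N: 'user', M: 'mod', A: 'admin' };

// Map a capcode to a label, treating the unused G value (and anything
// unexpected) as a normal user.
function capcodeLabel(capcode) {
  return Object.prototype.hasOwnProperty.call(CAPCODE_LABELS, capcode)
    ? CAPCODE_LABELS[capcode]
    : CAPCODE_LABELS.N;
}
```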
It's pretty much usable now, feel free to try it. I haven't done the mobile user info stuff yet. The date text when not formatted by 4chan X is not formatted like 4chan yet. I'm not sure how I'll handle spoilered images.
As for handling comment tags and greentext, I went with, in order:
comment value, to sanitize.
<br>s and tags to html tags.
bq.innerHTML = bq.innerHTML.replace(/(^|>)(>[^<$]+)(<|$)/g, '$1<span class=quote>$2</span>$3')
That should match 4chan's HTML correctly.
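The order of operations above (sanitize, convert line breaks and tags, then wrap greentext) can be sketched on plain strings as follows. Several things here are assumptions rather than 4chan X's code: the helper names escapeHtml and renderComment, handling only [spoiler] tags, the span markup used for spoilers, and matching the escaped form &gt; because this sketch works on an escaped string instead of a DOM node.

```javascript
// Step 1: escape the raw comment so user text cannot inject HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Steps 2 and 3: line breaks and [spoiler] tags become HTML, then any quote
// that follows the start of the comment or a tag is wrapped in a span.
function renderComment(comment) {
  return escapeHtml(comment)
    .replace(/\r?\n/g, '<br>')
    .replace(/\[spoiler\]/g, '<span class="spoiler">')
    .replace(/\[\/spoiler\]/g, '</span>')
    .replace(/(^|>)(&gt;[^<$]+)(<|$)/g, '$1<span class=quote>$2</span>$3');
}
```

Running the greentext step after the tag conversion is what lets a quote directly after an empty spoiler be picked up, since at that point it is preceded by the closing ">" of a tag.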
Nice. I moved the original issue this was about over to GH-54 and renamed this one.
I might have bug tracking OCD.
JSON just kicked in, yo. Etc.