E-Hentai DB

Just another E-Hentai metadata database

Requirements

Node.js 8+
MySQL 5.3+ / MariaDB 10+

Setup & Start Up

If you just want to see the data from gdata.json, use 0.1.x, and if you want to keep your gallery up-to-date, use 0.2.x. The master branch and 0.3.x and latter includes more features like torrent hashes, and it may takes a long time to sync

git clone the repo
Run npm i --production in the repo directory to install dependencies
- If you want to build Web UI, use npm i directly, then run npm run build, the static Web UI files will be in /dist directory
Download gdata.json from E-Hentai Forums and place it into the repo directory
Import struct.sql into a MySQL / MariaDB database
Edit config.js, set database username, password, database name, etc.
Run npm run import [file=gdata.json] to import the JSON file into your database
- If you want to update to latest galleries, run npm run sync [host=e-hentai.org] [timestampOffset=0]
- If you want to resync gallery metadatas since a few hours ago, run npm run resync [hour=24]
- If you want to mark all replaced galleries, run npm run mark-replaced (new galleries will mark them automatically)
- If you want to get torrents from all galleries, run npm run torrent-import [host=e-hentai.org] (USE AT YOUR OWN RISK)
- If you want to update torrents from torrent list, run npm run torrent-sync [host=e-hentai.org]
- If you want to manually fetch some galleries, run npm run fetch {gid}/{token} {gid}/{token} ... or npm run fetch [filename]
Wait a few minutes, as it has about 800,000 records (on my PC it takes 260s, and on my server it's 850s)
Run npm start, the server should be run on 8880 port by default config

Available APIs

All the params can be pass as a part of URL, or put it in search query. Like /api/gallery/:gid/:token, you can call it like /api/gallery/123456/abcdef1234 or /api/gallery?gid=123456&token=abcdef1234.

The response type of all APIs are JSON, and follow the format below.

{
    "code": 200,          // 200 = success
    "data": {...},        // response data
    "message": "success", // error message
    "total": 100          // result counts (if `data` is a list)
}

data should normally be a metadata, or a list of metadata, or null if any error happens. The format of metadata is based on E-Hentai's offical gallery JSON API, you can check it on EHWiki. But data type may be a little different from offical API, like using int for posted and filecount instead of string.

{
    "gid": 592178,
    "token": "41cc263dc7",
    "archiver_key": "434486--1617c38d90630b5e399e730d62dea241363cdce6",
    "title": "(Shota Scratch 5) [Studio Zealot (Various)] Bokutachi! Shotappuru!! (Boku no Pico)",
    "title_jpn": "(ショタスクラッチ5) [Studio Zealot (よろず)] ぼくたち!しょたっぷる!! (ぼくのぴこ)",
    "category": "Doujinshi",
    "thumb": "https://ehgt.org/4c/6a/4c6ad39fffcdefcb2cd35218a95395af2e5ad74d-1854978-2118-3000-jpg_l.jpg",
    "uploader": "tooecchi",
    "posted": 1368418878,
    "filecount": 63,
    "filesize": 75630519,
    "expunged": 0,
    "removed": 0,
    "replaced": 0,
    "rating": "4.54",
    "torrentcount": 1, // useless, count it by `torrents` instead
    "root_gid": 592178,
    "tags": [
        "male:crossdressing",
        "male:shotacon",
        "male:tomgirl",
        "male:yaoi",
        "artist:tower",
        "artist:mokkouyou bond",
        "male:anal",
        "male:schoolgirl uniform",
        "male:catboy",
        "artist:murasaki nyaa",
        "artist:po-ju",
        "artist:rustle",
        "artist:miyakawa hajime",
        "artist:fujinomiya yuu",
        "artist:tanuma yuuichirou",
        "male:school swimsuit",
        "artist:mikami hokuto",
        "artist:azuma kyouto",
        "male:josou seme",
        "parody:boku no pico",
        "male:frottage",
        "male:bloomers",
        "artist:nemunemu",
        "group:studio zealot",
        "artist:aoi madoka"
    ],
    "torrents": [
        {
            "id": 632947,
            "name": "(Shota Scratch 5) [Studio Zealot (Various)] Bokutachi! Shotappuru!! (Boku no Pico)",
            "hash": "2a4641feba9943b0e028927879ff6567e74bf0ae",
            "addedstr": "2019-02-28 00:39",
            "fsizestr": "72.13 MB",
            "uploader": "Hyenacub"
        }
    ]
}

`/api/gallery/:gid/:token`

Alias: /api/g/:gid/:token

Get gallery metadata.

Query params:

gid: Gallery ID (required)
token: Gallery token (required)

Returns: metadata

`/api/list`

Get a list of galleries.

Query params:

page: Page number (default: 1)
limit: Gallery number per page (default: 10, <= 25)

Returns: metadata[]

`/api/category/:category`

Alias: /api/cat/:category?page={page=1}&limit={limit=10}

Get a list of galleries which matches one of specific categories, category can be a list split with ,, then it will returns the matched galleries.

category can be a list of string or a number (use xor, and if you want to exclude some category, use negative number, like if you want to get a list of Non-H galleries, the category can be one of Non-H, 256 or -767)

Misc                1           (1 << 0)
Doujinshi           2           (1 << 1)
Manga               4           (1 << 2)
Artist CG           8           (1 << 3)
Game CG             16          (1 << 4)
Image Set           32          (1 << 5)
Cosplay             64          (1 << 6)
Asian Porn          128         (1 << 7)
Non-H               256         (1 << 8)
Western             512         (1 << 9)

Query params:

category: Gallery category (required)
page: Page number (default: 1)
limit: Gallery number per page (default: 10, <= 25)

Returns: metadata[]

`/api/tag/:tag`

Get a list of galleries which matches ALL of specific tags, tag can be a list split with ,, then it will returns the matched galleries.

The tag should include the category type of tag, like if you want to search some full-colored Chinese translated furry galleries with male fox, you can try /api/tag/language:chinese,male:furry,male:fox,full%20color.

Query params:

tag: Tags (required)
page: Page number (default: 1)
limit: Gallery number per page (default: 10, <= 25)

Returns: metadata[]

`/api/uploader/:uploader`

Get a list of galleries which uploaded by soneone.

Query params:

uploader: Uploader (required)
page: Page number (default: 1)
limit: Gallery number per page (default: 10, <= 25)

Returns: metadata[]

`/api/search`

Get a list of galleries which matches all the query requests.

The rule of keyword supports most operators of E-Hentai:

Search for gallery title and Japanese title
Exact terms (" ") with spaces
- Underscore (_) is not supported (use Quotation " " instead)
Wildcard (*/%) at the end of the pattern (though the query will add % by default)
Exclude (-) specific terms
Or (~), matching any one of them [v0.3.1]
Colon namespaces (:) for tags
- Supports a subset of qualifiers tags: tag:, uploader:, gid: [v0.3.1]
- Terms without : will be treated as title keyword (probably like title:?)
Exact match for tags ($)
- Tags without $ can be used for prefix match [v0.3.1]
Shorten tag namespaces (character: -> char: / c:) [v0.3.1]

For usage examples, see EHWiki.

Before v0.3.1:

- If you want to search an uploader, use `uploader:{uploader}` - If you want to search a tag, use `{tagType}:{tagName}$`, and if `tagName` contains space, quote it and `$`, like `{tagType}:"{tagName}$"` - If you want to search a word, just put it, and if it contains space, quote it like `"{keyword}"`

You can use multiple keywords, split them with space %20, relations between all the keywords are AND (except uploder uses OR), so in theory more keywords will get more accure results

Query params:

keyword: Search keywords, split them with space %20
category: Gallery category, same as /api/category
expunged: Show expunged gallery (default: 0)
removed: Show removed gallery (default: 0)
replaced: Show replaced gallery (default: 0)
minpage: Show gallery with page count larger than this (default: 0)
maxpage: Show gallery with page count smaller than this (default: 0)
minrating: Show gallery with minimal stars (includes minus half stars) (default: 0, <= 5)
page: Page number (default: 1)
limit: Gallery number per page (default: 10, <= 25)

Returns: metadata[]

Notes

It eats my memory when importing

The import script will load the WHOLE JSON file (as I prefer to insert the older galleries, so I didn't import them by reading the file in chunk). So when importing, it may eat 1 GB ram or even more, make sure you've setup a swap file on your server

dd if=/dev/zero of=swapfile bs=1M count=2048
chmod 0600 swapfile
mkswap swapfile
swapon swapfile

I got duplicate records when re-importing

~~Do not cancel when importing, as the import script doesn't support resume import, so you'll have to truncate all table or delete them and create a new one~~

Now the import script supports resume importing, you can cancel your imports and run npm run import at any time, it'll start from your last record

The query speed is still too slow when querying multiple tags

Try adding indexes if you want

ALTER TABLE `gid_tid` ADD UNIQUE(`gid`, `tid`);
ALTER TABLE `gid_tid` ADD INDEX(`tid`);
ALTER TABLE `tag` ADD UNIQUE(`name`);
ALTER TABLE `gallery` ADD INDEX(`category`);
ALTER TABLE `gallery` ADD INDEX(`uploader`);

If you want to add all of these indexes, the database size will increased from 330 MB to about 500 MB

No primary key in table `gid_tid`

I'm not sure should I add an id column, as I'm not using it to query. But if you want, try the following SQL, and it'll takes about 110 MB

ALTER TABLE `gid_tid` ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

Why MyISAM?

I've little knowledge with database, you can change struct.sql to use InnoDB or others you want

The server quits when I exit the terminal

Try npm start &, or use PM2 or forever to keep it running in background

Web UI is not included in git repository

They may in GitHub release page, but if it's not here, you can build it by yourself, just run as simple as npm i then npm run build, and set webui to true in config.js.

Why React, React Router, Moment.js ... are in `devDependencies`?

I prefer it's a Node.js project, and Web UI is just an optional function, also you can grab distributed Web UI files without building it. Whether you need Web UI or not, the front-end libraries are not touched when you setting up the server, as they've been packaged into distributed files.

Todos (or not to do)

[X] Advanced search (tags, category, uploader, keyword in one search)
[X] Web UI
[X] Torrent hashes
[X] Update to latest galleries

Thanks

Sachia Lanlus, as he collects almost all the gallery metadatas before Ex downs and share the gdata.json
Tlaster / ehdb, the table structures are based on his SQLite database, as I've almost forgot how to handle the tag list with gallery
StackOverflow/11694761#21408164, the answer helps me to handle multiple tags searching, the searching time of 3 tags is from 60s down to 1.7s on my PC
Tenboro, the god who creates the world
The community helps E-Hentai to overcome (YAY it's alive!)

License

GPLv3

ccloli / e-hentai-db

readme

E-Hentai DB

Requirements

Setup & Start Up

Available APIs

`/api/gallery/:gid/:token`

`/api/list`

`/api/category/:category`

`/api/tag/:tag`

`/api/uploader/:uploader`

`/api/search`

Notes

It eats my memory when importing

I got duplicate records when re-importing

The query speed is still too slow when querying multiple tags

No primary key in table `gid_tid`

Why MyISAM?

The server quits when I exit the terminal

Web UI is not included in git repository

Why React, React Router, Moment.js ... are in `devDependencies`?

Todos (or not to do)

Thanks

License

ccloli / e-hentai-db

readme

E-Hentai DB

Requirements

Setup & Start Up

Available APIs

/api/gallery/:gid/:token

/api/list

/api/category/:category

/api/tag/:tag

/api/uploader/:uploader

/api/search

Notes

It eats my memory when importing

I got duplicate records when re-importing

The query speed is still too slow when querying multiple tags

No primary key in table gid_tid

Why MyISAM?

The server quits when I exit the terminal

Web UI is not included in git repository

Why React, React Router, Moment.js ... are in devDependencies?

Todos (or not to do)

Thanks

License

`/api/gallery/:gid/:token`

`/api/list`

`/api/category/:category`

`/api/tag/:tag`

`/api/uploader/:uploader`

`/api/search`

No primary key in table `gid_tid`

Why React, React Router, Moment.js ... are in `devDependencies`?