ccloli / e-hentai-db

Just another E-Hentai metadata database
https://イー変態.ロリ.みんな
GNU General Public License v3.0

What's "root_gid" for? #1

Closed affinityDrops closed 4 years ago

affinityDrops commented 4 years ago

Does it indicate that the gallery has a parent gallery whose ID is root_gid?

ccloli commented 4 years ago

Yes, it points to the root of the gallery's chain of parent galleries. It can be used to identify duplicated old galleries and to query a gallery's torrents (uploaded torrents are saved under the root gallery).
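As a rough illustration of those two uses, here is a minimal sketch using Python's built-in `sqlite3` module. Only the `gid` and `root_gid` fields and the existence of a torrent table come from this thread; the table names `gallery` and `torrent`, the other columns, the sample rows, and the assumption that a root gallery stores its own gid as `root_gid` are all hypothetical, and the project's real schema may differ.

```python
import sqlite3

# Hypothetical minimal schema and sample data (assumptions, not the real dump).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gallery (gid INTEGER PRIMARY KEY, title TEXT, root_gid INTEGER);
    CREATE TABLE torrent (id INTEGER PRIMARY KEY, gid INTEGER, name TEXT);
    INSERT INTO gallery VALUES (100, 'v1', 100), (150, 'v2', 100), (200, 'other', 200);
    INSERT INTO torrent VALUES (1, 100, 'v1.zip');
""")


def same_chain(conn, gid):
    """Return the gids of other galleries that share this gallery's root_gid."""
    rows = conn.execute(
        "SELECT g2.gid FROM gallery g1 "
        "JOIN gallery g2 ON g2.root_gid = g1.root_gid "
        "WHERE g1.gid = ? AND g2.gid <> g1.gid ORDER BY g2.gid",
        (gid,),
    )
    return [row[0] for row in rows]


def torrents_for(conn, gid):
    """Return torrent names for a gallery, looked up through its root_gid."""
    rows = conn.execute(
        "SELECT t.name FROM gallery g "
        "JOIN torrent t ON t.gid = g.root_gid "
        "WHERE g.gid = ? ORDER BY t.id",
        (gid,),
    )
    return [row[0] for row in rows]
```

With this sample data, gallery 150 has no torrent rows of its own, but `torrents_for(conn, 150)` still finds the torrent stored under root gallery 100.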

affinityDrops commented 4 years ago

> (uploaded torrents are saved under root gallery).

Do only root galleries have uploaded torrents, and not their descendants?

ccloli commented 4 years ago

Maybe. On the torrent search page, the gallery linked from each torrent is the root gallery, and on the gallery torrent page, the ID in the tracker URL is the root gallery's ID.


affinityDrops commented 4 years ago

Thank you for your detailed explanation!

I have one more question. Why is torrent-import marked as USE AT YOUR OWN RISK? Does it put a lot of load on the EH server?

ccloli commented 4 years ago

Yes, this script requests the gallery torrent page of EVERY gallery whose root_gid is NULL. The script does the following:

  1. Get all galleries whose root_gid field is not set
  2. Request their torrent pages one by one (using a proxy if one is specified)
  3. Extract the ID from the tracker announce URL on that page, and save it to the root_gid field
  4. If the torrent page lists one or more torrents, save them to the torrent table with the gid field set to root_gid
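The four steps above can be sketched roughly as follows. This is not the project's actual script, only an illustration in Python under stated assumptions: the schema, the `fetch_torrent_page` callback (a stand-in for the real HTTP request and page parsing, returning a dict with `announce_url` and `torrents`), and the announce URL shape `.../<root gid>/announce` are all hypothetical.

```python
import re
import sqlite3

# Hypothetical schema: the thread only mentions `gid`, `root_gid` and a torrent table.
SCHEMA = """
CREATE TABLE gallery (gid INTEGER PRIMARY KEY, root_gid INTEGER);
CREATE TABLE torrent (gid INTEGER NOT NULL, name TEXT NOT NULL, UNIQUE (gid, name));
"""

# Assumed announce URL shape: .../<root gid>/announce
ANNOUNCE_RE = re.compile(r"/(\d+)/announce")


def import_torrents(conn, fetch_torrent_page):
    """Resolve root_gid for every gallery that lacks one; return the request count."""
    # Step 1: galleries whose root_gid is not set yet.
    gids = [row[0] for row in conn.execute(
        "SELECT gid FROM gallery WHERE root_gid IS NULL ORDER BY gid")]
    for gid in gids:
        # Step 2: one request per gallery (the callback hides proxy handling).
        page = fetch_torrent_page(gid)
        # Step 3: the ID inside the announce URL is taken as the root gallery ID.
        match = ANNOUNCE_RE.search(page["announce_url"])
        if match is None:
            continue  # leave root_gid as NULL if the page could not be parsed
        root_gid = int(match.group(1))
        conn.execute("UPDATE gallery SET root_gid = ? WHERE gid = ?", (root_gid, gid))
        # Step 4: store torrents under the root gallery; skip ones already stored.
        for name in page["torrents"]:
            conn.execute(
                "INSERT OR IGNORE INTO torrent (gid, name) VALUES (?, ?)",
                (root_gid, name),
            )
    conn.commit()
    return len(gids)
```

The `UNIQUE (gid, name)` constraint with `INSERT OR IGNORE` is a design choice of this sketch: several galleries in one chain report the same torrents, so without it each torrent would be stored once per descendant.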

Importing gdata.json brings in about 800K galleries, so this script will make at least 800K requests to get this information.
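To get a feel for what 800K requests means, here is a back-of-the-envelope estimate in Python. The one-second interval between requests is purely an assumed figure for illustration; the thread does not say how fast the script actually runs.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_duration_days(num_requests, seconds_per_request):
    """Estimate how long a strictly sequential crawl takes, in days."""
    return num_requests * seconds_per_request / SECONDS_PER_DAY


# 800K galleries at an assumed pace of one request per second.
print(round(crawl_duration_days(800_000, 1.0), 2))  # prints 9.26
```

Even at that pace, a full run would take more than nine days of continuous requests.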

I made a release here, which includes a copy of my server's database dump. Someone also asked me for the latest dump about 2 weeks ago, so I made a patch here as well. If you want to run the server and start with the latest data set, you can just import these SQL files without importing gdata.json.

affinityDrops commented 4 years ago

I see, so the only way to identify root_gid is to request the torrent pages one by one, since the official API does not tell you about the parent.

I think it would be good to have a flag meaning "root_gid existence is not determined". Once you know whether a root_gid really exists or not, you don't have to request again, since the existence of root_gid stays constant. The next step would be to create a long-running task version of torrent-sync to avoid sending requests too frequently. I might make a PR for it, but it is not that urgent.
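A minimal sketch of that idea in Python, assuming a hypothetical `root_checked` column as the flag and a hypothetical `resolve_root` callback that performs the actual request and returns the root gid or `None`. None of these names exist in the project; they are only here to illustrate the proposal.

```python
import sqlite3
import time

# Hypothetical schema: `root_checked` is the proposed "already determined" flag.
SYNC_SCHEMA = """
CREATE TABLE gallery (
    gid INTEGER PRIMARY KEY,
    root_gid INTEGER,
    root_checked INTEGER NOT NULL DEFAULT 0
);
"""


def sync_batch(conn, resolve_root, batch_size=100, delay=1.0, sleep=time.sleep):
    """Check up to batch_size undetermined galleries, pausing between requests."""
    rows = conn.execute(
        "SELECT gid FROM gallery WHERE root_checked = 0 ORDER BY gid LIMIT ?",
        (batch_size,),
    ).fetchall()
    for index, (gid,) in enumerate(rows):
        if index > 0:
            sleep(delay)  # throttle: wait before every request except the first
        root_gid = resolve_root(gid)  # may be None when no root can be found
        # Mark the gallery as checked either way so it is never requested again.
        conn.execute(
            "UPDATE gallery SET root_gid = ?, root_checked = 1 WHERE gid = ?",
            (root_gid, gid),
        )
        conn.commit()  # commit per row so an interrupted run keeps its progress
    return len(rows)
```

A long-running task could call `sync_batch` in a loop until it returns 0. Because the flag is separate from `root_gid`, a gallery whose lookup yields nothing is not picked up again on the next pass.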

Speaking of the database dump, that someone is me. Thanks anyway 😁