Bionus / imgbrd-grabber

Very customizable imageboard/booru downloader with powerful filenaming features.
https://www.bionus.org/imgbrd-grabber/
Apache License 2.0

gelbooru - every image's artist is "unknown", and randomly stopped color-coding unique tags #2235

Closed Johndoespunkmire closed 3 years ago

Johndoespunkmire commented 3 years ago

gelbooru randomly stopped color-coding/sorting artist and character tags

Hello, I've been using Grabber to download images from Gelbooru for quite some time now. I'm an OCD organizer, and the color-coded tags save so much time when ripping images. I loaded up Grabber today, and every image I click on is now labeled as "unknown artist". When looking at the tags of a given image, I can see the artist and character tags mixed in alphabetically with the rest of the "black text" tags. In the viewer settings, I checked whether artist/character tags had become "un-color-coded", but that wasn't the case: artist and character tags still have their default red and green colors in my settings.

Under Options - Interface - Image Window: I have my "Tag List Position" set to "Top", and my "Tag Order" set to "Type"

I always name my downloads with the artist's name first. When I now download images, they're all titled as "unknown".

I checked other sites as well, and this seems to be broken only on Gelbooru: other booru sites still have color-coded tags and still label their images with the associated artists.

I also restarted my computer, deleted and reinstalled both x64 and x86 versions of Grabber, with no luck.

I truly hope someone can help. I exclusively rip images from Gelbooru, and I have a meticulous sorting workflow that's helped me organize 100s of images already. This really throws a wrench into that. If anyone can help, thank you.

yami-no-tusbas commented 3 years ago

Grabber version, OS version, logs from the application? I tested just now on nightly and everything seemed fine, but yeah, that was only because I had a tags.txt from September 2020 that was doing the job (Tag loader is so useful).

Yeah, the bug is there: no color if I remove the tags.txt. The page "https://gelbooru.com/index.php?page=dapi&s=post&q=index&limit=20&pid=0&tags=" doesn't give the tag type, which may be part of the problem, but I don't know how Grabber gets the tag type in the first place...

I suggest you use my tags.txt as a workaround for the time being. Put it in your "C:\Users\\AppData\Local\Bionus\Grabber\sites\Gelbooru (0.2)\gelbooru.com" folder, then launch Grabber. Colors should be restored, and you can use the renaming tool to correct the files that were named wrong. Attached: tags.txt

This file was made on 16/09/20, so some recently added artists or tags may not be in it. The tool used to make the file (Tag loader) has no Gelbooru option anymore (on nightly at least).

I also strongly recommend that Grabber ask the user to build a local tag database, at least on first launch, and maybe offer an automatic update system just in case (every six months or so).

Wrellll commented 3 years ago

I want to add that I had the same issue. However, in addition to adding the tags.txt to the "Gelbooru.com" folder, I also had to copy the tag-types.txt file from "Danbooru2.0" in order to restore tag color and proper naming functionality. The tag-types.txt file goes in the same place as tags.txt and contains the following text, with each number+type combination on its own line:

    0,general
    1,artist
    3,copyright
    4,character
    5,meta
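To make the file layout described above concrete, here is a minimal sketch that reads "number,type" lines into a lookup table. It is not Grabber's own parser; the function name `parseTagTypes` and the skip-on-malformed-line behavior are assumptions made for illustration.

```javascript
// Parse the text of a tag-types.txt file ("<number>,<type>" per line)
// into a Map from numeric type id to type name.
// Hypothetical helper for illustration, not part of Grabber.
function parseTagTypes(text) {
  const types = new Map();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // skip blank lines
    const comma = line.indexOf(",");
    if (comma === -1) continue; // skip lines without a separator
    const id = Number.parseInt(line.slice(0, comma), 10);
    if (Number.isNaN(id)) continue; // skip lines with a non-numeric id
    types.set(id, line.slice(comma + 1).trim());
  }
  return types;
}

// The five lines quoted in the comment above.
const sampleTagTypes = "0,general\n1,artist\n3,copyright\n4,character\n5,meta\n";
const parsedTagTypes = parseTagTypes(sampleTagTypes);
console.log(parsedTagTypes.get(1)); // artist
```

Note that id 2 is simply absent from this file, so a lookup for it returns `undefined`.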

Johndoespunkmire commented 3 years ago

Hi, thank you so much for trying to help. I've placed your text file in "sites/Gelbooru (0.2)/gelbooru.com", and I opened grabber again, but it seems to still not be working. Are there any other steps I need to do, or should dropping it in the folder be enough?

I also tried going to the "tag loader" in Grabber. I saw that Grabber recognized your tags.txt file, because it said that Gelbooru had (100,594) tags. Earlier, it had said "0 tags".

After that, I thought maybe I could load the Gelbooru tags again. So I hit "start" to re-load Gelbooru tags. It gave me an error, saying "0 tags were loaded". And the tags.txt file that you gave me had been wiped blank. All tags were removed.

However, I think I may know what caused my tags to disappear originally now:

I saw that when I hit "start" on loading Gelbooru tags in the Tag Loader, all of the Gelbooru tags that were in "\sites\Gelbooru (0.2)\gelbooru.com" get wiped. I think I might have done this the other day and wiped my list of tags from Gelbooru. It gave me the same error back then, "0 tags loaded". After that, I think my tags stopped working.

yami-no-tusbas commented 3 years ago

Sorry to hear that. Maybe it's because I'm on the nightly version and my tags.txt is a little "off". Strangely, everything is fine for me. I'll check my tag-types.txt too.

The tag types are not the same. Here is what is in my tag-types.txt:

    0,general
    1,deprecated
    2,metadata
    3,copyright
    4,character
    5,artist

I know that I'm using the API with my login, because without it you're pretty limited. I use nightly Grabber, it uses my old tags.txt, and everything seems to work fine. But if I switch to the HTML version of Gelbooru, nothing works; only the API works for the moment.

And yeah, I just got an image where a "new" character tag appears black while the others are fine, so it seems the API doesn't send tag type information.

I hope Bionus will find the solution; this could become a serious problem in the long run.

Update: I installed the "normal" (non-nightly) Grabber and, after some testing, you need my tag-types.txt + tags.txt to get coloring working (and the naming scheme too). It's only a workaround, but it works; tags newer than 16/09/20 won't be colored.

Johndoespunkmire commented 3 years ago

So update: I didn't see Wrellll's post at the time I posted. I went back and tried his suggestion, and it seemed to actually work combined with your tags.txt file.

However, Grabber was listing random tags as the author, random tags as characters, and other weird inconsistencies. Basically, I got colored tags back, but for the wrong tags.

So, I finally tried updating my "tag-types.txt" to what was in your version... and boom! Everything's back to normal!

I think what happened was that I accidentally tried updating Gelbooru's tag list, and Grabber erased all my data instead. It told me "0 Tags were updated" and erased everything. I won't try that again, but do you know why it would do this?

Anyway, thank you both so much. I've had really bad experiences with "bug reports" and "troubleshooting forums" in the past. I truly didn't expect to get my copy of Grabber working again, but figured I'd try this one last shot in the dark. Thank you both, yami-no-tusbas and Wrellll!

yami-no-tusbas commented 3 years ago

My assumption is that Gelbooru changed something in their API and HTML, so Grabber can't find the tags or request them. I went to the pages Grabber uses for grabbing the tags, and their appearance has changed; it looks like the HTML and CSS have been reworked. Maybe the Grabber parser can't find the tags anymore, thinks it has all of them (as "0"), and writes an empty tags.txt file. And the API (XML) may have been modified in a way that Grabber can't get tag types from it when requesting a page (if it ever could).

So the only thing to do is wait for Bionus to update the Gelbooru model, because there are new tags every day, and at some point my workaround won't work either.

yami-no-tusbas commented 3 years ago

Just tested this commit (nightly 1fe84600), and sorry to report that it doesn't seem to work: tags are still uncolored when there is no tags.txt in the Gelbooru (0.2) folder. And since the change it's pretty slow:

[20:07:25.044][Info] [gelbooru.com][Xml] Receiving page `https://gelbooru.com/index.php?page=dapi&s=post&q=index&limit=20&pid=0&tags=`
[20:07:51.024][Info] [gelbooru.com][Xml] Parsed page `https://gelbooru.com/index.php?page=dapi&s=post&q=index&limit=20&pid=0&tags=`: 20 images (20), 0 tags (325), 5465605 total (5465605), 273281 pages (273281)

and on a test tag :

[20:09:19.574][Info] [gelbooru.com][Xml] Loading page https://gelbooru.com/index.php?page=dapi&s=post&q=index&limit=20&pid=0&tags=danganronpa
[20:09:20.180][Info] [gelbooru.com][Xml] Receiving page https://gelbooru.com/index.php?page=dapi&s=post&q=index&limit=20&pid=0&tags=danganronpa
[20:09:55.442][Info] [gelbooru.com][Xml] Parsed page https://gelbooru.com/index.php?page=dapi&s=post&q=index&limit=20&pid=0&tags=danganronpa: 20 images (20), 0 tags (299), 15931 total (15931), 797 pages (797)
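For reference, the delay between the "Receiving" and "Parsed" entries can be read directly off the timestamps. The sketch below does that arithmetic; the helper names are made up for illustration, and it assumes both entries fall on the same day (no midnight rollover).

```javascript
// Convert a leading "[HH:MM:SS.mmm]" log timestamp to milliseconds since midnight.
// Hypothetical helper; assumes the timestamp layout seen in the log lines above.
function timestampToMs(logLine) {
  const m = /^\[(\d{2}):(\d{2}):(\d{2})\.(\d{3})\]/.exec(logLine);
  if (m === null) throw new Error("no timestamp found in: " + logLine);
  const [h, min, s, ms] = m.slice(1).map(Number);
  return ((h * 60 + min) * 60 + s) * 1000 + ms;
}

// Seconds elapsed between two log lines written on the same day.
function elapsedSeconds(startLine, endLine) {
  return (timestampToMs(endLine) - timestampToMs(startLine)) / 1000;
}

// Only the timestamp prefixes of the four relevant entries are reproduced here.
const emptySearchDelay = elapsedSeconds("[20:07:25.044][Info]", "[20:07:51.024][Info]");
const danganronpaDelay = elapsedSeconds("[20:09:20.180][Info]", "[20:09:55.442][Info]");
console.log(emptySearchDelay, danganronpaDelay);
```

That works out to roughly 26 seconds for the empty search and roughly 35 seconds for the `danganronpa` search, both spent between receiving the page and finishing the parse.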

I verified my model.js, and the fix is in it. But maybe it's too early to report this.

Bionus commented 3 years ago

Not sure what's up with that @yami-no-tusbas but this change should have absolutely 0 impact on speed, especially for the "listing" part since it wasn't even changed 🤔

Note that on Gelbooru, you won't have colored tags when putting the mouse on top of a thumbnail (since their API doesn't directly provide this information), but only when opening or batch downloading an image. Is that how you tried it?

When opening an image, there might be a slight delay before the tags get colored, as there is a request done to Gelbooru first.

Testing on a fresh nightly with your search (danganronpa): (two screenshots attached)

yami-no-tusbas commented 3 years ago

I will try in a few minutes, but yeah, something is slowing down the app. It could be another issue (I know your fix shouldn't do that, but it started just after installing the new version). The slowdown is when the page gets parsed. I'll assume my nightly is broken in some way and do a clean install.

And yeah, on a clean install it works as described, and even the Tag loader is there again, but it doesn't work for Gelbooru. The tag panel is out of order on Gelbooru too. I don't know what setting was slowing me down like that, but the fresh install is snappy!

AbyssResearcher commented 3 years ago

@yami-no-tusbas @Johndoespunkmire The reason you are getting inconsistencies is probably that you are using a default tags.db/tags.txt against a default tag-types.txt. Unless your tags.db/tags.txt has the tag info, the tag will not be classified. Furthermore, some tags in the default tags.db are not the same as what Gelbooru uses, e.g. artist names.

The actual schema for Gelbooru is:

    0 tag
    1 artist
    3 copyright
    4 character
    5 metadata
    6 deprecated

Example of Type 1/Artist:
https://gelbooru.com/index.php?page=dapi&s=tag&q=index&id=1082253
https://gelbooru.com/index.php?page=dapi&s=tag&q=index&id=1082253&json=1

Example of Type 6/Deprecated:
https://gelbooru.com/index.php?page=dapi&s=tag&q=index&id=76339
https://gelbooru.com/index.php?page=dapi&s=tag&q=index&id=76339&json=1
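As a rough sketch of how the schema and the example URLs above fit together, the snippet below builds a tag lookup URL of the same shape and translates a numeric type into its name. The function names are hypothetical, nothing here performs a network request, and the response format of the endpoint is deliberately not assumed.

```javascript
// Type ids as listed in the comment above (id 2 does not appear in that list).
const GELBOORU_TAG_TYPE_NAMES = new Map([
  [0, "tag"],
  [1, "artist"],
  [3, "copyright"],
  [4, "character"],
  [5, "metadata"],
  [6, "deprecated"],
]);

// Build a tag lookup URL shaped like the examples above.
// Hypothetical helper; it only assembles the string.
function tagLookupUrl(tagId, asJson) {
  const params = new URLSearchParams({
    page: "dapi",
    s: "tag",
    q: "index",
    id: String(tagId),
  });
  if (asJson) params.set("json", "1");
  return "https://gelbooru.com/index.php?" + params.toString();
}

// Translate a numeric type into a name, falling back to "unknown".
function tagTypeName(typeId) {
  return GELBOORU_TAG_TYPE_NAMES.has(typeId)
    ? GELBOORU_TAG_TYPE_NAMES.get(typeId)
    : "unknown";
}

console.log(tagLookupUrl(1082253, true));
console.log(tagTypeName(1), tagTypeName(2));
```

The "unknown" fallback mirrors the symptom reported at the top of the thread: a tag whose type cannot be resolved ends up unclassified.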

I am currently trying to update the tags.db to the latest. Will take 3-5 days. That is unless someone else can provide the very latest tag collection.

Bionus commented 3 years ago

> I am currently trying to update the tags.db to the latest. Will take 3-5 days. That is unless someone else can provide the very latest tag collection.

You can find pre-generated ones here: https://github.com/Bionus/imgbrd-grabber/releases/tag-databases

AbyssResearcher commented 3 years ago

I have examined it but as it only has 35732 entries, there are many tags missing. For example, can you do a search for yukichi_(tsuknak1) and see if the image is saved with the artist's name in Grabber?

The schema used for https://github.com/Bionus/imgbrd-grabber/releases/tag-databases seems to be:

    0,general
    1,meta
    2,copyright
    3,character
    4,deprecated
    5,artist
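The two numbering schemes quoted in this thread disagree on almost every id, which is one way tags can end up colored as the wrong type. A small sketch of remapping ids from one scheme to the other by type name follows. The alias table (treating "general" as "tag" and "meta" as "metadata") is an assumption, and `buildIdRemap` is a hypothetical helper, not something Grabber provides.

```javascript
// Scheme reported above for the pre-generated tag databases.
const RELEASE_DB_TYPES = new Map([
  [0, "general"], [1, "meta"], [2, "copyright"],
  [3, "character"], [4, "deprecated"], [5, "artist"],
]);

// Scheme reported earlier in the thread for Gelbooru itself.
const GELBOORU_TYPES = new Map([
  [0, "tag"], [1, "artist"], [3, "copyright"],
  [4, "character"], [5, "metadata"], [6, "deprecated"],
]);

// Assumed name equivalences between the two schemes.
const TYPE_ALIASES = new Map([["general", "tag"], ["meta", "metadata"]]);

// Build a Map from source id to target id by matching type names.
function buildIdRemap(fromTypes, toTypes, aliases) {
  const targetIdByName = new Map();
  for (const [id, name] of toTypes) targetIdByName.set(name, id);

  const remap = new Map();
  for (const [id, name] of fromTypes) {
    const canonical = aliases.has(name) ? aliases.get(name) : name;
    if (targetIdByName.has(canonical)) {
      remap.set(id, targetIdByName.get(canonical));
    }
  }
  return remap;
}

const releaseToGelbooru = buildIdRemap(RELEASE_DB_TYPES, GELBOORU_TYPES, TYPE_ALIASES);
console.log(releaseToGelbooru.get(5)); // 1
```

Under these assumptions an artist tag stored as type 5 in the pre-generated database corresponds to type 1 in Gelbooru's numbering, so reading one file with the other's tag-types.txt would mislabel it.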

yami-no-tusbas commented 3 years ago

yukichi_(tsuknak1) is not in the tags.db

~~I only found:~~

~~0 | yukichi(eikichi) | 5~~
~~0 | yukichi(sukiyaki39) | 5~~

~~Here is a new version that I cannot upload here as a .db, so I added .txt at the end: gelbooru_210126.db.txt~~

~~Tags are from this morning, converted to CSV then SQLite.~~

EDIT: Yeah, I did the same thing by removing all tags under 100 too! So it won't be useful in this case.

Bionus commented 3 years ago

As stated in the description, only tags with more than 100 images are included. yukichi_(tsuknak1) only has 20.

AbyssResearcher commented 3 years ago

@yami-no-tusbas Yes I know. Thanks for the help though.

@Timurama Grabber's tag generator gets data via Danbooru's API. Gelbooru also tries to follow Danbooru's tag system, so it works out somewhat, but there may be some inconsistencies between the two boorus' info. One thing that might help with updating tags is that Gelbooru seems to create a new tag instead of updating an old one (this is what I suspect; I'm not sure it is the case), so we wouldn't need to start from 0 when updating tags. I am curating Gelbooru's tag db of 1000000+ tags. It will be done in a few days. As for a more concrete solution, Bionus is the man, as he is the dev for Grabber.

yami-no-tusbas commented 3 years ago

@AbyssResearcher If you want it fast, just use the Tag loader, then convert the loaded tags from CSV to SQLite. And Gelbooru doesn't have that many tags, only about 100,000 in fact. Here it is, done yesterday with the Tag loader and converted: tags.db.txt (2.38 MB). Just remove the .txt part. (About 20 tags are missing; they caused errors because they use "," as part of the tag, some weird random Japanese games.)

By the way, here is the updated tag-types: tag-types.txt. I don't think it will be useful, since I haven't seen a difference since I updated it. I think it is only used by the Tag loader, but correct me if I'm wrong.

Bionus commented 3 years ago

> it's very limiting that there's no way for the prog to tag these anymore, this gonna be fixed eventually, I hope?

It's already been fixed in Nightly for more than a week AFAIK (daf3070e15997a184c77fadc5e1a3a764cc05918).

AbyssResearcher commented 3 years ago

Oh, it worked. Thank you @Bionus. I edited the wrong line the first time.

@Timurama You can go to /Grabber/sites/Gelbooru (0.2)/model.js and replace the html:details function with:

    details: {
        url: function (id, md5) {
            return "/index.php?page=post&s=view&id=" + id;
        },
        parse: function (src) {
            return {
                tags: Grabber.regexToTags('<li class="tag-type-(?<type>[^"]+)">(?:[^<]*(?:<span[^>]*>[^<]*)?<a[^>]*>[^<]*</a>(?:[^<]*</span>)?)*[^<]*<a[^>]*>(?<name>[^<]*)</a>[^<]*<span[^>]*>(?<count>\\d+)</span>[^<]*</li>', src),
                imageUrl: Grabber.regexToConst("url", '<img[^>]+src="([^"]+)"[^>]+onclick="Note\\.toggle\\(\\);"[^>]*/>', src),
            };
        },
    },

The git commit shows the change at line 132, but that is the TypeScript file. I guess in the JavaScript file the change would be at line 117 after the build. I have collected the tags, but since the issue is already fixed, I'll probably upload them on the weekend if anyone still wants them.
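To see what the tag pattern in that snippet captures, it can be run with plain JavaScript `RegExp` outside of Grabber. This sketch does not call `Grabber.regexToTags`, whose exact behavior isn't shown in this thread, and the sample markup is invented for illustration rather than copied from a real Gelbooru page.

```javascript
// The tag pattern from the model.js snippet above, compiled as a global RegExp.
const TAG_PATTERN = new RegExp(
  '<li class="tag-type-(?<type>[^"]+)">(?:[^<]*(?:<span[^>]*>[^<]*)?<a[^>]*>[^<]*</a>(?:[^<]*</span>)?)*[^<]*<a[^>]*>(?<name>[^<]*)</a>[^<]*<span[^>]*>(?<count>\\d+)</span>[^<]*</li>',
  "g"
);

// Collect every match as a { type, name, count } object.
// Hypothetical stand-in for illustration, not Grabber's implementation.
function extractTags(html) {
  return Array.from(html.matchAll(TAG_PATTERN), (m) => ({
    type: m.groups.type,
    name: m.groups.name,
    count: Number(m.groups.count),
  }));
}

// Invented markup shaped to fit the pattern: a helper link, the tag link, a count.
const sampleHtml =
  "<ul>" +
  '<li class="tag-type-artist"><a href="#">?</a> <a href="#">some_artist</a> <span>20</span></li>' +
  '<li class="tag-type-character"><a href="#">?</a> <a href="#">some_character</a> <span>1500</span></li>' +
  "</ul>";

const extractedTags = extractTags(sampleHtml);
console.log(extractedTags);
```

The type comes straight from the `tag-type-...` class on each list item, which is why the tag type is available on the post page even though the listing API response does not include it.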

Bionus commented 3 years ago

You can also just download the nightly release here: https://github.com/Bionus/imgbrd-grabber/releases/nightly