fpgaminer / joytag

The JoyTag Image Tagging Model
Apache License 2.0

[Discussion] Comparison with Danbooru interrogator in SD Automatic1111 #1

Closed by NanoCode012 8 months ago

NanoCode012 commented 8 months ago

Hello, thank you for sharing this model.

I did a quick and naive check between this and the Danbooru Interrogator in Automatic1111's webui, comparing both against the actual tags. The test took the 100 most recent posts from Danbooru; 88 of them could be downloaded, and all 88 were processed successfully (88/88). The other 12 images didn't have a proper URL to download.
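For anyone who wants to reproduce this kind of check, here is a minimal sketch of how predicted tags could be scored against the actual tags. It is not the code behind the charts in this thread, and the thread does not say which metric was used; per-image precision, recall, and F1 averaged over all images is an assumption. The names `tag_scores` and `mean_scores` and the example tags are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def tag_scores(predicted: Iterable[str], actual: Iterable[str]) -> Dict[str, float]:
    """Precision, recall, and F1 for one image's predicted tags vs. its actual tags."""
    pred, gold = set(predicted), set(actual)
    true_pos = len(pred & gold)  # tags the model predicted that are really on the post
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def mean_scores(pairs: List[Tuple[Iterable[str], Iterable[str]]]) -> Dict[str, float]:
    """Average the per-image scores over (predicted, actual) pairs."""
    rows = [tag_scores(pred, gold) for pred, gold in pairs]
    if not rows:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    return {key: sum(row[key] for row in rows) / len(rows) for key in rows[0]}


# Made-up example data (assumption): two images with predicted and actual tags.
example = [
    (["1girl", "solo", "hat"], ["1girl", "solo", "smile", "long_hair"]),
    (["outdoors"], ["outdoors", "tree"]),
]
print(mean_scores(example))
```

Running the same function once per tagger on the same set of images gives numbers that can be compared side by side.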

These are my current observations:

I'm looking forward to seeing if this can be integrated into the webui or retrained for even more tags!

Note:

Settings: Threshold 0.5

[Image: chart-100im-0.5thresh-v2, comparison chart for 100 images at threshold 0.5]


I just noticed that the threshold in the docs is 0.4, so I ran the same code again with that threshold. The images pulled may not be the same as in the first run. Success rate: 88/88 (12 failed to download).
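To illustrate what changing the threshold does, here is a small sketch of turning per-tag scores into a tag list. The function name `select_tags`, the strict `>` comparison, and the scores are assumptions made for illustration, not output from either model.

```python
from typing import Dict, List


def select_tags(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep every tag whose score is above the threshold, sorted by name."""
    return sorted(tag for tag, score in scores.items() if score > threshold)


# Made-up scores (assumption) for a single image.
scores = {"1girl": 0.97, "solo": 0.81, "hat": 0.45, "smile": 0.41, "tree": 0.12}

print(select_tags(scores, 0.5))  # ['1girl', 'solo']
print(select_tags(scores, 0.4))  # ['1girl', 'hat', 'smile', 'solo']
```

A lower threshold lets more tags through, which usually raises recall and can lower precision, so results at 0.5 and 0.4 are not directly comparable.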

Observation:

[Image: chart-100im-0.4thresh-v2, comparison chart for 100 images at threshold 0.4]

Edit: Updated charts to reflect fixed Interrogator code. Cleaning tags was necessary.

O-J1 commented 8 months ago

Just on the topic of accuracy and such: we've run into Danbooru tagging accuracy issues. While it might be good, it's not great. Also, given the ratio of anime to realistic images that this model is using, more realistic tagging in DB style will likely be needed. Either way, this is a positive step. 👍

fpgaminer commented 8 months ago

> I did a quick and naive check between this and the Danbooru Interrogator in Automatic1111's webui and compared with the actual tags

Wow, thank you! Independent validation is incredibly helpful.

Is Danbooru Interrogator still using the old DeepDanbooru model? There are much, much better ones now like SmilingWolf's work, which probably performs better on anime images than this model currently anyway. I'd be happy for this model to pass that watermark some day, but the main focus of the JoyTag model has been on expanding into real life images for now.

NanoCode012 commented 8 months ago

> Is Danbooru Interrogator still using the old DeepDanbooru model?

I think so!

> There are much, much better ones now like SmilingWolf's work, which probably performs better on anime images than this model currently anyway.

I noticed that repo hasn't been updated in a while, so I didn't realize it was better. Although, the better one seems to be an ensemble?

> I'd be happy for this model to pass that watermark some day, but the main focus of the JoyTag model has been on expanding into real life images for now.

I understand that completely. This was just a curious experiment to see how the model performed :)