KichangKim / DeepDanbooru

AI based multi-label girl image classification system, implemented by using TensorFlow.
MIT License

Addressing issues regarding this system #5

Open DonaldTsang opened 4 years ago

DonaldTsang commented 4 years ago

It would be easy to break such a system and cause mis-tagging with these libraries, which demonstrates the weakness of using neural networks for automated image tagging.

Here are some of my proposals to make it more resilient:

  1. image augmentation to prevent overfitting
  2. use of multiple models for the same task (https://arxiv.org/abs/1809.00065), and maybe adding Inception or others to the system
  3. de-noising the image using something like
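
Proposals 1 and 2 could be combined roughly as follows. This is only a sketch under assumptions: `Model`, `hflip` and `ensemble_tags` are hypothetical names, a "model" is any callable returning per-tag probabilities (not the DeepDanbooru API), and a horizontal flip stands in for a real augmentation pipeline.

```python
from typing import Callable, Dict, List, Sequence

# Placeholders: an image is a 2D list of pixel values, and a model is any
# callable mapping an image to {tag: probability}. Neither is a real API.
Image = List[List[float]]
Model = Callable[[Image], Dict[str, float]]


def hflip(image: Image) -> Image:
    """Mirror the image left to right (one of the simplest augmentations)."""
    return [list(reversed(row)) for row in image]


def ensemble_tags(
    models: Sequence[Model], image: Image, threshold: float = 0.5
) -> Dict[str, float]:
    """Average each tag's score over all models and over two views of the image.

    A tag that a model does not report counts as probability 0 for that model.
    Only tags whose averaged score reaches `threshold` are returned.
    """
    views = [image, hflip(image)]
    totals: Dict[str, float] = {}
    for model in models:
        for view in views:
            for tag, probability in model(view).items():
                totals[tag] = totals.get(tag, 0.0) + probability
    runs = len(models) * len(views)
    averaged = {tag: total / runs for tag, total in totals.items()}
    return {tag: p for tag, p in averaged.items() if p >= threshold}
```

An adversarial perturbation tuned against one model and one view then has to survive the averaging step as well, although the obfuscated-gradients work linked below argues that this kind of defense is often weaker than it looks.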

And then there is this (claiming that most mitigation strategies fail): https://github.com/anishathalye/obfuscated-gradients

I have talked about something similar in https://github.com/halcy/DeepDanbooruActivationMaps/issues/3

DonaldTsang commented 4 years ago

Some useful information regarding the semantic segmentation of images: https://github.com/mrgloom/awesome-semantic-segmentation

Some awkward problems will arise from using the repos listed there verbatim:

  1. How do we deal with tag synonyms and tag subsets? Do we create a system in which segmented regions can have multiple tags?
  2. What about character tags vs facial/clothing component tags? How do we relate them to each other in a logical manner? Hierarchies?
  3. What about segmented regions that are too small? Would they get picked up by DD 1.0 but not by a DD+SS system?
  4. What is the maximum number of layers we need? 32 (since that is generally the maximum number of tags per image)? 64/128/256?
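
One possible answer to questions 1 to 3, as a sketch only: let each region carry a set of tags, resolve synonyms through an alias table, and walk a parent table for subsets and hierarchies. Everything here is hypothetical, including the names `Region`, `expand_tags` and `drop_small`, and the example table entries are made up for illustration rather than taken from any tag database.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple

# Made-up example entries; a real system would load these from the tag database.
ALIASES: Dict[str, str] = {"hairband": "headband"}        # synonym -> canonical tag
IMPLICATIONS: Dict[str, str] = {"cat_ears": "animal_ears"}  # tag -> broader parent tag


@dataclass
class Region:
    """One segmented region; it may carry several tags at once."""
    box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels
    tags: Set[str] = field(default_factory=set)


def expand_tags(
    tags: Iterable[str], aliases: Dict[str, str], implications: Dict[str, str]
) -> Set[str]:
    """Replace synonyms by their canonical tag, then add every ancestor tag."""
    result: Set[str] = set()
    for tag in tags:
        current = aliases.get(tag, tag)
        # Stop when there is no parent or the tag was already handled,
        # which also guards against cycles in the implication table.
        while current is not None and current not in result:
            result.add(current)
            parent = implications.get(current)
            current = aliases.get(parent, parent) if parent is not None else None
    return result


def drop_small(regions: List[Region], min_area: int) -> List[Region]:
    """Keep only regions whose bounding box covers at least `min_area` pixels."""
    return [r for r in regions if r.box[2] * r.box[3] >= min_area]
```

Regions removed by `drop_small` are exactly the ones question 3 worries about, so a combined system would probably still need the whole-image tagger as a fallback for them.
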

DonaldTsang commented 4 years ago

Some ideas on how to implement a semantic segmentation dataset/model, "ShoujoSegment":

  1. The initial dataset phase
    • Gather a list of images with strong heatmap confidence
    • Use reCAPTCHA's 3x3 5F-3T-1U test to refine the borders (remember to augment and noise them)
    • Collect results from volunteers and address weighting and credibility issues
  2. The Semantic Segmentation training phase
    • Create the system model (or better yet multiple models)
    • Use the collected data to train the system
    • Optimize the system for speed and accuracy, especially with regard to ensembles
  3. The data refinement phase
    • Increase the scope of images used
    • Use reCAPTCHA's 3x3 5F-3T-1U test to refine the borders (remember to augment and noise them)
    • Use volunteers' results to refine the Semantic Segmentation
  4. Other things that can be done outside of this loop
    • Create micro-models (that is, simplified versions of the main model) for mobile systems
    • Apply this system to a new social media network for community contributions
    • Use the "ShoujoSegment" system to refine DeepDanbooru and vice versa
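
The "weighting and credibility" step from phase 1 could look something like the sketch below. It assumes only that each volunteer answers yes/no for the nine cells of a 3x3 grid; the 5F-3T-1U scheme itself is not modeled, and `aggregate_votes`, `update_weight`, the default weight of 1.0 and the update rule are all hypothetical choices.

```python
from typing import Dict, List

Grid = List[bool]  # nine yes/no answers for one 3x3 grid, in row-major order


def aggregate_votes(
    votes: Dict[str, Grid], weights: Dict[str, float], threshold: float = 0.5
) -> Grid:
    """Weighted majority vote per cell; unknown volunteers get weight 1.0."""
    total = sum(weights.get(volunteer, 1.0) for volunteer in votes)
    if total <= 0:
        raise ValueError("no usable votes")
    consensus: Grid = []
    for cell in range(9):
        yes = sum(
            weights.get(volunteer, 1.0)
            for volunteer, grid in votes.items()
            if grid[cell]
        )
        consensus.append(yes / total >= threshold)
    return consensus


def update_weight(old: float, vote: Grid, consensus: Grid, rate: float = 0.1) -> float:
    """Move a volunteer's weight toward their agreement rate with the consensus."""
    agreement = sum(a == b for a, b in zip(vote, consensus)) / len(consensus)
    return (1 - rate) * old + rate * agreement
```

Volunteers who keep disagreeing with the consensus slowly lose influence, which is one simple way to handle careless or malicious answers.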

This concept would be applied as a "human-in-the-loop" or "active learning" system. A good example would be:

If the semantic segmentation is crowdsourced, these can help: http://ilpubs.stanford.edu:8090/1161/1/main.pdf and http://ceur-ws.org/Vol-2173/paper10.pdf
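
For the active learning part, a common choice (my assumption, not something the linked papers prescribe) is uncertainty sampling: send volunteers the images the model is least sure about. A minimal sketch, with `pick_for_review` and the 0.5 midpoint as hypothetical choices:

```python
from typing import Dict, List


def pick_for_review(confidences: Dict[str, float], budget: int) -> List[str]:
    """Return the ids of the `budget` images whose confidence is closest to 0.5.

    `confidences` maps an image id to the model's confidence for one tag.
    Ties are broken by image id so the result is deterministic.
    """
    ranked = sorted(
        confidences,
        key=lambda image_id: (abs(confidences[image_id] - 0.5), image_id),
    )
    return ranked[:budget]
```

Images with strong confidence stay out of the volunteer queue this way, so human effort goes where the model's borders are weakest.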

DonaldTsang commented 3 years ago

I am just going to put this here, for those who want to go from label to table.