LAION-AI / CLIP-based-NSFW-Detector


Can't get the right results #4

Open BIGJUN777 opened 2 years ago

BIGJUN777 commented 2 years ago

Hi there,

When using this model, I took safe images as inputs but got opposite results. The code I constructed is almost the same as yours.

import os
import zipfile
from os.path import expanduser
from urllib.request import urlretrieve

import autokeras as ak
import numpy as np
import torch
from tensorflow.keras.models import load_model


class SafetyClassifier:
    def __init__(self, model_cache_dir=None):
        self.model = self.load_safety_model(cache_folder=model_cache_dir)

    def load_safety_model(self, cache_folder=None):
        if cache_folder is None:
            home = expanduser("~")
            cache_folder = home + "/.cache/clip_retrieval"
        model_dir = cache_folder + "/clip_autokeras_binary_nsfw"
        if not os.path.exists(model_dir):
            os.makedirs(cache_folder, exist_ok=True)
            path_to_zip_file = cache_folder + "/clip_autokeras_binary_nsfw.zip"
            url_model = (
                "https://raw.githubusercontent.com/LAION-AI/CLIP-based-NSFW-Detector/main/clip_autokeras_binary_nsfw.zip"
            )
            urlretrieve(url_model, path_to_zip_file)
            with zipfile.ZipFile(path_to_zip_file, "r") as zip_ref:
                zip_ref.extractall(cache_folder)

        loaded_model = load_model(model_dir, custom_objects=ak.CUSTOM_OBJECTS)
        # print(loaded_model.predict(np.random.rand(10**3, 768).astype("float32"), batch_size=10**3))
        return loaded_model

    def __call__(self, clip_embs):
        if isinstance(clip_embs, torch.Tensor):
            clip_embs = clip_embs.cpu().numpy()
        return self.model.predict_on_batch(clip_embs)

I encountered several warnings during inference (screenshot attached). My environment: autokeras==1.0.19 and tensorflow==2.9.1. Did I miss something?
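
For reference, the model does load and produce outputs; a quick smoke test with random 768-d vectors (same idea as the commented-out predict above) looks like this:

import numpy as np

classifier = SafetyClassifier()
fake_embs = np.random.rand(4, 768).astype("float32")   # stand-ins for CLIP ViT-L/14 embeddings
print(classifier(fake_embs))                            # one prediction per row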

BIGJUN777 commented 2 years ago

Do we need to normalize the CLIP embedding before feeding it to the model?

BIGJUN777 commented 2 years ago

I downloaded several images from laion2B-en-aesthetic, used the CLIP model (ViT-L/14) to compute their embeddings, and fed those embeddings to the NSFW detector. However, the results were different from the ones shown in laion2B-en-aesthetic.

rom1504 commented 2 years ago

Yes, you need to normalize the input embeddings.

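Something along these lines (a minimal sketch; the names are placeholders):

import numpy as np

def l2_normalize(embeddings, axis=-1, eps=1e-12):
    # scale each embedding to unit L2 norm before it goes into the detector
    norms = np.linalg.norm(embeddings, axis=axis, keepdims=True)
    return embeddings / np.maximum(norms, eps)

# detector_input = l2_normalize(clip_embs)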

BIGJUN777 commented 2 years ago

Yes, I tried using the normalization from improved-aesthetic-predictor:

# (added inside SafetyClassifier)
def normalized(self, a, axis=-1, order=2):
    # scale each row of `a` to unit L2 norm, guarding against zero-norm rows
    l2 = np.atleast_1d(np.linalg.norm(a, ord=order, axis=axis))
    l2[l2 == 0] = 1
    return a / np.expand_dims(l2, axis)

def __call__(self, clip_embs):
    if isinstance(clip_embs, torch.Tensor):
        clip_embs = clip_embs.cpu().numpy()
    # normalize the embeddings before they go into the detector
    clip_embs = self.normalized(clip_embs)
    return self.model.predict_on_batch(clip_embs)

But the results still did not match the ones shown in laion2B-en-aesthetic. I checked the img_embs in the provided dataset and their data type is float16, so I also tried fp16 inference to produce float16 embeddings, but the results were still not correct. Did I miss something? Thanks.

from torch.cuda.amp import autocast

with autocast(enabled=fp16_model):
    clip_embs = extractor.encode_image(data['image'].cuda())
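
So the full path from an image batch to detector scores is roughly this (classifier is the SafetyClassifier instance above; the explicit cast back to float32 is my own addition):

clip_embs = clip_embs.float().cpu().numpy()   # back to float32 on the CPU
scores = classifier(clip_embs)                # L2-normalized, then scored by the detector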

I constructed the CLIP model as follows:

import clip
import torch
from torch import nn


class ClipExtractor(nn.Module):
    def __init__(self, model_name="ViT-L/14", jit=False):
        super().__init__()
        self.jit = jit
        # load the CLIP model and its preprocessing transform (the transform expects PIL images)
        self.model, self.transform = clip.load(model_name, device='cpu', jit=self.jit)

    @torch.no_grad()
    def encode_image(self, images):
        images_embeddings = self.model.encode_image(self.transform(images))
        return images_embeddings
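
Putting it together, the single-image check I am trying to reproduce looks roughly like this (a sketch with a placeholder image path; I call extractor.model.encode_image directly here because the transform returned by clip.load expects PIL images rather than an already-batched CUDA tensor):

import torch
from PIL import Image

extractor = ClipExtractor("ViT-L/14")
classifier = SafetyClassifier()

img = Image.open("sample.jpg")                   # placeholder for an image from the dataset
batch = extractor.transform(img).unsqueeze(0)    # preprocess a single PIL image
with torch.no_grad():
    emb = extractor.model.encode_image(batch)    # (1, 768) CLIP ViT-L/14 embedding

print(classifier(emb))                           # normalized and scored by the detector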

BIGJUN777 commented 2 years ago

@rom1504 @christophschuhmann Any ideas on my problems? Thanks.