Alvi-alvarez / sd-Img2img-batch-interrogator

Img2img batch interrogator for AUTOMATIC1111's Stable Diffusion web UI
MIT License

Adding Custom Interrogator #9

Closed. Pawelekkkkk closed this issue 1 month ago.

Pawelekkkkk commented 2 months ago

I'd like to use a custom interrogator. It could be one from the CLIP Interrogator tool, like ViT-bigG-14/laion2b_s39b_b160k. Where do I put the models so the batch interrogator will find them, or how can I do it some other way? Also, it would be great to have the same possibility to choose between the best, fast, classic, and negative modes. Thanks for a great tool.

SmirkingKitsune commented 2 months ago

That's an interesting proposition. I'm wondering about the possibility of operating the clip_interrogator_ext.py script from clip-interrogator-ext out of the sd_tag_batch.py script.

It seems clip_interrogator_ext.py has an API we can use to collect the models and run the prompt generation. Since an API will be used, --api will need to be added to the startup command (e.g., to COMMANDLINE_ARGS in webui-user.bat). The script currently uses the native interrogators from A1111, so API handling has not been implemented (yet). I am going to try to make a rough outline of how we might do this, but this is a first draft and may contain errors and oversights.

THE FOLLOWING IS UNTESTED. I will be adding personal notes with my thoughts on the code snippets.

Add CLIP API to the model_options like this:

        model_options = ["CLIP", "Deepbooru", "CLIP API"]

Should investigate hiding "CLIP API" when the CLIP API is not available, to remove clutter... At the very least, add a reporting mechanism to indicate that the CLIP API was not found...
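
A minimal sketch of such a probe (untested; "base_url" is a placeholder for the webui address, and whether to hide the entry or merely warn is still an open question):

    # Sketch only: probe the CLIP Interrogator endpoint once while building
    # the UI, and only offer "CLIP API" when it answers. "base_url" is a
    # placeholder; see the note about collecting the port further down.
    model_options = ["CLIP", "Deepbooru"]
    try:
        requests.get(f"{base_url}/interrogator/models", timeout=2).raise_for_status()
        model_options.append("CLIP API")
    except Exception:
        print('sd_tag_batch: CLIP Interrogator API not found, hiding "CLIP API" option.')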

Need to add the CLIP API options to the UI; it will probably look like this:

        # CLIP API Options
        with gr.Accordion("CLIP API Options:"):
            clip_api_model = gr.Dropdown(self.get_clip_interrogator_models(), value='ViT-L-14/openai', label="CLIP API Model")
            clip_api_mode = gr.Radio(choices=["fast", "best", "classic", "negative"], label="CLIP API Mode", value="fast")

I am unsure about the dropdown; it needs further testing. Maybe we could do multiselect to run multiple CLIP API models...
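
A sketch of the multiselect idea (untested; Gradio's multiselect=True makes the component return a list of strings, so run() would have to loop over the selection):

    # Sketch only: a multiselect dropdown returns a list of model names.
    clip_api_model = gr.Dropdown(
        self.get_clip_interrogator_models(),
        value=["ViT-L-14/openai"],
        multiselect=True,
        label="CLIP API Models"
    )

    # ...and in run(), iterate over whatever the user selected:
    for clip_model_name in clip_api_model:
        interrogator += self.get_clip_api_prompt(p.init_images[0], clip_model_name, clip_api_mode)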

Change the UI return to accommodate the CLIP API options:

        return [in_front, prompt_weight, model_selection, use_weight, no_duplicates, use_negatives, use_custom_filter, custom_filter, clip_api_model, clip_api_mode]

Change the run definition:

    def run(self, p, in_front, prompt_weight, model_selection, use_weight, no_duplicates, use_negatives, use_custom_filter, custom_filter, clip_api_model, clip_api_mode):

Add the CLIP API to the model interrogation loop:

            elif model == "CLIP API":
                interrogator += self.get_clip_api_prompt(p.init_images[0], clip_api_model, clip_api_mode)
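
For context, this branch would presumably sit alongside the native branches in the existing loop, something like the following (a sketch; the actual CLIP and Deepbooru handling in sd_tag_batch.py may differ):

    # Sketch: the CLIP/Deepbooru branches are assumptions about how
    # sd_tag_batch.py currently calls the native A1111 interrogators
    # (requires "from modules import shared, deepbooru").
    for model in model_selection:
        if model == "CLIP":
            interrogator += shared.interrogator.interrogate(p.init_images[0])
        elif model == "Deepbooru":
            interrogator += deepbooru.model.tag(p.init_images[0])
        elif model == "CLIP API":
            interrogator += self.get_clip_api_prompt(p.init_images[0], clip_api_model, clip_api_mode)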

Write some definitions for the FastAPI interactions. First would be the CLIP API model collector:

    def get_clip_interrogator_models(self):
        # Ensure CLIP Interrogator is present and accessible
        # (requires "import requests" at the top of the script)
        try:
            response = requests.get("http://127.0.0.1:7860/interrogator/models")
            response.raise_for_status()
            models = response.json()
            if not models:
                raise Exception("No CLIP Interrogator models found.")
        except Exception as error:
            print(f"Error accessing CLIP Interrogator API: {error}")
            return ""
        return models

Unsure if this would work; might need to convert the models list for the dropdown, need to check on that.
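
Assuming the endpoint returns a plain JSON array of model-name strings (an assumption about clip-interrogator-ext's response shape), the conversion for the dropdown might be as simple as:

    # Assumed response shape (unverified):
    #   ["ViT-L-14/openai", "ViT-H-14/laion2b_s32b_b79k", ...]
    models = self.get_clip_interrogator_models()
    choices = [str(m) for m in models]
    clip_api_model = gr.Dropdown(choices, value=choices[0] if choices else None, label="CLIP API Model")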

Next would be the CLIP API prompt handler:

    def get_clip_api_prompt(self, image, model_name, mode):
        # Requires "import base64", "import requests", and
        # "from io import BytesIO" at the top of the script
        # Ensure the model name and mode are provided
        if not model_name:
            print("CLIP API model name is required.")
            return ""
        if mode not in ["fast", "best", "classic", "negative"]:
            print("Invalid CLIP API mode.")
            return ""

        # Encode the image to base64
        buffered = BytesIO()
        image.save(buffered, format="PNG")
        img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")

        # Get the prompt from the CLIP API
        try:
            payload = {
                "image": img_str,
                "mode": mode,
                "clip_model_name": model_name
            }
            response = requests.post("http://127.0.0.1:7860/interrogator/prompt", json=payload)
            response.raise_for_status()
            result = response.json()
            return result.get("prompt", "")
        except Exception as error:
            print(f"Error generating prompt from CLIP API: {error}")
            return ""

Unsure if this definition would work; I need to further investigate the CLIP API and troubleshoot. We should not be hardcoding localhost:7860; instead, the port should be collected from A1111, and I need to investigate how to properly do that... Also need to investigate how to unload the model, as it could be detrimental to VRAM if left loaded. At the very least, we need to leave a note for the user indicating that the model can be unloaded from the CLIP Interrogator tab...
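
On the port question, a minimal sketch (assuming A1111 exposes the --port launch argument on shared.cmd_opts, which is None when the stock 7860 is in use):

    from modules import shared

    def get_webui_base_url():
        # shared.cmd_opts.port holds the --port launch argument; it is None
        # when the default is used. A --listen or reverse-proxied setup may
        # still need a different host than 127.0.0.1.
        port = shared.cmd_opts.port or 7860
        return f"http://127.0.0.1:{port}"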

This may require substantial testing; however, if successful, we might be able to duplicate this approach for other tagging extensions like stable-diffusion-webui-wd14-tagger.

SmirkingKitsune commented 2 months ago

I ended up not using the API method I mentioned above. It is available for testing in pull request https://github.com/Alvi-alvarez/sd-Img2img-batch-interrogator/pull/10. It allows users to use interrogators from clip-interrogator-ext and stable-diffusion-webui-wd14-tagger, and it also lets users set their CLIP mode and WD tagging thresholds. I think it should satisfy this issue when it gets pulled. Note: the new interrogator(s) will only appear if the specified interrogator extension(s) are installed and enabled.
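
For reference, detecting whether those extensions are installed and enabled might look something like this (a sketch assuming A1111's modules.extensions registry; the actual check in the pull request may differ):

    from modules import extensions

    def is_extension_active(name):
        # A1111 keeps a list of discovered extensions, each carrying its
        # directory name and an enabled flag.
        return any(e.name == name and e.enabled for e in extensions.extensions)

    clip_ext_found = is_extension_active("clip-interrogator-ext")
    wd14_found = is_extension_active("stable-diffusion-webui-wd14-tagger")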

Pawelekkkkk commented 2 months ago

From all my testing, the best is commit 1a423bf. That is more than I dreamed of, and it's fully working. I will do more testing. It's very helpful for automating everything, as different interrogator models are better for different tasks. I have a problem with the newest commits, as I don't see the CLIP EXT options to choose from anymore, but I have very little experience, so maybe I am doing something wrong.

SmirkingKitsune commented 2 months ago

Thank you! I really appreciate that you like the features.

Your clip-interrogator-ext problem is strange. Both of the commits (https://github.com/Alvi-alvarez/sd-Img2img-batch-interrogator/pull/10/commits/5bbf505e9fe5d381594f55eb28995dd2cce19a19 and https://github.com/Alvi-alvarez/sd-Img2img-batch-interrogator/pull/10/commits/6395a823bec0d303452dbb8692a0c6f2c2f45719) are the same commit (GitHub being GitHub, I guess). I don't recall updating the interaction code with clip-interrogator-ext since https://github.com/Alvi-alvarez/sd-Img2img-batch-interrogator/pull/10/commits/b3743e43b22c6d45abdf5b1bff46861a0f2266df, upstream from https://github.com/Alvi-alvarez/sd-Img2img-batch-interrogator/commit/1a423bf1dc06c04633c47d270c64f7aabc0e07b9, so I wonder where the source of the problem is. If you were having a problem with the transition to the AlwaysVisible interface, I would have assumed WD14-tagger would have broken too. This is strange, but since it could be a bug, let's troubleshoot.

Troubleshooting Explanations:

SmirkingKitsune commented 2 months ago

Okay, I pushed out three commits: the first one was what I was working on, plus print statements about finding extensions; the second one was a refresh button to force the model_selection selector to refresh; and the last one was to fix a syntax error that was introduced with the refresh button. Here it is: https://github.com/Alvi-alvarez/sd-Img2img-batch-interrogator/pull/10/commits/56f128ece8aa67f9683c11c1651199279c5ca044 It might fix your problem, or at the very least help with troubleshooting...
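
In Gradio, such a refresh button can be wired up roughly like this (a sketch with hypothetical names, not the actual commit's code):

    # Sketch only: "get_available_interrogators" is a hypothetical helper
    # that re-scans for installed interrogator extensions.
    def refresh_model_choices():
        return gr.update(choices=get_available_interrogators())

    refresh_button = gr.Button("Refresh Interrogators")
    refresh_button.click(fn=refresh_model_choices, inputs=[], outputs=[model_selection])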