CodeExplode / stable-diffusion-webui-embedding-editor

Embedding editor extension for web ui

Could there be a button to align all sliders to a keyword? #7

Open · ARandomUserFromGithub opened this issue 1 year ago

ARandomUserFromGithub commented 1 year ago

It takes a lot of clicking to imprecisely adjust the sliders to the coloured markers that match keywords. With a proper UI function it could be done in one click.

It could also be possible to align a slider between two different colours, making it 80% or 90% closer to one colour for fine-tuning.

For example, setting a slider to 90% "hand" and 10% "fingers" so it hopefully doesn't overlap with feet and toes.
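
(Conceptually, that blend would just be linear interpolation between the two keywords' embedding vectors. A minimal Python sketch; the vectors here are placeholders, the real ones would come from the model's token table:)

import torch

def blend(vec_a, vec_b, weight_a=0.9):
    # weighted average of two token embedding vectors
    return weight_a * vec_a + (1.0 - weight_a) * vec_b

# placeholders standing in for the 768-dim embeddings of "hand" and "fingers"
vec_hand = torch.randn(768)
vec_fingers = torch.randn(768)
result = blend(vec_hand, vec_fingers, weight_a=0.9)  # 90% "hand", 10% "fingers"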

rockerBOO commented 1 year ago

IMO it seems possible, but right now the guidance is only used to show the markers via JavaScript. It could possibly be operated on via JavaScript, or a button could trigger something inside the Python code.

I'd personally be interested in this as well, as going through multiple layers and tuning them by hand seems impractical. I'll consider giving it an attempt if I have some free time.

My question would be whether it makes sense to do this automatically, since it might not always work well. Some weights might carry more importance than others, so changing a lot of them at the same time might produce poor results.

ARandomUserFromGithub commented 1 year ago

Maybe it could offer to do it gradually but precisely, one slider at a time, when fine-tuning. An option to roughly visualise what noises are contained under each slider would also be interesting.

rockerBOO commented 1 year ago

What do you mean by visualize the noises?

ARandomUserFromGithub commented 1 year ago

The noise must have some kind of visible structure to it, a bit like TV static, with shapes, patterns, and deformations that could indicate what is present at a given depth in the selected section of the noise array. Browsing that noise for a rough view of what's there could help confirm whether a slider is well adjusted.

rockerBOO commented 1 year ago

Hmm, that does sound interesting. I'm still learning about embeddings. Any examples of this noise elsewhere?


In my mind, I'm trying to figure out why someone would make an embedding through training versus moving these sliders around. Training seems to get to the result eventually, but with some side effects from the model it's trained on. And if the sliders can be automated between 2 points, why would it need such an expansive list of sliders? It could happen just as the product of a function.

Of course, this depends on whether the sliders represent something that can be expressed by existing tokens versus adding something fundamentally new as an embedding, so maybe that's the difference: trying to approximate something that's already partly in the system, but in between.

ARandomUserFromGithub commented 1 year ago

There are 768 sliders per vector. I tried to make an embedding for hands: I highlighted "toe", "toes", "finger", "fingers", "hand", and "foot", and tried to align the sliders away from "foot" and "toes", hoping I would get something that looks more like hands... though it failed badly... I'm not even sure these sliders do anything.

illtellyoulater commented 1 year ago

If someone wants to experiment with this, I've put together a little JS function which automatically sets all the sliders to their guidance values. For now it works with a single keyword, but it could easily be expanded. It calculates each marker's value by mapping its 0-100 position (used in the linear-gradient CSS property) to the corresponding value on the weight scale, ranging from the weight's min to its max, using linear interpolation.

It would have made more sense to get the guidance values as JSON from gradio and update the sliders accordingly, but I'm not familiar with gradio, so I just made this quick and dirty test...

Just enter a single keyword in the extension UI, get its guidance values by pushing the button, and then run this to update the slider values.

// linear interpolation scaling
function convertRange( value, r1, r2 ) { 
    return ( value - r1[ 0 ] ) * ( r2[ 1 ] - r2[ 0 ] ) / ( r1[ 1 ] - r1[ 0 ] ) + r2[ 0 ];
}

function setSlidersToGuidance() {
    gradioApp().querySelectorAll(`[id^=embedding_editor_weight_slider_]`).forEach(function(slider) {
        let weight = slider.querySelector('input');
        // <input type="number" class="gr-box gr-input gr-text-input text-center h-6" min="-0.06793212890625" max="0.058197021484375" step="any">
        let guidance_bar = slider.querySelector('.embedding_editor_guidance_bar');    
        // Getting the marker position (the RGB percentage value in linear-gradient background css property) 
    let markercss = guidance_bar.style.background.match(/rgb\(.*?\).*?%/g);
    // e.g. ['rgb(255, 0, 0) 45.5642%', 'rgb(255, 0, 0) 47.5642%']
    // The 2 values correspond to a single marker (doubled just to look thicker in the UI); take the first one plus 1 (46.5642)
    let markerval = parseFloat(markercss[0].split(")")[1].replace(" ", "").replace("%", "")) + 1;
    // 46.5642; convert this 0-100 value to the corresponding value on the weight's min-max scale
    let markervalScaled = convertRange(markerval, [0, 100], [parseFloat(weight.min), parseFloat(weight.max)]);
    // e.g. 0.0010836791992187497
    weight.value = markervalScaled;
    // dispatching an input event may be needed for the page to pick up the programmatic change (untested)
    weight.dispatchEvent(new Event('input', { bubbles: true }));
    });
}

setSlidersToGuidance();

The slider positions are not actually changed, but their values are (I don't know how to make the UI refresh properly). Once done, save the embedding and proceed to the next keyword or vector...

ARandomUserFromGithub commented 1 year ago

Awesome! Just one question... how is that script meant to be run? With a JS injection extension of some sort? Or by editing a file in SD?

rockerBOO commented 1 year ago

Looks like you'd need to run the code in your browser manually. See https://developer.chrome.com/docs/devtools/console/javascript/

The two functions can be run once (they will remain available); then you need to re-run setSlidersToGuidance() each time you change the token.

Then you can run the following in the browser console.

setSlidersToGuidance()

illtellyoulater commented 1 year ago

Yes, exactly... just copy-paste everything into the console, go to the UI and work as usual, adding one token and pressing the button to get its guidance values, then go back to the console and run setSlidersToGuidance().
At this point the values are extracted from the guidance strips and stored into the sliders' value property. Hit the "save embedding" button and you should be done.

But since I have almost zero experience with gradio and have never actually examined how data is stored inside an embedding file, it would be wise if one of you could double-check that the values are actually retained in it after pressing the Save button...
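
A quick sanity check could look something like this (a sketch assuming the webui's usual textual-inversion .pt layout; the file name here is hypothetical and the key names may differ):

import torch

# load the embedding the webui saved and inspect its tensor
data = torch.load("embeddings/my-embedding.pt", map_location="cpu")
# webui textual-inversion files typically store {"string_to_param": {"*": tensor}}
vec = data["string_to_param"]["*"]
print(vec.shape)   # e.g. torch.Size([1, 768]) for a one-vector SD 1.x embedding
print(vec[0, :8])  # a few weights, to compare against the slider values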

Zetaphor commented 1 year ago

I'm exploring the possibilities of this extension in my fork, in addition to trying to improve the UI/UX. I've implemented this alignment feature in that version: https://github.com/Zetaphor/stable-diffusion-webui-embedding-editor

ARandomUserFromGithub commented 1 year ago

I've read through your description. If each slider displayed the selected word at the slider's position, it could help us better understand what's going on and why the AI selected these noises while training on whatever data it was trained on.

Zetaphor commented 1 year ago

> I've read through your description. If each slider displayed the selected word at the slider's position, it could help us better understand what's going on and why the AI selected these noises while training on whatever data it was trained on.

I'm working on that now. I've been able to determine that for a given version of a base model (so SD 1.x or SD 2.x), any specific token the model knows has a static set of weights, regardless of whether the model was merged or fine-tuned. I'm currently pickling an index and am going to look at doing similarity checks on each weight to try to generate a list of similar tokens for any given weight value.
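
As a rough sketch of what building such an index could look like (assuming SD 1.x's CLIP ViT-L/14 text encoder via Hugging Face transformers; illustrative only, not the fork's actual code):

import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# the token embedding table is fixed for a given base model: [49408, 768] for SD 1.x
table = text_model.get_input_embeddings().weight.detach()
vocab = tokenizer.get_vocab()  # token string -> id

torch.save({"table": table, "vocab": vocab}, "token_weight_index.pt")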

Zetaphor commented 1 year ago

I've attempted an implementation of this, but I'm not sure how effective it actually is. The method I see used in the Embeddings Inspector plugin is a cosine similarity search to find similar tokens. The issue is that this assumes two full embedding vectors, rather than a single float for a specific weight compared against the 1D list of all other tokens' weights at that weight index.

The current implementation does an absolute distance check rather than taking the direction in latent space into account, which gives what appear to be undesirable results.

I'm not experienced enough in this stuff to know how to resolve this particular issue. If anyone wants to take a look, here is my implementation:

https://github.com/Zetaphor/stable-diffusion-webui-embedding-editor/blob/eab4141e8e0a19ecb78d11ef27ce2701893f1b91/scripts/embedding_editor.py#L472
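
For illustration, here's a hedged sketch of the two approaches (token_table stands for a precomputed [vocab, 768] index; the function names are made up, not the extension's actual code):

import torch
import torch.nn.functional as F

def nearest_by_cosine(query_vec, token_table, k=5):
    # compares full 768-dim vectors, so direction in latent space is respected
    sims = F.cosine_similarity(query_vec.unsqueeze(0), token_table, dim=-1)
    return sims.topk(k).indices

def nearest_by_abs_distance(value, token_table, weight_idx, k=5):
    # compares a single float against one column of the table; sign/direction is lost,
    # which may be why the results look undesirable
    dists = (token_table[:, weight_idx] - value).abs()
    return dists.topk(k, largest=False).indices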

Zetaphor commented 1 year ago

Update: After a lot of trial, error, testing, and validation, I've managed to find a fairly accurate method of doing this.

Since I want to be able to run this in real time, I'm precomputing all of the values for every weight through that weight's entire min/max range. It should be done in approximately 8 hours.

I've already built a precomputed index of the weight values for every token in the Stable Diffusion vocabulary, so between these two indexes I should have everything I need to make this happen.

The functionality to create both pickled indexes is implemented in my branch and only requires a button click.
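
For a sense of scale, a back-of-the-envelope sketch (illustrative numbers only, not the actual precompute code):

# each weight index is stepped from its min to its max at a fixed precision
dims = 768            # weights per vector
span = 0.13           # assumed typical (max - min) range for a single weight
for precision in (0.001, 0.0001, 0.00001):
    steps = int(span / precision)
    print(f"precision {precision}: {steps} steps per weight, {dims * steps:,} lookups")
# every factor of 10 in precision multiplies the number of lookups by 10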

illtellyoulater commented 1 year ago

@Zetaphor has there been any progress on this that can be merged?

Zetaphor commented 1 year ago

@illtellyoulater Yes and no; it's still very much in the experimental stage. I've done a precomputation at a precision of 0.001, but I found it was not accurate enough.

That took 8 hours to perform (16 after an initial failed attempt). Since then I've been preparing to move out of the country, so I haven't yet had time to prepare a new run at a precision of 0.00001 or higher, which will take at least 3 or 4 days.

If someone wants to spare some compute and run that, I could easily pick the work back up: all of the functions for doing the distance checks are already in place; they just need final wiring and polish, plus a UI implementation which I've also already started.

All of my work should be fairly easy to grok in the fork: everything is broken into distinct functions, and there are even buttons to run all of it. If nobody else wants to pick up the mantle, it will have to wait until I've moved and settled in before I can dedicate any serious time to it again.