Add support for Segment Anything (SAM)

moorehousew commented 1 year ago

This implements part of #32, namely support for loading SAM and generating masks using it. It's a bit rough around the edges though, so I'll keep working on it:

Not really sure whether most users will have enough VRAM to run the model(s) on their GPU if SD is loaded. Maybe add a cpu/gpu flag in either the model loader or image mask node?
I noticed the BLIP download URL can be configured, should do the same for SAM.
SAM also supports being passed a bounding box to further refine what you're trying to get it to mask, but I haven't implemented that yet.
Would also like to add cropping and blend nodes that can use these masks along with a padding amount to enable crop-upscale-diffuse-downscale-blend workflows.

Here's what the two nodes in a workflow look like- you give SAM an arbitrary list of points, and whether the point should be "in" the mask or "outside" of it- and it does the rest: sam-test

Let me know if you have any feedback, and I'll continue working on it.

WASasquatch commented 1 year ago

It would be nice if there was some sort of priority with nodes, like pin a node to run first, so it runs, clears it's bits, then diffusion models, etc, load up to do the diffusion pipeline process. I know sometimes you can get image loaders to run and process first if fed into diffusion process, but yeah.

I'm assuming points are X/Y coordinates? By arbitrary, meaning it doesn't really matter so long as you have enough points defining what you want in the mask? You say whether it's in the mask or not, but I don't see that being specifically defined? Or is that what the list is in the second field?

I'll give this a test in a bit. Hectic day today, I can't seem to sit down for more then 10 minutes.

moorehousew commented 1 year ago

Yeah, so if you pick a few points from anywhere on the image and classify them as part of the segment to mask or not, SAM infers what the segment you're looking for is and masks it. The second list is whether each point is part of the segment to mask or otherwise. So for the example, the first point is from the face, the other three aren't- and the one from the face is indicated as part of the segment I want, while the other three aren't. SAM does the rest.

Facebook will be releasing the text-portion of the model at a later date, which simplifies the process to just typing in "face" and SAM doing the rest. Someone's attempting something similar using CLIP right now (https://github.com/Curt-Park/segment-anything-with-clip), so I'll look into that and see if it can't be added as part of the PR.

WASasquatch commented 1 year ago

Yeah, so if you pick a few points from anywhere on the image and classify them as part of the segment to mask or not, SAM infers what the segment you're looking for is and masks it. The second list is whether each point is part of the segment to mask or otherwise. So for the example, the first point is from the face, the other three aren't- and the one from the face is indicated as part of the segment I want, while the other three aren't. SAM does the rest.

I should have paid attention the element count in the second field. Lol Thanks for the clarification.

Facebook will be releasing the text-portion of the model at a later date, which simplifies the process to just typing in "face" and SAM doing the rest. Someone's attempting something similar using CLIP right now (https://github.com/Curt-Park/segment-anything-with-clip), so I'll look into that and see if it can't be added as part of the PR.

I saw that, the prompt based one, which is only available on their demo currently. That would definitely make the process easier, though I think an advanced node is worth keeping considering advanced workflows.

WASasquatch commented 1 year ago

So I think there is some errors that need resolving.

On a fresh install, it tries to access a variable which hasn't been defined, since it looks like you switched to a URL mapping.

r = requests.get(model_url, allow_redirects=True)

NameError: name 'model_url' is not defined

Also, you can add the default URLs to the was_conf_template dictionary, which is used to create/update the config file. Then you can use getSuiteConfig() to get the conf dictionary to access the default URL or customized URL. Then if it doesn't exist, you can fall back to the model_url_mapping dict.

WASasquatch commented 1 year ago

Were you going to update your PR @moorehousew ?

moorehousew commented 1 year ago

Yes, I've just been busy this weekend and haven't had a chance to finalize my changes.

moorehousew commented 1 year ago

Alright, I've made the changes to the model fetching logic you mentioned, as well as added a few more nodes for positional cropping and blending, with and without a mask. This is useful when you're resizing images in a pipeline (ex. for feature-specific diffusion) but you need the blend to be done at the original crop bounds in the target image.

Here's an example of what's possible now. I'm taking the output of a diffusion process that doesn't give a lot of latent space to the face, and cropping, re-diffusing, and blending to fix things: sam-test-2

I think that's everything. If you have any further comments or changes, let me know. I also attempted to get CLIP to work with SAM for text-based masking, but I gave up at calculating the similarity due to a mismatch in tensor size. I might try another crack at it, or you can try to do so yourself.

WASasquatch commented 1 year ago

Alright, I've made the changes to the model fetching logic you mentioned, as well as added a few more nodes for positional cropping and blending, with and without a mask. This is useful when you're resizing images in a pipeline (ex. for feature-specific diffusion) but you need the blend to be done at the original crop bounds in the target image.

Here's an example of what's possible now. I'm taking the output of a diffusion process that doesn't give a lot of latent space to the face, and cropping, re-diffusing, and blending to fix things:

I think that's everything. If you have any further comments or changes, let me know. I also attempted to get CLIP to work with SAM for text-based masking, but I gave up at calculating the similarity due to a mismatch in tensor size. I might try another crack at it, or you can try to do so yourself.

It's ironic you add the padding node that does the right borders, I was literally doing that last night (as I made a stitching node too and wanted to go back and improve padding.

Tensor mismatches are a pain. Hate dealing with Tensors.

I'll check out new version and hopefully get this merged.

WASasquatch commented 1 year ago

Can you upload your workflow? Having trouble setting it up correctly

moorehousew commented 1 year ago

Sure thing. https://gist.github.com/moorehousew/e5896ed79ba83cb1379d73e3faaeab8c

WASasquatch commented 1 year ago

Sure thing. https://gist.github.com/moorehousew/e5896ed79ba83cb1379d73e3faaeab8c

Thanks! Turns out I was just being silly. I didn't follow connection paths well in the image. Lol

Waiting on model download. My poor 4G. Lol

WASasquatch commented 1 year ago

Alright, so everything seems to be working for me, the only issue is the debug printing to the console. That can remain if it's useful, but should be labeled as coming from WAS NS, as well as what the debug is for such as "SAM Result Tensor" or something.

moorehousew commented 1 year ago

Stick with ViT-H if your internet is slow :p should be just fine anyways. I'll remove the debug printing since the node is done now.

WASasquatch commented 1 year ago

Awesome. Let's merge.

WASasquatch / was-node-suite-comfyui

Add support for Segment Anything (SAM) #33