dnl13 / ComfyUI-dnl13-seg

I'm working on enabling SAM-HQ and Dino for ComfyUI to easily generate masks automatically, either through automation or prompts.

Partial incompatibility with Apple MPS #6

Status: Open. alessandroperilli opened this issue 11 months ago.

alessandroperilli commented 11 months ago

Your fork of the segment anything node was working flawlessly. This new node, instead, refuses to work in the very same workflow:

[Screenshot 2023-11-21 at 17 25 53]

The full log is:

File "xyz/ComfyUI/execution.py", line 153, in recursive_execute output_data, output_ui = get_output_data(obj, input_data_all) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "xyz/ComfyUI/execution.py", line 83, in get_output_data return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "xyz/ComfyUI/execution.py", line 76, in map_node_over_list results.append(getattr(obj, func)(*slice_dict(input_data_all, i))) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "xyz/ComfyUI/custom_nodes/ComfyUI-dnl13-seg/nodes/node_dinosam_prompt.py", line 135, in main (images, masks) = sam_segment( ^^^^^^^^^^^^ File "xyz/ComfyUI/custom_nodes/ComfyUI-dnl13-seg/node_fnc/node_dinosam_prompt.py", line 221, in sam_segment predictor.set_image(image_np_rgb) File "xyz/ComfyUI/venv/lib/python3.11/site-packages/segment_anything_hq/predictor.py", line 62, in set_image self.set_torch_image(input_image_torch, image.shape[:2]) File "xyz/ComfyUI/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "xyz/ComfyUI/venv/lib/python3.11/site-packages/segment_anything_hq/predictor.py", line 91, in set_torch_image self.features, self.interm_features = self.model.image_encoder(input_image)

I thought the problem could be that this node is stricter and wants your SAM Loader rather than third-party ones. Hence, I opened #4.

I also thought the problem could be related to #5.

Other than not working at all, this node looks very promising with many features that I really hoped to see!

dnl13 commented 11 months ago

I did indeed introduce some restrictions, mainly affecting the Dino model, as I plan to test and incorporate the DinoV2 models in a test branch later on. For the SAM models, the restriction only amounts to these nodes exclusively supporting HQ models at the moment.

Is there a need for non-HQ models? If so, I can make adjustments, although I personally believe that the mask quality, even with HQ models, isn't always very clean.

alessandroperilli commented 11 months ago

This is now working but the performance is very...inconsistent. There are a lot of strange things going on.

  1. The SwinB model (which I assumed to be more accurate than the SwinT_OGC model) doesn't detect anything. Or rather, I think it does, but the Mask with prompt node doesn't show it.

In the example below, I asked it to detect the eyes, and the terminal shows a confidence of 0.57:

(dnl13-seg) > phrase(confidence): ['eyes(0.57)']

However, the output is this:

[Screenshot 2023-11-21 at 22 05 34]

Same test with the SwinT_OGC model:

[Screenshot 2023-11-21 at 22 08 52]
  2. Even if I keep the SwinT_OGC model, certain entities are not detected properly. The same entities were flawlessly detected and segmented by the segment anything node with the non-HQ models and the same SwinT_OGC model.

For example:

[Screenshot 2023-11-21 at 22 11 16]

Other entities are recognized without issues, but they are segmented less precisely compared to what the segment anything node with non-HQ models used to do:

[Screenshot 2023-11-21 at 22 13 53]

(the chin is partially cut off)

Similarly, in the previous scenario, the non-HQ models would segment each eye perfectly. The HQ model, instead, has included the portion of the face between the eyes. I'm not sure that's an improvement. I set the clean_mask_holes and clean_mask_island parameters to zero, but it made no difference.

  3. There's a very tangible difference between AUTO and CPU. I assumed that, at least in my case, on an Apple system, AUTO would be equivalent to CPU, but clearly it isn't.

If I manually set CPU, I get reasonable speed considering these are HQ models, along with the weird output shown in the previous point. But if I leave it on AUTO, detection and segmentation take twice as long, and the results are even weirder:

[Screenshot 2023-11-21 at 22 12 23]

If you allowed the SAM Model Loader to also load non-HQ models, like segment anything does, I could offer an apples-to-apples comparison, but I can already tell you that the non-HQ models have generated exceptionally good masks in most of my past tests.

I also think that a lot of people with limited RAM would appreciate the option to load non-HQ models.

dnl13 commented 11 months ago

Hm, okay, those are very interesting observations. I just saw that you mainly use the HQ-H model. My tests indeed showed good, sometimes even better, results with the Base (B) or Large (L) model in combination with SwinB for GroundingDino.

Ok, adding the SAM Base models shouldn't really be a problem. I'll put that on my todo list.
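For what it's worth, loading either family behind one flag could look roughly like the sketch below, assuming the registry APIs documented by the segment-anything and segment-anything-hq pip packages; load_sam and its parameters are illustrative, not this node's actual interface:

```python
# A minimal sketch, assuming the sam_model_registry APIs of the
# segment-anything and segment-anything-hq pip packages; load_sam itself
# is hypothetical, not this node's code.
from segment_anything import sam_model_registry as sam_registry
from segment_anything_hq import sam_model_registry as sam_hq_registry

def load_sam(model_type: str, checkpoint: str, hq: bool = False, device: str = "cpu"):
    registry = sam_hq_registry if hq else sam_registry
    sam = registry[model_type](checkpoint=checkpoint)  # e.g. "vit_b", "vit_l", "vit_h"
    return sam.to(device=device)
```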

Oh, clean_mask_holes and clean_mask_island can take very large values, as these seem to reflect the pixel density of the mask. The default value of 64 is mainly there to remove small parts of the mask; a value of 0 therefore applies no corrections to the mask at all.
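If it helps to visualize, here is a hedged sketch of what such a cleanup could look like when the two values act as pixel-area thresholds, using OpenCV connected components; the function and its exact semantics are my assumption for illustration, not the node's implementation:

```python
# Hypothetical hole/island cleanup as pixel-area thresholds, using OpenCV
# connected components; clean_mask is illustrative, not the node's code.
import cv2
import numpy as np

def clean_mask(mask: np.ndarray, holes: int = 64, islands: int = 64) -> np.ndarray:
    """mask: binary uint8 array (0 or 255). Returns a cleaned copy."""
    # Drop foreground components ("islands") smaller than `islands` pixels.
    n, labels, stats, _ = cv2.connectedComponentsWithStats((mask > 0).astype(np.uint8))
    out = np.zeros_like(mask)
    for i in range(1, n):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= islands:
            out[labels == i] = 255
    # Fill background components ("holes") smaller than `holes` pixels.
    n, labels, stats, _ = cv2.connectedComponentsWithStats((out == 0).astype(np.uint8))
    for i in range(1, n):
        if stats[i, cv2.CC_STAT_AREA] < holes:
            out[labels == i] = 255
    return out
```

With both thresholds at 0, the first pass keeps every component and the second fills nothing, which matches the "no corrections" behavior described above.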

(*) The annoying thing is that the good masks usually have a lower box_threshold (aka confidence) than the bad ones. Maybe I can add a second slider here to better capture the masks; I'll have to test that.

  1. I have an idea that the masks may not be created correctly when multimask is not activated. I will look into this.

  2. Please try playing with the box_threshold here (this is part of the idea marked (*) above). The function works as follows: a lower box_threshold lets more detected image features "through". This also means that both eyes may be detected individually, and my mask merging is not yet optimal there (see the sketch after this list). A check with multimask definitely helps to see what was really found. I urgently need to check the mask creation when multimask is not activated.

  3. The automatic device selection is very interesting. In the AUTO variant, comfy.model_management.get_torch_device() is used to find the device. Perhaps it then loads the models onto the M2 graphics cores? Next week I will have the opportunity to test this myself on an M2 Mac to get a more accurate picture.
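To illustrate point 2 with a toy example: confidence filtering plus a naive union merge could look like the sketch below (the function, tensor shapes, and names are my assumptions, not the node's code). The union at the end is exactly where two individually detected eyes would collapse into a single mask:

```python
import torch

# Toy sketch of box_threshold filtering and naive mask merging; this is an
# illustrative assumption, not the node's actual implementation.
def filter_and_merge(masks: torch.Tensor, scores: torch.Tensor,
                     box_threshold: float = 0.30) -> torch.Tensor:
    """masks: (N, H, W) bool, one per detected phrase; scores: (N,) confidences."""
    keep = scores >= box_threshold        # lower threshold -> more detections pass
    if not keep.any():
        return torch.zeros_like(masks[0])
    return masks[keep].any(dim=0)         # union merge loses per-eye separation
```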

And thanks again for testing! 🙏

Edit: about the eyes, see https://github.com/dnl13/ComfyUI-dnl13-seg/discussions/2

alessandroperilli commented 11 months ago

I just tried every test in the previous post with the multimask setting on, and it makes no difference. I keep it off because I cannot control the order of the masks (per our previous conversation), and so I have to duplicate the Mask with prompt node many times.

I also tried playing with the box_threshold setting a lot more, but it makes no difference. The segment anything configuration with non-HQ models didn't require very sensitive settings: a value of 0.30 was enough to find most things.

alessandroperilli commented 11 months ago

Also, the fact that you can recognize the eyes in the second picture of #2 with a SwinB model, while I cannot in my tests above, confirms that something strange is going on.

Try it yourself and see if you can find the eyes in this image, which I always use as a benchmark. It normally works fine with a 0.30 confidence threshold.

dnl13 commented 11 months ago

Hey @alessandroperilli

I gave it a try with your image. Yes, it's exactly the same as my observation in the discussion thread. At 0.30, three "eyes" are detected with different confidence levels:

(dnl13-seg) > phrase(confidence): ['eyes(0.49)', 'eyes(0.50)', 'eyes(0.31)']

Here, too, I had to raise the value to 0.32. This behavior can be observed with all SAM models (B/H/L).

[Screenshots: eyes1, eyes2]

With the GroundingDino SwinT_OGC, I have to set the threshold even higher at 0.40 (also tested with all SAM models B/H/L).

I'll now see how I can reintegrate the non-HQ models and then send another push. I'll mention you again here once the push is through (after testing).

What's very interesting is that, in terms of functionality, only the dependencies of this script are managed differently than in the segment_anything fork. That's why some of these errors surprise me. However, as you mentioned, it could also be related to the models. I'll take care of it. Best regards!

alessandroperilli commented 11 months ago

That would be OK. I don't mind tweaking the threshold to get individual eyes vs. a pair. The problem here is that, on my system, as I previously reported, the whole subject gets identified, not the eyes.

[Screenshots 2023-11-22 at 11 35 16 / 11 37 31 / 11 37 50 / 12 48 18 / 12 48 37]

On my system, the detection is fundamentally broken. With every HQ model.

If you can detect the eyes on your system, all other things being equal, it has to be related to MPS.

dnl13 commented 11 months ago

Hello @alessandroperilli,

I've made some more changes. Could you please do a git pull and install the new requirements.txt again?

I'm looking forward to your feedback. Thank you, and best regards!

alessandroperilli commented 11 months ago

OK, tests done. Lots to report:

  1. If there was something that the system was supposed to change in terms of packages, it did not. I manually ran pip install -r requirements.txt, but all requirements were already satisfied. To be extra sure, I manually deleted the custom node folder and reinstalled it via ComfyUI Manager (Install via Git URL). No difference.

  2. If I leave the switch on AUTO, I simply cannot detect anything remotely resembling the prompt. Whatever happens with the AUTO switch, it completely disrupts the ability to detect the right things on my system.

[Screenshot 2023-11-22 at 17 19 59]

If I switch to CPU, the system works as intended, but... keep reading for the next two issues.

  3. No matter what, on my system, the SwinB model refuses to detect the right objects. It doesn't work in conjunction with AUTO. It doesn't work in conjunction with CPU. It doesn't work at any lower-bound threshold value. It's useless.
[Screenshot 2023-11-22 at 17 21 49]
  4. The best-performing model is, BY FAR, the non-HQ sam-vit-b model, exactly like in the segment anything node. The HQ models are completely unreliable, sometimes working and sometimes not.

For example: the non-HQ sam-vit-b model can accurately identify the background, while its HQ counterpart (or the Huge version of the HQ models) fails miserably.

[Screenshots 2023-11-22 at 17 11 58 / 17 11 05]

In other words, the only configuration that works for me is the following:

[Screenshot 2023-11-22 at 17 16 55]

It's good enough for me, as it's no different from the segment anything scenario. However, given all these other anomalies, I would expect a lot of support requests. In many configurations, the system is not behaving as intended.

dnl13 commented 11 months ago

Okay, I have some news. I just had the opportunity to install ComfyUI on an M2 Mac Mini, and yes, I was able to replicate some of the issues.

So, Apple's MPS is not a viable option. I'm considering implementing a fallback to CPU for MPS until there's some progress on Apple's end or with the models.
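A fallback along those lines could be as simple as the sketch below; this is my assumption of the shape it might take, not a committed implementation:

```python
import torch

# Hypothetical device resolution with an MPS -> CPU fallback; resolve_device
# is illustrative, not this node's actual function.
def resolve_device(selected: str = "AUTO") -> torch.device:
    if selected == "CPU":
        return torch.device("cpu")
    # On Apple silicon, AUTO would normally resolve to "mps"; route it to
    # CPU until MPS produces correct results with these models.
    if torch.backends.mps.is_available():
        return torch.device("cpu")
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")
```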

Regarding the SwinB model: I couldn't replicate the issues. It runs with both HQ and non-HQ models. Due to time constraints, I've only tested it with the base and HQ-H models, and it worked.

Is it possible that yours got damaged during installation?

Since I'll have more opportunities to test with the Mac Mini, I'll keep trying to replicate the issues and find solutions.

alessandroperilli commented 11 months ago

Not possible, as the node automatically downloaded it yesterday. But, just to double-check, I deleted both the groundingdino_swinb_cogcoor.pth file and the GroundingDINO_SwinB.cfg file, forcing the node to redownload them.

With a non-HQ SAM model, the background is recognized, but not the mouth or the eyes:

[Screenshot 2023-11-22 at 22 04 28]

With an HQ SAM model, nothing is recognized.

So, I looked for hints in the terminal logs.

I see this string just before the node redownloaded SwinB:

final text_encoder_type: bert-base-uncased

Then I see the predictions that the 4 instances of Mask with prompt make as I queue the workflow:

All keys matched successfully
(dnl13-seg) > phrase(confidence): ['eyes(0.57)']
(dnl13-seg) > phrase(confidence): ['background(0.54)']
(dnl13-seg) > phrase(confidence): ['mouth(0.71)']
(dnl13-seg) > phrase(confidence): ['sign(0.64)']

I don't know what else it could be. What version of PyTorch do you use in your ComfyUI venv? I run the latest nightly build, but that can't be the problem.

Update: I uninstalled the nightly build and now run the stable PyTorch 2.1.1. Nothing has changed in the node's behavior.

dnl13 commented 11 months ago

I followed the installation instructions as described in the ComfyUI repo, so I also installed the nightly build for MPS support.

python3 -m pip freeze
accelerate==0.24.1
addict==2.4.0
aiohttp==3.9.0
aiosignal==1.3.1
altgraph @ file:///System/Volumes/Data/SWE/Apps/DT/BuildRoots/BuildRoot11/ActiveBuildRoot/Library/Caches/com.apple.xbs/Sources/python3/python3-141/altgraph-0.17.2-py2.py3-none-any.whl
async-timeout==4.0.3
attrs==23.1.0
certifi==2023.11.17
charset-normalizer==3.3.2
contourpy==1.2.0
cycler==0.12.1
einops==0.7.0
filelock==3.13.1
fonttools==4.45.0
frozenlist==1.4.0
fsspec==2023.10.0
future @ file:///System/Volumes/Data/SWE/Apps/DT/BuildRoots/BuildRoot11/ActiveBuildRoot/Library/Caches/com.apple.xbs/Sources/python3/python3-141/future-0.18.2-py3-none-any.whl
groundingdino-py==0.4.0
huggingface-hub==0.19.4
idna==3.4
importlib-metadata==6.8.0
importlib-resources==6.1.1
Jinja2==3.1.2
kiwisolver==1.4.5
macholib @ file:///System/Volumes/Data/SWE/Apps/DT/BuildRoots/BuildRoot11/ActiveBuildRoot/Library/Caches/com.apple.xbs/Sources/python3/python3-141/macholib-1.15.2-py2.py3-none-any.whl
MarkupSafe==2.1.3
matplotlib==3.8.2
mpmath==1.3.0
multidict==6.0.4
networkx==3.2.1
numpy==1.26.2
opencv-python==4.7.0.72
packaging==23.2
Pillow==10.1.0
platformdirs==4.0.0
psutil==5.9.6
pycocotools==2.0.7
pyparsing==3.1.1
python-dateutil==2.8.2
PyYAML==6.0.1
regex==2023.10.3
requests==2.31.0
safetensors==0.4.0
scipy==1.11.4
segment-anything==1.0
segment-anything-hq==0.3
six @ file:///System/Volumes/Data/SWE/Apps/DT/BuildRoots/BuildRoot11/ActiveBuildRoot/Library/Caches/com.apple.xbs/Sources/python3/python3-141/six-1.15.0-py2.py3-none-any.whl
supervision==0.6.0
sympy==1.12
timm==0.9.11
tokenizers==0.15.0
tomli==2.0.1
torch==2.2.0.dev20231122
torchaudio==2.2.0.dev20231122
torchsde==0.2.6
torchvision==0.17.0.dev20231122
tqdm==4.66.1
trampoline==0.1.2
transformers==4.35.2
typing_extensions==4.8.0
urllib3==2.1.0
yapf==0.40.2
yarl==1.9.3
zipp==3.17.0

This is what is currently installed on the machine.

But be careful: this is a fresh, blank installation that only contains my nodes for testing. It's also running on:

$ python3 --version
Python 3.9.6

But it's working.

Update: I ran the test file described on the instructions page for Apple silicon:

$ python3 test.py
tensor([1.], device='mps:0')
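(That test file is presumably the standard MPS verification snippet from the PyTorch install notes for Apple silicon, roughly:)

```python
# Presumably the standard MPS verification snippet from the PyTorch install
# notes for Apple silicon; it prints tensor([1.], device='mps:0') on success.
import torch

if torch.backends.mps.is_available():
    x = torch.ones(1, device=torch.device("mps"))
    print(x)
else:
    print("MPS device not found.")
```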

alessandroperilli commented 11 months ago

After 1h 30min of debugging, I declare defeat.

At this point, our configuration is almost identical. And yet:

[Screenshots 2023-11-23 at 00 08 47 / 00 15 29]

I don't really know what else to do.

On the new installations, this warning keeps appearing in the terminal:

xyz/ComfyUI/venv/lib/python3.11/site-packages/segment_anything_hq/modeling/tiny_vit_sam.py:662: UserWarning: Overwriting tiny_vit_5m_224 in registry with segment_anything_hq.modeling.tiny_vit_sam.tiny_vit_5m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)

...and the same warning repeats for tiny_vit_11m_224, tiny_vit_21m_224, tiny_vit_21m_384, and tiny_vit_21m_512.

It never appears in my ordinary ComfyUI installation.

dnl13 commented 11 months ago

Okay, that's very interesting! It seems like I'll need another attempt to get it working. I'll test all of this in a different branch. Apologies for any inconvenience; I'm truly grateful for the testing.

I've noticed some inconsistencies between the git and pip installations in the issues on the SAM-HQ repository, and the same goes for GroundingDino. It seems like the pip and git repositories are diverging in versions. I'll take a closer look at that. I was hoping to streamline the code in this suite by using the pip versions; I'll compare the code and see what needs to be adjusted. It surprised me that in the segment_anything nodes, everything is included in the repo even though there are pip packages.

Once again, thank you so much for the tests; it helps me a lot in establishing a stable project here.

alessandroperilli commented 11 months ago

Just to be sure we exclude every scenario: can you share the flags you use to run ComfyUI on your macOS? And could you run printenv from inside the venv, too? Thank you.

dnl13 commented 11 months ago

[image]

Okay, now I've encountered the same issue on Windows as well.

There seems to be something peculiar with torchvision and T.RandomResize([800], max_size=1333) in the GroundingDino inference and transform scripts. The images seem not to be presented to the model correctly, which is why it cannot recognize anything. I am working on it now.
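For context, the preprocessing in question typically follows the upstream GroundingDino pattern, roughly as in this sketch (reconstructed from the upstream repo, so treat the details as assumptions rather than this fork's exact code):

```python
# Rough reconstruction of GroundingDino-style preprocessing, including the
# resize step under suspicion; details follow the upstream repo's pattern.
from PIL import Image
import groundingdino.datasets.transforms as T

def load_image(path: str):
    image_pil = Image.open(path).convert("RGB")
    transform = T.Compose([
        T.RandomResize([800], max_size=1333),  # the step in question
        T.ToTensor(),
        T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    image_tensor, _ = transform(image_pil, None)  # these transforms take (image, target)
    return image_pil, image_tensor
```

If torchvision changed behavior underneath this pipeline, the model would receive mis-scaled or mis-normalized tensors, which is consistent with detections failing across the board.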

I've also made sure to update everything on the Windows machine. I noticed that in my development folder, I was a few versions behind.

Now I need to compare the different implementations and see which one works well with torchvision 0.16. Please bear with me for a little longer.