filipstrand / mflux

An MLX port of FLUX based on the Hugging Face Diffusers implementation.
MIT License

Information Sharing on the rf_inversion Technology #91

Open raysers opened 2 weeks ago

raysers commented 2 weeks ago

May I share some information about a new technique? I recently came across an interesting project called Fluxtapoz:

https://github.com/logtd/ComfyUI-Fluxtapoz

It was originally built for ComfyUI and currently works only with the standard (non-MLX) version of FLUX. However, some enthusiasts have ported it to run within Diffusers:

https://github.com/raven38/rf_inversion

Since mflux is itself based on the Diffusers implementation, it may be possible to adapt this port for mflux.

This technique appears to be distinct from ControlNet and IP-Adapter. Reportedly, it doesn’t require an additional model; it can achieve consistent style transfer, style imitation, and similar effects through prompt inputs alone.

However:

At the moment, I don’t have the necessary hardware to run standard flux, so I haven’t tested this technology personally. All my information comes from the official demonstrations and some blogger reviews.

Additionally, I’ve noticed that the current mflux development team seems stretched thin, with only @filipstrand and @anthonywu actively contributing on a regular basis at the moment. (There have been other active contributors in the past, but right now it’s just these two.) Since mlx is relatively new, there are very few who are truly proficient with it. This raises the bar for contributors and understandably limits the available manpower. Therefore, I don’t want to add to the team’s workload by suggesting they put this on their to-do list (which already seems to have an extensive backlog).

I’m only sharing this brief overview for information. My reasoning: given how quickly new techniques iterate, this could be a short-lived innovation, but it could also become a game-changing technology alongside ControlNet and IP-Adapter.

If it turns out to be the latter, then sharing this information will have been worthwhile. Who knows, maybe this mention will even turn out to have shown a bit of foresight. And if one day this technology sparks something remarkable when combined with mflux, wouldn’t that be exciting? Anything is possible, I believe.

filipstrand commented 2 weeks ago

@raysers Really cool suggestion, thanks for bringing it up here! A year ago I experimented a bit with null text inversion in the Diffusers codebase, and that was actually my first attempt to implement new research with diffusion models (fun times!). This looks like something similar and probably even better. I'll put this on the TODO list as something to investigate further when there is time.

raysers commented 2 weeks ago

Thank you, @filipstrand, with much respect, for your interest in this feature. I only intended to share it, so it was a delightful surprise to hear you might add it to the TODO list. As you mentioned, it’s still something that requires further investigation, and I fully agree. Perhaps, as the official team focuses on implementing higher-priority features, time will also naturally test the practicality of this technology. If it eventually proves suitable for MFLUX and is implemented, that would indeed be another pleasant surprise.