filipstrand / mflux

An MLX port of FLUX based on the Huggingface Diffusers implementation.
MIT License

proposal: automatic output file naming #80

Open anthonywu opened 1 day ago

anthonywu commented 1 day ago

Currently, automatic image naming uses the pattern image.*, and subsequent images are indexed as image-N.*.

However, I think we can make some improvements:

  1. Always add the dimensions to the image name, e.g. image-512x512.png.
  2. Automatically summarize the prompt into a file name: a prompt about a dog playing in a park could be summarized by some tool as dog-play-in-park.*. The tool could be something traditional and fast like nltk, or a more modern embedding/ranking tool for finding the most relevant keywords.
  3. Allow users to opt in to automatic placeholders such as:
    • {iso_date}: ISO date, yyyy-mm-dd
    • {unix_timestamp}: via date +%s or Python string formatting
    • {seed}
    • other placeholders that represent the various params: guidance, quantize, etc.
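
To make item 3 concrete, here is a minimal sketch of placeholder expansion using plain Python string formatting. The function name `render_output_name` and the `{dimensions}` placeholder are hypothetical and only for illustration; nothing here is existing mflux API.

```python
from datetime import datetime, timezone
from typing import Optional


def render_output_name(
    template: str,
    *,
    seed: int,
    width: int,
    height: int,
    now: Optional[datetime] = None,
    **params: object,
) -> str:
    """Expand opt-in placeholders such as {iso_date} or {seed} in a file name template.

    Hypothetical helper: the placeholder names mirror the list above,
    plus an assumed {dimensions} placeholder for item 1.
    """
    now = now or datetime.now(timezone.utc)
    fields = {
        "iso_date": now.strftime("%Y-%m-%d"),
        "unix_timestamp": int(now.timestamp()),
        "seed": seed,
        "dimensions": f"{width}x{height}",
        **params,  # extra generation params, e.g. guidance=3.5, quantize=8
    }
    # str.format raises KeyError for a placeholder that is not in `fields`.
    return template.format(**fields)


fixed = datetime(2024, 10, 19, 12, 0, 0, tzinfo=timezone.utc)
print(render_output_name("image-{dimensions}-{seed}-{iso_date}.png",
                         seed=42, width=512, height=512, now=fixed))
```

A template without placeholders passes through unchanged, so this would stay opt-in.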

I'm not trying to scope-creep on what we support - just enough that every file name can reasonably be expected to be unique (seed and prompt summary, at a minimum).
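
A rough sketch of the "seed and prompt summary" minimum, using only the standard library. The stop-word list is a tiny stand-in for what nltk or a ranking model would provide, and `summarize_prompt` and `unique_stem` are hypothetical names.

```python
import re

# Tiny illustrative stop-word list; a real one (e.g. from nltk) would be much larger.
STOP_WORDS = {"a", "an", "the", "in", "on", "at", "of", "and", "with", "is", "to"}


def summarize_prompt(prompt: str, max_words: int = 4) -> str:
    """Reduce a prompt to a few hyphen-joined keywords that are safe in a file name."""
    words = re.findall(r"[a-z0-9]+", prompt.lower())
    keywords = [w for w in words if w not in STOP_WORDS]
    # Fall back to the current default stem when nothing usable remains.
    return "-".join(keywords[:max_words]) or "image"


def unique_stem(prompt: str, seed: int) -> str:
    """Combine the prompt summary with the seed so different runs rarely collide."""
    return f"{summarize_prompt(prompt)}-{seed}"


print(unique_stem("A dog playing in the park", 42))
```

This does no stemming, so it keeps "playing" rather than "play". Two runs with the same prompt and seed would still produce the same name, so the existing image-N indexing would remain useful as a fallback.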

We can also make an API for an OutputFileNamer - we'll provide reasonable defaults, and any users of the library can inject their own customization as needed.
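
One possible shape for such an API, sketched with `typing.Protocol`. Only the name OutputFileNamer comes from the proposal; `GenerationInfo`, `DefaultFileNamer`, `PromptFileNamer`, and `output_name` are hypothetical and not part of mflux today.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class GenerationInfo:
    """Hypothetical bundle of the values a namer might need."""
    prompt: str
    seed: int
    width: int
    height: int


class OutputFileNamer(Protocol):
    """Anything with a matching `name` method can be injected by library users."""

    def name(self, info: GenerationInfo, extension: str = "png") -> str: ...


class DefaultFileNamer:
    """Assumed default: dimensions plus seed, as suggested in the list above."""

    def name(self, info: GenerationInfo, extension: str = "png") -> str:
        return f"image-{info.width}x{info.height}-{info.seed}.{extension}"


class PromptFileNamer:
    """Example of a user-supplied customization based on the prompt text."""

    def name(self, info: GenerationInfo, extension: str = "png") -> str:
        slug = "-".join(info.prompt.lower().split()) or "image"
        return f"{slug}-{info.seed}.{extension}"


def output_name(info: GenerationInfo, namer: Optional[OutputFileNamer] = None) -> str:
    """Use the injected namer if given, otherwise fall back to the default."""
    return (namer or DefaultFileNamer()).name(info)


info = GenerationInfo(prompt="blue bird", seed=1728244816, width=1024, height=1024)
print(output_name(info))
print(output_name(info, PromptFileNamer()))
```

A structural protocol means custom namers would not need to inherit from a library base class.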

azrahello commented 1 day ago

I would like to ask if it is possible to embed the prompt as EXIF data in a PNG file, so that when the image is imported into photographic software, the prompt appears as a description. In general, could the JSON created with the --metadata option also be embedded as EXIF data?

filipstrand commented 1 day ago

@azrahello We have exported this information with the created PNG file for a while now (even if you do not include the --metadata flag). For example, for an image image.png generated by mflux, if you run

exiftool image.png

then you will see this kind of information (including the prompt):

File Type                       : PNG
File Type Extension             : png
MIME Type                       : image/png
Image Width                     : 1024
Image Height                    : 1024
Bit Depth                       : 8
Color Type                      : RGB
Compression                     : Deflate/Inflate
Filter                          : Adaptive
Interlace                       : Noninterlaced
Exif Byte Order                 : Big-endian (Motorola, MM)
Warning                         : Invalid EXIF text encoding for UserComment
User Comment                    : {'mflux_version': '0.2.1', 'model': 'schnell', 'seed': '1728244816', 'steps': '6', 'guidance': 'None', 'precision': 'mlx.core.bfloat16', 'quantization': 'None', 'generation_time': '610.10 seconds', 'lora_paths': 'None', 'lora_scales': 'None', 'prompt': 'blue bird', 'controlnet_image': 'None', 'controlnet_strength': 'None'}
Image Size                      : 1024x1024
Megapixels                      : 1.0

I did not spend too much time on this feature, and there are probably better ways to structure this information so that it can be read by various image applications. E.g. it would be nice if it were shown in the info section in macOS (right now it is not shown).
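
On the structuring point: the User Comment value shown above looks like a Python dict repr (single-quoted strings) rather than JSON, so `json.loads` would reject it. A minimal sketch of reading it back with the standard library, assuming the value is always a plain dict literal; the sample string is abbreviated from the output above and `parse_user_comment` is a hypothetical helper:

```python
import ast
import json

# Abbreviated copy of the User Comment value from the exiftool output above.
user_comment = (
    "{'mflux_version': '0.2.1', 'model': 'schnell', "
    "'seed': '1728244816', 'steps': '6', 'prompt': 'blue bird'}"
)


def parse_user_comment(raw: str) -> dict:
    """Parse a dict-repr style comment without executing arbitrary code."""
    metadata = ast.literal_eval(raw)  # accepts Python literals only
    if not isinstance(metadata, dict):
        raise ValueError("expected a dict literal in the user comment")
    return metadata


def to_json(raw: str) -> str:
    """Re-serialize the metadata as JSON, which other tools can parse."""
    return json.dumps(parse_user_comment(raw), sort_keys=True)


print(parse_user_comment(user_comment)["prompt"])
print(to_json(user_comment))
```

Writing the comment as JSON in the first place would let non-Python tools parse it directly.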

[Screenshot 2024-10-19 at 15 09 52]

filipstrand commented 1 day ago

@anthonywu I like your suggestions here