alt-key-project / comfyui-dream-project

Animation supporting nodes for ComfyUI
MIT License

FFMpeg issues #1

Open fatualux opened 10 months ago

fatualux commented 10 months ago

Hello, and congratulations on this wonderful extension. I discovered it two days ago, and while testing its potential it really surprised me, so thank you very much for your precious work.

Unfortunately, however, the process stops while the movie is being rendered.

Despite being a complete novice in this area, I tried to take a look at the scripts included in the node folder, in particular seq_processing.py, where I found the call to the ffmpeg command.

In fact, once executed, the command just prints the ffmpeg help page, as if there were some error in how the command was formulated.

I don't know if it is possible to activate a debug mode that prints the executed commands, in order to obtain more information about the error that occurred.

All images are generated correctly, the error occurs only for the FFMpeg command.

6.4.12-arch1-1 x86_64 + i3-wm

ComfyUI running in a Python Virtual Environment

If you need further information, I will be happy to provide it to you.

Thanking you in advance for your support, I renew my compliments for your work.

https://github.com/alt-key-project/comfyui-dream-project/assets/35587292/70fdca59-c03a-4ded-9c12-927d1fe09475

alt-key-project commented 10 months ago

Thank you for raising this issue with me! The bad news is that it might(?) be tricky for me to solve completely: the node simply calls out to whatever version of FFmpeg exists on your system, and different versions may not all accept the same arguments. The good news is that I did expect issues with this, so it is likely fixable in the configuration file. Have a look at config.json and the arguments under ffmpeg - you can try to change these to something that works on your system. I will add some documentation in the readme regarding this. Do let me know which argument caused issues for you; maybe I can find a default configuration that is less likely to cause problems.
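For reference, an arguments list along these lines may be more portable (the %FPS%, %FRAMES% and %OUTPUT% placeholders are substituted by the node; treat the exact flags as a starting point to adapt to your ffmpeg build, not a guarantee - notably there is no -c copy to conflict with the encoder):

```json
{
  "ffmpeg": {
    "path": "ffmpeg",
    "arguments": [
      "-f", "concat",
      "-safe", "0",
      "-r", "%FPS%",
      "-i", "%FRAMES%",
      "-c:v", "libx264",
      "-pix_fmt", "yuv420p",
      "%OUTPUT%"
    ]
  }
}
```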

fatualux commented 10 months ago

Thanks for the prompt response. As advised, I tried taking a look at the config.json file and changing the options to make them work with my software configuration, unfortunately in vain.

ffmpeg version n6.0.

In my case the following command works fine:

ffmpeg -f concat -safe 0 -i <(for f in ./*.jpg; do echo "file '$PWD/$f'"; done) -shortest -c:v libx264 -r 1 -pix_fmt yuv420p -timestamp now -c copy output.mp4

Then I edited config.json:

{
  "ffmpeg": {
    "path": "ffmpeg",
    "arguments": [
      "-r",
      "%FPS%",
      "-f",
      "concat",
      "-safe",
      "0",
      "-i",
      "%FRAMES%",
      "-c:v",
      "libx264",
      "-pix_fmt",
      "yuv420p",
      "-c",
      "copy",
      "%OUTPUT%"
    ]
  },
  "encoding": {
    "jpeg_quality": 95
  }
}
Since that didn't work, I tried disabling auto-deletion of images, then I ran the command manually, and FFmpeg created the video.

Unfortunately, I just can't figure out where I'm going wrong.
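One way to get the debug output asked for above is to substitute the placeholders yourself and print the resulting command line before running it. This is only a sketch under the assumption that the node does a plain string substitution of %FPS%, %FRAMES% and %OUTPUT% in the configured argument list (the real seq_processing.py may work differently):

```python
import json

def build_ffmpeg_command(config: dict, fps: float, frames_file: str, output: str) -> list:
    """Expand the %...% placeholders from config.json into a full argv list."""
    substitutions = {"%FPS%": str(fps), "%FRAMES%": frames_file, "%OUTPUT%": output}
    args = [substitutions.get(arg, arg) for arg in config["ffmpeg"]["arguments"]]
    return [config["ffmpeg"]["path"]] + args

config = json.loads("""
{
  "ffmpeg": {
    "path": "ffmpeg",
    "arguments": ["-r", "%FPS%", "-f", "concat", "-safe", "0",
                  "-i", "%FRAMES%", "-c:v", "libx264",
                  "-pix_fmt", "yuv420p", "%OUTPUT%"]
  }
}
""")

cmd = build_ffmpeg_command(config, fps=12, frames_file="frames.txt", output="output.mp4")
# Print the exact command before handing it to subprocess.run(cmd),
# so you can copy-paste it into a shell and see ffmpeg's real error message.
print(" ".join(cmd))
```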

NeedsMoar commented 6 months ago

In your config file you have the framerate before the concat / safe / image input commands, whereas in your command line you have it after. The version before should specify the framerate of the input video and the one after specifies the output. With image files it isn't supposed to matter and I usually just put it before, but maybe your version is being picky because you're feeding it a file list in a virtual file with FOR instead of a numbered sequence that it can treat more like a video sequence. Concat implies some odd things about files (like them being complete by themselves) and was meant for sticking two videos of the same format together. It's pretty broken even for that use TBH.

The -c:v command is already specifying your codec and what the output video should be, I don't know what the full effects of -c copy will be on top of that. Technically JPEG frames slapped together back-to-back are just an MJPEG when stuck in an MP4 file as a video stream. You might check the size of the video that worked. There's a chance it's copying the concatenated JPEGs as a secondary video stream into an MP4 with an encoded H.264 video. ffmpeg is pretty literal when interpreting commands.

FYI if your files are a numbered sequence with padding / prefixes, you don't need a FOR command to load them.
-i Prefix_%06d.jpeg will input all files with a number padded to 6 digits in order (the full sequence has to exist with increasing digits except anything before -start_number or after <frame + length> (-t ). Gaps in numbering will error it out because it's assuming an image sequence from a workflow like Houdini or Nuke. Default numbering in comfy handles this just fine and since modern codecs with motion prediction / b-frames / reference frames shouldn't be encoded in chunks except by tools that know how to deal with it and split at IDR frames which act as a barrier for motion prediction lookahead / lookbehind (like the AOM versions of x265) anyway you should probably do it that way and avoid the whole bugginess of the concat "decoder" to begin with. If you have a computer with more than 6 cores you can probably encode the resolutions comfy outputs in H.265 just as quickly as in H.264 and get better results, too. H.264 hasn't been actively optimized for a long time.

So it would look like: ffmpeg -start_number 1 -r fps -i ComfySequence%06d.jpeg -pix_fmt yuv420p -c:v libx264 -crf 18 Output.mp4
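Since gaps in the numbering make the image-sequence input error out, it can be worth validating the sequence before calling ffmpeg. A small sketch (the prefix and extension are illustrative; adjust to whatever your workflow writes):

```python
import re
from pathlib import Path

def check_sequence(directory: str, prefix: str = "ComfySequence", digits: int = 6) -> list:
    """Return the sorted frame numbers, raising if the sequence has gaps."""
    pattern = re.compile(rf"{re.escape(prefix)}(\d{{{digits}}})\.jpeg$")
    numbers = sorted(
        int(m.group(1))
        for p in Path(directory).iterdir()
        if (m := pattern.match(p.name))
    )
    if not numbers:
        raise FileNotFoundError(f"no frames matching {prefix}{'#' * digits}.jpeg")
    # The demuxer wants consecutive numbers starting from -start_number.
    expected = list(range(numbers[0], numbers[0] + len(numbers)))
    if numbers != expected:
        missing = sorted(set(expected) - set(numbers))
        raise ValueError(f"gaps in sequence, missing frame numbers: {missing}")
    return numbers
```

If the check passes, -start_number can be set to the first returned number.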

If you're on an nVidia card you should probably consider using the nvidia hardware encoder instead. Along with Intel's, it is considered almost identical in quality to software encoding for H.264 at this point. If I ever encoded H.264 I'd probably use it, but H.265 hardware is missing features like asymmetric motion partitions that can turn it from OK to amazing as a codec in general. AV1 in hardware is slower than H.265 in hardware on the fast preset, and it can't decode without stuttering on two of my devices with hardware support, so it's garbage as far as I'm concerned. Anyway, nvenc is already enabled in your build, so instead of the medium-quality output from libx264 (which will probably encode at around 30-40 fps max depending on your processor), do something like this (I looped a sequence of 250 frames 10 times so it could give me a representative speed without overhead completely killing it):

ffmpeg -start_number 1 -r 12 -i Mickey%06d.png -filter:v loop=loop=10:size=250:start=0,zscale=rin=full:pin=709:tin=iec61966-2-1:min=gbr:m=709:t=709:p=709:r=limited -pix_fmt yuv420p -c:v h264_nvenc -preset p5 -tune hq -b:v 5M -bufsize 10M -maxrate 10M -qmin 0 -bf 3 -b_ref_mode middle -temporal-aq 1 -spatial-aq 1 -rc vbr -rc-lookahead 20 -i_qfactor 0.75 -b_qfactor 1.1 output.mp4

The zscale part of the filter turns out to be necessary because ffmpeg will otherwise leave the color matrix unset in the output, and who knows what it's doing to convert the input RGB. That may cause issues on smart TVs, so it's better to throw it in there and never have to type it again; it can be left out, and obviously you don't need the loop part before zscale at all unless you want to loop frames. Anyway, that encoded a total of 2752 frames on a 4090 at 41.4x realtime, or 500 frames/s, for a 1440x960 input sequence of lossless .png files. Not bad for the resolution and the ultra-HQ encode settings.

It might be possible to bypass ffmpeg completely and use the python CUDA interface to access the video encoder and just feed it the raw image data so it never has to leave the card, it probably takes more time to save one JPEG to disk than nvenc takes to encode a whole video if all the data is already there for it.
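Short of the full CUDA route, an intermediate step that already avoids the JPEG round-trip is piping raw RGB frames from Python straight into ffmpeg's rawvideo demuxer over stdin. A sketch (frame size, rate and encoder settings are illustrative, and actually running it requires ffmpeg with nvenc support on PATH):

```python
import subprocess

def rawvideo_command(width: int, height: int, fps: int, output: str) -> list:
    """ffmpeg argv that reads raw rgb24 frames from stdin and encodes with nvenc."""
    return [
        "ffmpeg", "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", "-",  # "-" makes ffmpeg read the frame data from stdin
        "-pix_fmt", "yuv420p", "-c:v", "h264_nvenc", output,
    ]

def encode(frames, width, height, fps, output):
    """frames: iterable of bytes objects, each width*height*3 bytes of RGB data."""
    proc = subprocess.Popen(rawvideo_command(width, height, fps, output),
                            stdin=subprocess.PIPE)
    for frame in frames:
        proc.stdin.write(frame)
    proc.stdin.close()
    proc.wait()
```

This keeps the frames in memory on the Python side; the data still crosses back from the GPU, but no intermediate image files ever hit the disk.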

fatualux commented 6 months ago


Thanks a lot, I will check it out.

For now, I wrote a script based on ffmpeg to create a video from the generated images. I know it is a crude workaround, but I am a noob and have too little time to study... thanks again for your support.