atong01 / conditional-flow-matching

TorchCFM: a Conditional Flow Matching library
https://arxiv.org/abs/2302.00482
MIT License

FID Results on CIFAR10 are worse. #63

Closed · zhuyu-cs closed this 1 year ago

zhuyu-cs commented 1 year ago

Thank you for your GREAT work and for providing the reproduction scripts. However, based on my reproduction, the FID on CIFAR10 is much worse than the 11 reported in the paper (https://arxiv.org/pdf/2302.00482.pdf): I actually got 30 (with TargetConditionalFlowMatcher). Any idea about this result?
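
(For context, a minimal sketch of the training step under discussion, assuming the torchcfm API; `model` is a stand-in for the UNet used in the CIFAR-10 example.)

```python
import torch
from torchcfm.conditional_flow_matching import TargetConditionalFlowMatcher

# sigma=0 gives the deterministic target (Lipman et al.) probability paths
FM = TargetConditionalFlowMatcher(sigma=0.0)

def training_step(model, optimizer, x1):
    """One flow-matching step; x1 is a batch of CIFAR-10 images (B, 3, 32, 32)."""
    x0 = torch.randn_like(x1)  # Gaussian source samples
    t, xt, ut = FM.sample_location_and_conditional_flow(x0, x1)
    vt = model(t, xt)                  # predicted velocity field
    loss = torch.mean((vt - ut) ** 2)  # regress onto the conditional flow
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```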

atong01 commented 1 year ago

Hi, thanks for your interest!

Something must have gone wrong with the parameters when I copied them over. Sorry about that. I will check against what I have. We are also planning to release code + model weights (hopefully this week) for CIFAR with improved parameters that give FID ~5-6.

zhuyu-cs commented 1 year ago

Thanks.

kilianFatras commented 1 year ago

I checked your commit against my initial one, @atong01. I do not see how your modifications could have impacted the current script. @zhuyu-cs, do you mind running the exact OT instead of the original FM? I have no idea how FM behaves with this script, as I have never tested it.
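
(For reference, switching objectives should amount to swapping the matcher class in torchcfm; a sketch using the library's class names:)

```python
from torchcfm.conditional_flow_matching import (
    ExactOptimalTransportConditionalFlowMatcher,
    TargetConditionalFlowMatcher,
)

# Original FM objective (target probability paths):
fm_matcher = TargetConditionalFlowMatcher(sigma=0.0)

# Exact OT objective (OT-CFM): minibatch pairs (x0, x1) are re-coupled
# with an exact optimal transport plan before (t, xt, ut) are sampled.
ot_matcher = ExactOptimalTransportConditionalFlowMatcher(sigma=0.0)
```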

atong01 commented 1 year ago

It should work. I just need to check wandb to see differences.

zhuyu-cs commented 1 year ago

> I checked your commit against my initial one, @atong01. I do not see how your modifications could have impacted the current script. @zhuyu-cs, do you mind running the exact OT instead of the original FM? I have no idea how FM behaves with this script, as I have never tested it.

Yeah, I'm running the exact OT right now. I'll update the results here when it finishes.

kilianFatras commented 1 year ago

@atong01 Yeah, I agree, the code is the same. Thank you @zhuyu-cs. As Alex mentioned, we plan to make a new release this week. Once it is out, we would be really grateful to know whether you can reproduce our results (FID ~5-6). We will let you know once the update is done.

kilianFatras commented 1 year ago

We should also release the FID computation. FYI, I used the clean-fid library (https://github.com/GaParmar/clean-fid) to compute it (with the TensorFlow statistics). I will release a script to compute the FID with clean-fid.
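
(A minimal sketch of such an evaluation with clean-fid, assuming a directory of 50,000 generated images; the path and mode choice are illustrative:)

```python
from cleanfid import fid

# Compare generated samples against clean-fid's precomputed CIFAR-10
# training statistics; "legacy_tensorflow" matches the TensorFlow-style
# statistics mentioned above.
score = fid.compute_fid(
    "generated_samples/",      # placeholder path to generated 32x32 images
    dataset_name="cifar10",
    dataset_res=32,
    dataset_split="train",
    mode="legacy_tensorflow",
)
print(f"FID: {score:.2f}")
```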

kilianFatras commented 1 year ago

Last question: can you point me to which code you used? Is it the cifar10 code from the example folder, or the cifar10 code from the runner folder? Thank you.

zhuyu-cs commented 1 year ago

It is the example folder.

zhuyu-cs commented 1 year ago

Hi, can you share the training hyperparameters? I am also trying to reproduce it myself. If possible, I'd like to compare the results from my own codebase with your soon-to-be open-sourced code; the difference might then only be the training process. Thanks! :)

kilianFatras commented 1 year ago

The training hyperparameters do not really change. The optimizer will, as will some components of the neural networks. I am currently running the new code to check that everything works fine. I hope to share it on Monday with the inference code.

zhuyu-cs commented 1 year ago

Thanks!

kilianFatras commented 1 year ago

Hello!

We can confirm that our new script works well and reaches an FID close to 4.5. The current draft release can be found at https://github.com/atong01/conditional-flow-matching/tree/cifar_10_FID_5/examples/cifar10, with the new script and the FID computation. Please reinstall the dependencies from scratch, as we have added some new ones. The default parameters should give you an FID close to 4.5-5.

We are making the last checks and cleaning before making our official release. If everything goes well, this should happen by the end of this week.

zhuyu-cs commented 1 year ago

Got it! Thanks for your work! I'll run your code alongside my distributed training code as a double check.

kilianFatras commented 1 year ago

Hello, I have officially opened the PR. I still need to clean up some elements in the README, but otherwise it is ready to use. Would you mind trying to reproduce our results, please? We are very close to merging the PR :)

zhuyu-cs commented 1 year ago

Yeah, I'm running it, but it has not finished yet. I will update the FID here when it does.

kilianFatras commented 1 year ago

Thank you! I would also like to point out that we have updated the integration method at inference. We are now using dopri5, which gets an FID of about 3.5. To get similar results with Euler integration, you should use 500 steps instead of 100 (you would get about 3.8). With 100 steps, you should be around 4.5.
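
(A sketch of the two integrators being compared, assuming a trained velocity model with signature `model(t, x)` and torchdiffeq; the tolerances are illustrative, not the exact release settings:)

```python
import torch
from torchdiffeq import odeint

@torch.no_grad()
def sample(model, x0, method="dopri5", steps=100):
    """Integrate the learned ODE from noise x0 (t=0) to images (t=1)."""
    if method == "dopri5":
        # Adaptive solver: FID ~3.5 per the numbers above.
        t = torch.tensor([0.0, 1.0], device=x0.device)
        traj = odeint(model, x0, t, method="dopri5", atol=1e-4, rtol=1e-4)
    else:
        # Fixed-step Euler: ~500 steps gives ~3.8, ~100 steps ~4.5.
        t = torch.linspace(0.0, 1.0, steps + 1, device=x0.device)
        traj = odeint(model, x0, t, method="euler")
    return traj[-1]  # final state of the trajectory = generated batch
```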

zhuyu-cs commented 1 year ago

Great work! Thanks!

zhuyu-cs commented 1 year ago

Thanks for your great work; I'm already in the process of reproducing it. Although not quite finished, a test of the EMA model after 400,000 training iterations gives FID = 22.5. I wonder whether this is consistent with your experience; if not, it could be a difference in environment configuration. My environment is as follows, for reference:

python==3.9.18
torch==2.0.1
torchvision==0.15.2
torchdiffeq==0.2.3
matplotlib==3.8.0
numpy==1.26.1
scipy==1.11.3
scikit-learn==1.3.2
timm==0.9.8
torchdyn==1.0.6
pot==0.9.1
absl-py==2.0.0
clean-fid==0.1.35

kilianFatras commented 1 year ago

How many iterations is 40w? How do you compute the FID?

With the 100 Euler integration steps from torchdyn (previous version): after 100,000 iterations, the FID should be around 7; after 200,000 iterations, around 6; after 300,000 iterations, around 5.

zhuyu-cs commented 1 year ago

It is 400000 iterations.

Actually, I'm using accelerated evaluation code that takes only about 30 minutes to sample 50,000 images (adapted from https://github.com/NVlabs/edm). The training process is exactly the single-GPU setup you provided, and I use a 2080 Ti. The sampler is torchdiffeq's dopri5, and the reference feature statistics for CIFAR (mean and std) are the clean-fid-supplied 'cifar10_legacy_tensorflow_train_32.npz'. So the sampling part follows your strategy, just parallelized. The FID evaluation code you provided is still running; it takes about 3 hours to finish sampling.

Thanks for the reference experience! I'll continue to debug the code and update results here.

kilianFatras commented 1 year ago

Unfortunately, I do not know the code you are using and cannot vouch for its faithfulness. Getting such a large FID after 400k iterations is not normal and makes me think the problem is in how you compute the FID.

Please, from now on, only report the results you get from clean-fid. We know it works correctly, as it is the code we used to validate ours.

zhuyu-cs commented 1 year ago

OK, thanks for your help!

yuanzhi-zhu commented 1 year ago

@zhuyu-cs would you mind sharing some of the sample images with the worse FID?

kilianFatras commented 1 year ago

[Image: ema_generated_FM_images_step_400000]

Here is a sample of images generated after our training, with 100 Euler iterations (FID ~4.5).

zhuyu-cs commented 1 year ago

Hi @kilianFatras, you are right! I used clean-fid to compute the FID, and it has dropped to 3.87. That's an interesting finding.