VinAIResearch / LFM

Official PyTorch implementation of the paper: Flow Matching in Latent Space
https://vinairesearch.github.io/LFM/
GNU Affero General Public License v3.0

Over 450 Generated Images. FID 271.254. What's Wrong? #11

Open sumorday opened 5 months ago

sumorday commented 5 months ago

image_epoch_450

Hi! I have generated 450 images, and the facial features are already clear. However, I'm not sure why the FID value is 270. Should I keep only the 'model_450.pth' and 'image_epoch_450.png' checkpoints for testing?


hao-pt commented 5 months ago

I think you misunderstand here. To compute FID, following standard practice, you should generate 50_000 images for statistical significance.
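For reference, FID is the Fréchet distance between two Gaussians fitted to Inception features of the real and generated image sets, which is why a large sample (50k) is needed for stable statistics. A minimal sketch of just the final distance computation, assuming you already have the means and covariances (e.g. from pytorch_fid's stat files):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Matrix square root of the covariance product; it can come back
    # slightly complex due to numerical error, so keep the real part.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Identical statistics give a distance of (numerically) zero.
mu, sigma = np.zeros(4), np.eye(4)
print(frechet_distance(mu, sigma, mu, sigma))  # ≈ 0.0
```

With too few samples, the estimated covariance is noisy and the distance is biased upward, which is one reason a handful of images gives a meaningless FID in the hundreds.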

sumorday commented 5 months ago

> I think you misunderstand here. To compute FID, following standard practice, you should generate 50_000 images for statistical significance.


Thank you! Does this mean I should change num_epoch from 500 to 50,000? Or where should I set this?

hao-pt commented 5 months ago

50_000 is the number of output images that you generate with your trained model at epoch 450 (model_450.pth), not the number of images generated during training. To test, you need to modify some args in test_args/celeb256_dit.txt, such as EPOCH_ID and EXP, and then run bash_scripts/run_test_ddp.sh test_args/celeb256_dit.txt.
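For concreteness, the relevant entries in test_args/celeb256_dit.txt would look something like this (illustrative values only; EPOCH_ID must match a checkpoint you actually saved, and EXP the folder name under saved_info/):

```shell
# test_args/celeb256_dit.txt (hypothetical excerpt)
EXP="celeb_f8_dit"   # experiment folder under saved_info/latent_flow/celeba_256/
EPOCH_ID=450         # loads model_450.pth for sampling
```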

sumorday commented 5 months ago

samples_celeba_256_dopri5_1e-05_1e-05


Following the standard procedure, I used the command bash bash_scripts/run_test.sh test_args/celeb256_dit.txt and found that it ultimately generates an image of CelebA in the main directory. Is this image supposed to be the final result? However, it did not return an FID value. I noticed that test_flow_latent.py automatically samples 50,000 images. Epoch ID 475 shows the approximate appearance of the images at this stage.



sumorday commented 5 months ago

Using this command, python pytorch_fid/fid_score.py ./pytorch_fid/celebahq_stat.npy ./saved_info/latent_flow/celeba_256/celeb_f8_dit, results in an inaccurate FID score: 344.229.

However, running the command from the GitHub instructions, bash bash_scripts/run_test.sh test_args/celeb256_dit.txt, throws an error about missing and unexpected keys in the state dictionary. This typically happens when the model architecture does not match the saved state dictionary.
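Missing/unexpected keys often come down to a key-prefix mismatch, e.g. a checkpoint saved from a DistributedDataParallel-wrapped model carries a "module." prefix that the bare model does not expect. A minimal sketch of stripping it, assuming that is the cause here:

```python
import torch
from torch import nn

def strip_module_prefix(state_dict):
    """Remove the 'module.' prefix that DistributedDataParallel adds."""
    return {k.removeprefix("module."): v for k, v in state_dict.items()}

# Example: simulate a checkpoint saved from a DDP-wrapped model.
model = nn.Linear(4, 2)
ddp_style_ckpt = {"module." + k: v for k, v in model.state_dict().items()}
model.load_state_dict(strip_module_prefix(ddp_style_ckpt))  # loads cleanly
```

If the keys differ in more than a prefix, the checkpoint genuinely belongs to a different architecture, and load_state_dict(strict=False) would silently leave weights uninitialized rather than fix anything.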

Any ideas? Thank you!


sumorday commented 5 months ago


Also, I tried to use the 475.pth file from the original GitHub repository to test, but I found that it only generated one image and couldn't calculate the FID score.



xiaweijiexox commented 3 months ago

You haven't indicated the appropriate epoch for evaluating the FID of celeba-256-adm. Can you share the number?

xiaweijiexox commented 3 months ago

When I compute the FID with your original code on CelebA-256 (unconditional generation) with ADM, I find that the FID is only 9.21 at epoch 475. I sampled 50,000 images, so I think the result is reasonably accurate. You don't provide a .pth for this experiment, so I don't know what the problem is.

quandao10 commented 3 months ago

Hi, my understanding is that you retrained our model and got 9.21. Is that correct?

quandao10 commented 3 months ago

Please note: our stat file is computed from jpg images. If the generated images are png, the FID will be very high.
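If a sampler did write png files, one workaround is to re-encode them as jpg before computing FID, so the compression statistics match the stat file. A sketch using Pillow; the quality setting here is a guess, so match whatever the repo's own saving code uses:

```python
from pathlib import Path
from PIL import Image

def reencode_to_jpg(src_dir, dst_dir, quality=95):
    """Re-save every .png in src_dir as a .jpg in dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for png in Path(src_dir).glob("*.png"):
        # JPEG has no alpha channel, so force RGB before saving.
        Image.open(png).convert("RGB").save(dst / (png.stem + ".jpg"), quality=quality)
```

JPEG is lossy, so the two formats yield measurably different Inception features; comparing png samples against jpg reference statistics inflates the FID.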

xiaweijiexox commented 3 months ago

I'm sure that I generated jpg images, because I used your code directly and I checked just now. Maybe you can provide the .pth file; I have no idea which epoch to stop at, but I'm sure the result at epoch 475 is 9.21.

quandao10 commented 3 months ago

I trained the model for 600 epochs and evaluated at epoch 475 for CelebA-HQ 256.

xiaweijiexox commented 3 months ago

I've found that the model fluctuates more after 500 epochs (in FID, at least). Do you agree?

xiaweijiexox commented 3 months ago

I've started testing DiT. I think that shouldn't raise any doubts.

quandao10 commented 3 months ago

Yes, the model seems unstable after 500 epochs. In our paper, we use cosine learning-rate decay, which depends on the total number of epochs. For more stability, we suggest using an EMA model; you could use the EMA code from the DiT repo. The EMA model is more stable and has better FID. Also consider using dropout if the model converges too fast; see the appendix of https://arxiv.org/pdf/2102.09672 about overfitting on CIFAR-10.
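The DiT-style EMA being recommended is just a frozen shadow copy of the weights, nudged toward the live weights after every optimizer step. A minimal sketch; the decay value is the DiT default, not necessarily what LFM uses:

```python
from copy import deepcopy
import torch
from torch import nn

@torch.no_grad()
def update_ema(ema_model, model, decay=0.9999):
    """ema = decay * ema + (1 - decay) * current, parameter by parameter."""
    ema_params = dict(ema_model.named_parameters())
    for name, param in model.named_parameters():
        ema_params[name].mul_(decay).add_(param.data, alpha=1 - decay)

# Usage: keep a frozen deep copy and update it after each optimizer step.
model = nn.Linear(2, 2)
ema_model = deepcopy(model).requires_grad_(False)
# ... optimizer.step() ...
update_ema(ema_model, model)
```

Because the EMA averages over many recent checkpoints, it smooths out exactly the kind of late-training FID fluctuation described above; evaluation is then done with ema_model rather than model.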

xiaweijiexox commented 3 months ago

OK, thanks. I'm trying again with ema.

xiaweijiexox commented 3 months ago

When I use your EMA.py, I get "AttributeError: 'EMA' object has no attribute '_optimizer_state_dict_pre_hooks'. Did you mean: 'register_state_dict_pre_hook'?" What do you mean by "ema code from DiT repo"?

xiaweijiexox commented 3 months ago

Should I use the file DiT/train.py to revise your code? I've made the revision, but I'm not sure about it. Why do you have an EMA.py, but I still need to use the EMA from DiT?

quandao10 commented 3 months ago

Yes, you should use DiT/train.py to revise my code. I found it easier and more compact to follow the DiT repo.

sumorday commented 3 months ago

So, by running bash_scripts/run.sh test_args/celeb256_dit.txt, it automatically performs the so-called flow matching in the latent space, right? Of course, when I directly download the 475.pth file provided on GitHub and generate 50,000 images, the tested FID value is indeed 5.24. But I didn't use the 475.pth provided on the LFM GitHub; I trained from scratch, and testing my own 475.pth did not achieve an FID of 5.2, only 6.02. All images are jpg. I am wondering if it is running correctly.

quandao10 commented 3 months ago

Yes, I think you ran it correctly. I wonder what environment you use to run the model. I found that the architecture is more stable with torch 1.x. When I retrained our model on torch 2.x, the result was around 5.8 to 6.1, same as yours.

sumorday commented 2 months ago

> Yes, I think you ran it correctly. I wonder what environment you use to run the model. I found that the architecture is more stable with torch 1.x. When I retrained our model on torch 2.x, the result was around 5.8 to 6.1, same as yours.

Thank you. The issue with the torch.distributed.checkpoint module is that it does not exist in PyTorch 1.x versions. Therefore, if I downgrade to a 1.x version, the code will not work.
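If the code must run under both major torch versions, one option (a sketch; adjust it to however the module is actually used in the repo) is to guard the import and fall back to a plain torch.save path on older releases:

```python
import torch

try:
    # Distributed checkpointing APIs only exist in newer torch releases.
    import torch.distributed.checkpoint as dist_cp
    HAS_DIST_CKPT = True
except ImportError:
    dist_cp = None
    HAS_DIST_CKPT = False

def save_checkpoint(state, path):
    # Fallback that works on both torch 1.x and 2.x.
    torch.save(state, path)
```

The training script would then branch on HAS_DIST_CKPT instead of importing the module unconditionally at the top of the file.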