YangLing0818 / SGDiff

Official implementation for "Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training" https://arxiv.org/abs/2211.11138

About Evaluation #10

Open Maelic opened 7 months ago

Maelic commented 7 months ago

Dear authors,

After training the model for 335 epochs at image size 256×256 and running evaluation with the testset_ddim_sampler.py script, I obtained an FID of 23.86 on the Visual Genome test set. This is better than the 26 reported in the paper. Could you please provide details on how you evaluated your work? I used the pytorch-fid project for evaluation.

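For reference, a minimal sketch of how pytorch-fid can be invoked to reproduce this kind of number (the directory names below are placeholders for the sampler output and the Visual Genome test images):

```python
# Hedged sketch: compute FID with the pytorch-fid package
# (https://github.com/mseitzer/pytorch-fid). Both directories are placeholder
# paths; point them at the images written by testset_ddim_sampler.py and at
# the real VG test images.
import subprocess
import sys

subprocess.run(
    [
        sys.executable, "-m", "pytorch_fid",
        "test_results/generated/",   # generated samples (placeholder path)
        "datasets/vg/test_images/",  # real Visual Genome test images (placeholder path)
    ],
    check=True,
)
```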

Qi-Chuan commented 3 months ago

Hello, can you tell me how long it took you to train the model for 335 epochs?

Maelic commented 3 months ago

It depends on your hardware. I used 2x V100 GPUs and it took roughly 8 days; see https://github.com/YangLing0818/SGDiff/issues/7#issuecomment-1827581994

Qi-Chuan commented 3 months ago

Thanks a lot. When I run testset_ddim_sampler.py, I find that the PNG files of the drawn scene graphs in "test_results/scene_graph/" cannot be opened (maybe the files are corrupted, or the scene graphs were not drawn and saved successfully). Do you have the same problem?

Maelic commented 3 months ago

No, I don't have the same problem. You may have an issue with your graphviz install; the graphs are generated using the dot command, which can be installed from the graphviz package (sudo apt-get install graphviz on Ubuntu). See: https://github.com/YangLing0818/SGDiff/blob/48673d79d2bd0c84671fb02e162d7a11854474ce/testset_ddim_sampler.py#L165.
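As a quick sanity check (a hedged sketch, assuming a standard PATH setup), you can verify that the dot binary is installed and can actually render a PNG before re-running the sampler:

```python
# Hypothetical sanity check: if this fails, the scene-graph PNGs written by
# testset_ddim_sampler.py will likely come out empty or corrupted.
import shutil
import subprocess

assert shutil.which("dot") is not None, "graphviz not found; try: sudo apt-get install graphviz"

with open("sanity_check.dot", "w") as f:
    f.write('digraph g { "man" -> "horse" [label="riding"]; }')

# The sampler relies on the same kind of dot invocation to draw the graphs.
subprocess.run(["dot", "-Tpng", "sanity_check.dot", "-o", "sanity_check.png"], check=True)
print("graphviz OK, wrote sanity_check.png")
```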

Maelic commented 3 months ago

By the way, in case you don't want to train for that long, here is a download link for my checkpoint: sgdiff_epoch_335.ckpt
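If you want to sanity-check the file before plugging it into the sampler, here is a minimal sketch; it only assumes the checkpoint loads with torch.load, and the exact keys depend on how the trainer saved it:

```python
# Hedged sketch: inspect the downloaded checkpoint without assuming the
# repo's exact loading code.
import torch

ckpt = torch.load("sgdiff_epoch_335.ckpt", map_location="cpu")
print(list(ckpt.keys()))  # e.g. 'state_dict', 'epoch', ... depending on the trainer
if "state_dict" in ckpt:
    print(len(ckpt["state_dict"]), "parameter tensors")
```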

Qi-Chuan commented 3 months ago

I will try following your instructions then. I sincerely appreciate your answers and help!

Maelic commented 3 months ago

No worries, good luck with your implementation!

Qi-Chuan commented 3 months ago

Hi, after I train the model, how can I reproduce the FID and Inception Score results from the paper? Which projects do you use to evaluate the FID and Inception Score metrics? Also, when reconstructing images that contain people, I find the model does a poor job with details such as the body skeleton or the face; do you have a similar problem? Finally, may I ask for your email address for further discussion? Thanks a lot!

Maelic commented 2 months ago

Hi, to compute the FID I used the codebase linked in the first post of this issue. For the Inception Score, I used this one.
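For reference, one possible way to compute an Inception Score over the sampled images is torchmetrics; this is a hedged sketch, not necessarily the exact codebase referenced above, and the folder path is a placeholder:

```python
# Hedged sketch: Inception Score with torchmetrics (requires torch-fidelity
# installed as a backend). The folder is a placeholder for wherever the
# sampler writes its PNGs.
import numpy as np
import torch
from pathlib import Path
from PIL import Image
from torchmetrics.image.inception import InceptionScore

inception = InceptionScore(splits=10)

for p in sorted(Path("test_results/generated/").glob("*.png")):
    img = Image.open(p).convert("RGB")
    t = torch.from_numpy(np.array(img)).permute(2, 0, 1).unsqueeze(0)  # uint8, (1, 3, H, W)
    inception.update(t)

mean, std = inception.compute()
print(f"Inception Score: {mean.item():.2f} +/- {std.item():.2f}")
```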

Regarding the details of people: as I said in another thread, the model is mostly good at generating the topology of the scene, at the expense of capturing rich fine-grained details. This may come from the VG dataset, which has poor diversity of annotations, or from the pre-training strategy, which also focuses heavily on the layout.

My email address is neau0001@flinders.edu.au. You can contact me, but I am not one of the authors of this work, so my understanding of the codebase is limited.

Qi-Chuan commented 2 months ago

Thanks for your reply. My email address is 920892753@qq.com. Looking forward to more discussion and interaction with you!