-
# Task Name: Text-to-Audio Generation
The task aims to generate general audio from a given holistic text description.
## Task Objective
The primary goal of the Text-to-Audio (TTA) Gener…
-
Thank you for sharing the implementations of the GAN-based models on popular datasets like CelebA.
I have implemented the WGAN-GP model (in PyTorch), and the samples look close to the reported work …
-
Hi guys, thank you for the amazing work!
I am having trouble with FID. During training, the log often shows FID as NaN. It does not happen every time, but more than half the time.
So I w…
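A common cause of NaN FID values is `scipy.linalg.sqrtm` applied to an ill-conditioned product of the two covariance matrices. Below is a minimal hedged sketch of the usual regularization workaround; the function name and the `eps` value are illustrative, not taken from this repo's code:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2, eps=1e-6):
    """Frechet distance between two Gaussians; retries with a small
    diagonal offset when the matrix square root is not finite."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1.dot(sigma2))
    if not np.isfinite(covmean).all():
        # Regularize: nudge the diagonals and recompute the square root.
        offset = np.eye(sigma1.shape[0]) * eps
        covmean = linalg.sqrtm((sigma1 + offset).dot(sigma2 + offset))
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # tiny imaginary parts are numerical noise
    return diff.dot(diff) + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(covmean)
```

If your NaNs persist even with a diagonal offset, it usually means a batch of activations was degenerate (e.g. too few samples to estimate the covariance), which is worth checking before blaming the distance itself.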
-
```
├── real_source
│   ├── aaa.png
│   └── bbb.jpg
├── real_target
│   ├── ccc.png
│   └── ddd.jpg
├── fake
│   ├── ccc_fake.png
│   └── ddd_fake.jpg
├── main.py
├── inception_score.py
…
-
I use fid.py to measure the FID score of my image datasets. I generated 10,000 images in the tImages directory and used the command 'python fid.py ./tImages fid_stats_celeba.npz' or 'python fid.py ./tImag…
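For reference, a precomputed stats file like the one above is just an `.npz` holding the Inception activations' mean and covariance. Here is a minimal sketch of reading one; the keys `mu` and `sigma` are an assumption based on the common TTUR-style format, and the file name is a stand-in created on the spot:

```python
import numpy as np

# Build a tiny stand-in stats file with the assumed layout, then read
# it back the way fid.py would read fid_stats_celeba.npz.
np.savez("fid_stats_demo.npz", mu=np.zeros(2048), sigma=np.eye(2048))

with np.load("fid_stats_demo.npz") as f:
    mu_ref, sigma_ref = f["mu"][:], f["sigma"][:]

print(mu_ref.shape, sigma_ref.shape)
```

Inspecting the `.npz` like this is a quick way to confirm the stats file matches what the script expects before debugging the images themselves.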
-
-
As per @douwekiela's suggestion, we should identify the blind spots we have in terms of missing metrics, especially in domains like speech recognition and computer vision.
Suggestions are welcom…
-
We only need to compute the difference between two distributions: real-source and fake images. So what does real_target stand for?
-
Hi,
It's nice work, but I have some questions about the experiments on Diffusion.
1. In Table 8, do you compare your results with full-data training (shown as "original", 7.83)? But in E.2 VISUALIZATI…
-
Hello, during the inference phase, do I only need to use the 886 audio files from your data/test_audiocaps_subset.json? I have been unable to obtain the results from your paper, even when using your c…