Open KomputerMaster64 opened 2 years ago
I am trying to implement the DDGAN model on the FFHQ 256x256 dataset. I used the resized FFHQ 256x256 dataset from Kaggle because the FFHQ 1024x1024 dataset has a size of 90 GB, which exceeds the limits of my resources.
The Kaggle dataset ships as archive.zip, which contains a directory "resized" holding the 70k .jpg files.
The file structure is as follows:
archive.zip
└ resized
  └ (70k images)
I am using Google Drive and Colab notebooks for the implementation, with CODE_DIR = "/content/drive/MyDrive/Repositories/NVAE" and DATA_DIR = "/content/drive/MyDrive/Repositories/NVAE/dataset_nvae". When I run the command
!python create_ffhq_lmdb.py --ffhq_img_path=$DATA_DIR/ffhq/resized/ --ffhq_lmdb_path=$DATA_DIR/ffhq/ffhq-lmdb --split=train
I get the following error message:
Traceback (most recent call last):
  File "create_ffhq_lmdb.py", line 70, in <module>
    main(args.split, args.ffhq_img_path, args.ffhq_lmdb_path)
  File "create_ffhq_lmdb.py", line 46, in main
    im = Image.open(img_path)
  File "/usr/local/lib/python3.7/dist-packages/PIL/Image.py", line 2843, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/55962.png'
I changed line 45 from
img_path = os.path.join(ffhq_img_path, '%05d.png' % i)
to
img_path = os.path.join(ffhq_img_path, '%05d.jpg' % i)
since the Kaggle FFHQ 256x256 resized dataset contains .jpg image files.
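Instead of hard-coding one extension, the path lookup could try both. The following is only a minimal sketch: the helper name resolve_image_path and its parameters are hypothetical and are not part of create_ffhq_lmdb.py.

```python
import os


def resolve_image_path(img_dir, index, extensions=(".png", ".jpg")):
    """Return the first existing file named <index:05d><ext>, or None.

    Hypothetical helper: the extensions are tried in the given order, so
    the original .png naming still takes precedence when both exist.
    """
    for ext in extensions:
        candidate = os.path.join(img_dir, "%05d%s" % (index, ext))
        if os.path.isfile(candidate):
            return candidate
    return None
```

A None result marks an index that has no image on disk, so the caller can report it by number instead of failing inside Image.open.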
After this change, the command
!python create_ffhq_lmdb.py --ffhq_img_path=$DATA_DIR/ffhq/resized/ --ffhq_lmdb_path=$DATA_DIR/ffhq/ffhq-lmdb --split=train
gives the following output:
100
200
300
400
500
600
700
800
900
1000
1100
1200
.
.
.
.
.
.
I cross-checked the files that were unzipped. There should be 70k files, but after repeated unzip operations I can extract only 50k or 52k images, even though the cell output shows that the last file unzipped was 69999.jpg.
Google Colab Notebook and Google Drive used for the implementation.
Command used: !unzip images1024x1024.zip -d $DATA_DIR/ffhq/
Last few lines of the cell output:
inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69990.jpg
inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69991.jpg
inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69992.jpg
inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69993.jpg
inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69994.jpg
inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69995.jpg
inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69996.jpg
inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69997.jpg
inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69998.jpg
inflating: /content/drive/MyDrive/Repositories/NVAE/dataset_nvae/ffhq/resized/69999.jpg
Output of the Google Drive after the operation.
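One way to see exactly which files are missing is to compare the archive listing with what is on disk. This is a sketch only: the function name missing_after_extract is hypothetical, and the check reports which members are absent without explaining why they are absent.

```python
import os
import zipfile


def missing_after_extract(zip_path, dest_dir):
    """List archive members (files only) that do not exist under dest_dir."""
    with zipfile.ZipFile(zip_path) as zf:
        # Directory entries end with "/" and are skipped.
        members = [name for name in zf.namelist() if not name.endswith("/")]
    return [
        name for name in members
        if not os.path.isfile(os.path.join(dest_dir, name))
    ]
```

Running it with the zip file and the folder passed to unzip -d should return an empty list once all 70k images are really present.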
After executing the command
!python create_ffhq_lmdb.py --ffhq_img_path=$DATA_DIR/ffhq/resized/ --ffhq_lmdb_path=$DATA_DIR/ffhq/ffhq-lmdb --split=train
I get the following output, which suggests that the training set has been converted into the LMDB dataset:
48600 48700 48800 48900 49000 ... 62800 62900 63000
added 63000 items to the LMDB dataset.
HOWEVER, after about 2 minutes, the output changes to the following:
48600 48700 48800 48900 49000 49100 ...
    main(args.split, args.ffhq_img_path, args.ffhq_lmdb_path)
  File "create_ffhq_lmdb.py", line 55, in main
    print('added %d items to the LMDB dataset.' % count)
lmdb.Error: mdb_txn_commit: Disk quota exceeded
This behaviour is not observed for the validation set. I request you to please guide me.
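A rough size estimate may help explain why the training split hits the quota while the smaller validation split does not. The sketch below assumes each LMDB record holds an uncompressed 256x256 RGB uint8 array; that is an assumption about the script and has not been checked here. LMDB's own overhead is ignored, and both function names are hypothetical.

```python
import shutil


def estimate_lmdb_bytes(num_images, height=256, width=256, channels=3):
    """Lower-bound payload size, assuming raw uint8 pixels per record."""
    return num_images * height * width * channels


def has_room(path, needed_bytes):
    """True if the filesystem holding `path` reports enough free bytes."""
    return shutil.disk_usage(path).free >= needed_bytes


# 63000 training images under the assumption above:
print(estimate_lmdb_bytes(63000))  # 12386304000 bytes, about 12.4 GB
```

The free space reported for a mounted Drive folder may not match the account's storage quota, so has_room is only a first check.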
Respected sir,
Thank you for sharing the implementation and weights for the DDGAN model. I am comparing the DDGAN model with other generative models for image generation. I wanted to train the model on the FFHQ 256x256 dataset. To obtain the 256x256 version of the dataset, one has to download the 1024x1024 version (the dataset preparation method is given in the NVIDIA NVAE repository). However, the FFHQ 1024x1024 dataset is almost 90 GB in size, which exceeds the limits of my current resources.
I thought of downloading the resized FFHQ 256x256 version from Kaggle, but I am not sure the pre-processing scripts will work with it. I humbly request you to guide me.
PS: I would be grateful if you could share the pre-trained DDGAN model for the FFHQ 256x256 dataset.