How to correctly unzip the downloaded FFHQ data?

I downloaded the FFHQ dataset you provided from Google Drive, but the download was split into 4 parts:

ffhq-001.zip
ffhq-002.zip
ffhq-003.zip
ffhq-004.zip

I tried the methods in [1-5], but they all failed: the extracted data raises errors during preprocessing.

As far as I can tell, 914 images were split across 2 parts by Google Drive during the download. To check this, first list the file names in all 4 parts:

# shell
unzip -Z1 ffhq-001.zip > 1.txt
unzip -Z1 ffhq-002.zip > 2.txt
unzip -Z1 ffhq-003.zip > 3.txt
unzip -Z1 ffhq-004.zip > 4.txt

Then find the separated images:

# python
import collections

# Map each image index to the list of parts whose listing contains it.
cnt = collections.defaultdict(list)
for i in range(1, 5):
    with open("{}.txt".format(i), "r") as f:
        for line in f:
            line = line.strip()
            if line.endswith(".png"):
                # The file name without its extension is the image index.
                k = int(line.split("/")[-1].split(".png")[0])
                cnt[k].append(i)

print("#images:", len(cnt))   # 69080

print("check separated images")
# An image listed in more than one part counts as separated.
n_separated = sum(1 for parts in cnt.values() if len(parts) > 1)
print("#separated images:", n_separated)   # 914
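
To narrow this down further, it may help to check whether the copies of an image listed in more than one part are identical or differ in size or checksum. The following is only a minimal sketch using Python's standard `zipfile` module. It assumes the central directory of each part is readable (which the `unzip -Z1` listings suggest), and the helper names `find_duplicated_entries` and `copies_identical` are hypothetical, introduced here for illustration. Unlike the script above, it keys on the full entry path rather than the numeric file name.

```python
import collections
import zipfile


def find_duplicated_entries(zip_paths):
    """Return {entry name: [(archive, size, crc), ...]} for every .png
    entry that is listed in more than one of the given archives."""
    seen = collections.defaultdict(list)
    for path in zip_paths:
        with zipfile.ZipFile(path) as zf:
            for info in zf.infolist():
                if info.filename.endswith(".png"):
                    # file_size and CRC come from the archive's directory,
                    # so nothing is extracted here.
                    seen[info.filename].append((path, info.file_size, info.CRC))
    return {name: recs for name, recs in seen.items() if len(recs) > 1}


def copies_identical(records):
    """True if all copies report the same uncompressed size and CRC-32."""
    return len({(size, crc) for _, size, crc in records}) == 1
```

Calling `find_duplicated_entries(["ffhq-001.zip", "ffhq-002.zip", "ffhq-003.zip", "ffhq-004.zip"])` and then `copies_identical` on each result should show whether the 914 duplicated names are full copies or differing fragments. This only compares the metadata recorded in each archive; it does not verify that the stored data actually decompresses.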

How did you solve this problem?

Thanks

XuyangGuo / CtrlHair
