XuyangGuo / CtrlHair

Controllable Hair Editing (ECCV 2022)

How to correctly unzip the downloaded FFHQ data? #1

iTomxy closed this issue 2 years ago

iTomxy commented 2 years ago

I downloaded the FFHQ dataset you provided from Google Drive, but it was split into 4 parts (ffhq-001.zip to ffhq-004.zip).

I tried the methods in [1-5], but they all failed: the extracted data raises errors during preprocessing.

As far as I can tell, 914 images ended up separated across 2 parts by Google Drive during the download. First, list the file names in all 4 parts:

# shell
unzip -Z1 ffhq-001.zip > 1.txt
unzip -Z1 ffhq-002.zip > 2.txt
unzip -Z1 ffhq-003.zip > 3.txt
unzip -Z1 ffhq-004.zip > 4.txt

Then find the separated images:

# python
import collections

# Map each image index to the list of parts (1-4) whose listing contains it.
cnt = collections.defaultdict(list)
for i in range(1, 5):
    with open("{}.txt".format(i), "r") as f:
        for line in f:
            line = line.strip()
            if line.endswith(".png"):
                # The file name without its extension is the image index.
                k = int(line.split('/')[-1].split(".png")[0])
                cnt[k].append(i)

print("#images:", len(cnt))   # 69080

print("check separated images")
# An image counts as separated if it is listed in more than one part.
n_separated = sum(1 for parts in cnt.values() if len(parts) > 1)
print("#separated images:", n_separated)   # 914

How did you solve this problem?

Thanks


  1. How to unzip a multipart (spanned) ZIP on Linux?
  2. Combine the split zip files downloading from Google Drive [closed]
  3. How to unzip multiple zip files into a single directory structure (e.g. Google Drive folder export)
  4. How to extract and join files xxx.zip, xxx.z01 and xxx.z02
  5. Extracting split (multipart) .zip files on Linux

iTomxy commented 2 years ago

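I found that the images duplicated across 2 parts are actually identical to each other. So I guess one can simply run unzip -n '*.zip' for decompression: the -n option never overwrites existing files, so each duplicated image is extracted only once.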
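
For anyone who wants to double-check this before extracting, here is a minimal sketch in Python using only the standard zipfile module. It compares the CRC-32 checksums stored in each archive for files that occur more than once, then extracts everything while skipping files that already exist, which mimics unzip -n. The function names, the ffhq-*.zip pattern and the ffhq output directory are my own assumptions for illustration, not something from the repository.

```python
import zipfile
from collections import defaultdict
from pathlib import Path


def duplicate_members(archives):
    """Return {file name: set of CRC-32 values} for names listed more than once."""
    crcs = defaultdict(list)
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    crcs[info.filename].append(info.CRC)
    return {name: set(values) for name, values in crcs.items() if len(values) > 1}


def extract_without_overwrite(archives, dest):
    """Extract all archives into dest, skipping members that already exist."""
    dest = Path(dest)
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            for info in zf.infolist():
                if not (dest / info.filename).exists():
                    zf.extract(info, dest)


if __name__ == "__main__":
    # Assumed archive naming; adjust the pattern to the downloaded files.
    parts = sorted(Path(".").glob("ffhq-*.zip"))
    dups = duplicate_members(parts)
    # A duplicated name with more than one distinct CRC has differing content.
    mismatched = [name for name, values in dups.items() if len(values) > 1]
    print("#duplicated:", len(dups), "#with differing CRC:", len(mismatched))
    if not mismatched:
        extract_without_overwrite(parts, "ffhq")  # hypothetical output directory
```

Matching CRC-32 values are strong evidence that the copies are identical, though not a byte-for-byte proof. If any name shows differing checksums, it is safer to inspect those files by hand rather than extract blindly.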