NUS-HPC-AI-Lab / VideoSys

VideoSys: An easy and efficient system for video generation
Apache License 2.0

A question about preprocessing `ucf-101` 🤗 #101

Closed. AoqunJin closed this issue 7 months ago.

AoqunJin commented 7 months ago

I have the UCF-101 dataset, but its format does not seem to match what `preprocess.py` expects.

My UCF-101 download (from https://www.crcv.ucf.edu/data/UCF101.php) has two folders:

The `UCF-101` folder:

$ tree -L 1 UCF-101/
UCF-101/
├── ApplyEyeMakeup
├── ApplyLipstick
├── Archery
...
├── WritingOnBoard
└── YoYo
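
Each top-level folder in `UCF-101/` is named after its action class, so the class of a clip can be read from the first component of its path relative to `UCF-101/`. A minimal sketch, assuming the clips are `.avi` files sitting directly inside the class folders; the helper names `class_from_relpath` and `list_videos` are hypothetical:

```python
from pathlib import Path, PurePosixPath


def class_from_relpath(relpath):
    # Hypothetical helper. Assumes relpath is relative to UCF-101/ and uses
    # "/" separators, e.g. "ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi".
    # The first path component is the class folder name.
    return PurePosixPath(relpath).parts[0]


def list_videos(root):
    # Hypothetical helper. Collects every <root>/<ClassName>/<clip>.avi and
    # returns the paths relative to root with "/" separators, sorted so the
    # order is stable between runs.
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.glob("*/*.avi"))


print(class_from_relpath("ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi"))
```

With this layout the folder name alone is enough to build a caption; `classInd.txt` is only needed to decode the numeric labels in the split files.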

And the `ucfTrainTestlist` folder:

$ tree -L 1 ucfTrainTestlist/
ucfTrainTestlist/
├── classInd.txt
├── testlist01.txt
├── testlist02.txt
├── testlist03.txt
├── trainlist01.txt
├── trainlist02.txt
└── trainlist03.txt
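
The split files are plain text with one entry per line. A minimal parsing sketch, assuming `classInd.txt` lines look like `1 ApplyEyeMakeup` (index, then class name), `trainlistXX.txt` lines look like `ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1` (path, then class index), and `testlistXX.txt` lines carry only the path; the function names are hypothetical:

```python
def parse_class_index(line):
    # Hypothetical helper for classInd.txt: "<index> <ClassName>".
    # str.split() with no argument also drops "\n" and "\r\n".
    index, name = line.split()
    return int(index), name


def parse_split_line(line):
    # Hypothetical helper for trainlistXX.txt ("<path> <index>") and
    # testlistXX.txt ("<path>", assumed to have no label column).
    # Expects a non-blank line.
    fields = line.split()
    path = fields[0]
    label = int(fields[1]) if len(fields) > 1 else None
    return path, label


print(parse_class_index("1 ApplyEyeMakeup\n"))
print(parse_split_line("ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1\r\n"))
print(parse_split_line("YoYo/v_YoYo_g01_c01.avi"))
```

Splitting on any whitespace rather than on a single `" "` keeps the parser working if the files were saved with Windows line endings.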

I could probably convert them with my own script, but how should this be handled? 🤗❤

AoqunJin commented 7 months ago

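This works. The script below writes a CSV index (`data_index.csv`) whose rows pair a video path with a caption derived from the class name: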

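import csv

def split_by_capital(name):
    # Insert a space before every capital letter except the first one:
    # BoxingPunchingBag -> Boxing Punching Bag
    new_name = ""
    for i, ch in enumerate(name):
        if ch.isupper() and i != 0:
            new_name += " "
        new_name += ch
    return new_name

# classInd.txt maps a class index to a class name, e.g. "1 ApplyEyeMakeup"
class_d = {}
with open("./ucfTrainTestlist/classInd.txt", "r") as f:
    for kv in f:
        if not kv.strip():
            continue  # tolerate blank lines, e.g. at the end of the file
        k, v = kv.split()  # split() also drops "\n" / "\r\n"
        class_d[k] = v

# Each trainlist line is "<ClassName>/<clip>.avi <class index>".
# trainlist01-03 are three alternative splits of the same videos, so a clip
# appears in more than one of them; keep only its first occurrence.
data_l = []
seen = set()
for split_file in ("trainlist01.txt", "trainlist02.txt", "trainlist03.txt"):
    with open("./ucfTrainTestlist/" + split_file, "r") as f:
        for line in f:
            if not line.strip():
                continue
            k, v = line.split()
            if k in seen:
                continue
            seen.add(k)
            # One row per clip: (video path, caption built from the class name).
            # Adjust "./videos/UCF-101/" to wherever the UCF-101 folder lives.
            data_l.append(("./videos/UCF-101/" + k, split_by_capital(class_d[v])))

# newline="" prevents the csv module from writing blank rows on Windows
with open("./ucfTrainTestlist/data_index.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(data_l)

print("Finish!")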
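
The character loop in `split_by_capital` can also be written as a single regular-expression substitution. A minimal sketch, assuming class names are plain ASCII CamelCase (the `[A-Z]` class would miss non-ASCII capitals that `str.isupper()` accepts):

```python
import re


def split_by_capital(name):
    # Insert a space at every position that has at least one character
    # before it and an ASCII capital letter right after it:
    # BoxingPunchingBag -> Boxing Punching Bag
    return re.sub(r"(?<=.)(?=[A-Z])", " ", name)


print(split_by_capital("BoxingPunchingBag"))  # Boxing Punching Bag
```

Both versions treat every capital letter as a word boundary, so a run of capitals such as an acronym would be split letter by letter.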