bbrattoli / ZeroShotVideoClassification

Zero-shot video classification by end-to-end training of 3D convolutional neural networks
Apache License 2.0

KeyError: "word '---e1gyo84' not in vocabulary" #1

Open ghost opened 4 years ago

ghost commented 4 years ago

Sorry to bother you, but I have encountered the following problem:

python3 main.py --n_epochs 150 --bs 22 --lr 1e-3 --network c3d --dataset kinetics2both --save_path /home/m/Desktop/ZeroShotVideoClassification-master/result --nopretrained

Total batch size: 22
UCF101: total number of videos 13320, classes 101
HMDB51: total number of videos 6766, classes 51
Traceback (most recent call last):
  File "main.py", line 66, in <module>
    dataloaders = dataset.get_datasets(opt)
  File "/home/m/Desktop/ZeroShotVideoClassification-master/dataset.py", line 14, in get_datasets
    get_datasets = get_both_datasets(opt)
  File "/home/m/Desktop/ZeroShotVideoClassification-master/dataset.py", line 109, in get_both_datasets
    train_class_embedding = classes2embedding('kinetics', train_classes, wv_model)
  File "/home/m/Desktop/ZeroShotVideoClassification-master/auxiliary/auxiliary_word2vec.py", line 20, in classes2embedding
    embedding = [one_class2embed(class_name, wv_model)[0] for class_name in class_name_inputs]
  File "/home/m/Desktop/ZeroShotVideoClassification-master/auxiliary/auxiliary_word2vec.py", line 20, in <listcomp>
    embedding = [one_class2embed(class_name, wv_model)[0] for class_name in class_name_inputs]
  File "/home/m/Desktop/ZeroShotVideoClassification-master/auxiliary/auxiliary_word2vec.py", line 119, in one_class2embed_kinetics
    return wv_model[name_vec].mean(0), name_vec
  File "/home/m/Anaconda/envs/pytorch/lib/python3.6/site-packages/gensim/models/keyedvectors.py", line 355, in __getitem__
    return vstack([self.get_vector(entity) for entity in entities])
  File "/home/m/Anaconda/envs/pytorch/lib/python3.6/site-packages/gensim/models/keyedvectors.py", line 355, in <listcomp>
    return vstack([self.get_vector(entity) for entity in entities])
  File "/home/m/Anaconda/envs/pytorch/lib/python3.6/site-packages/gensim/models/keyedvectors.py", line 471, in get_vector
    return self.word_vec(word)
  File "/home/m/Anaconda/envs/pytorch/lib/python3.6/site-packages/gensim/models/keyedvectors.py", line 468, in word_vec
    raise KeyError("word '%s' not in vocabulary" % word)
KeyError: "word '---e1gyo84' not in vocabulary"

I look forward to your reply. Thank you very much.

EmreOzkose commented 2 years ago

This might be a little late for you, @ML201809, but in case anyone else runs into this, this answer might be helpful.

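The key in the error, '---e1gyo84', looks like a YouTube ID rather than a class name, which suggests the loader is taking the label from the wrong CSV column. You should change get_kinetics so that it reads the label from the first column, like this: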

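import csv
import os

import numpy as np


def get_kinetics(dataset=''):
    sourcepath = '/path/to/kinetics400'
    n_classes = '700' if '700' in dataset else '400'  # kept from the original; unused below
    # csv.reader also copes with quoted labels that contain commas
    with open(os.path.join(sourcepath, 'annotations', 'train.csv'), 'r', newline='') as f:
        data = list(csv.reader(f))

    fnames, labels = [], []
    for x in data[1:]:  # skip the header row
        if len(x) < 5: continue  # all five columns are needed (x[4] is read below)
        # columns: label, youtube_id, time_start, time_end, split
        fnames.append(os.path.join(sourcepath, x[4], x[0], x[1] + ".mp4"))
        labels.append(x[0])  # the class name, not the YouTube ID

    classes = sorted(np.unique(labels).tolist())
    return fnames, labels, classes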

where train.csv looks like this:

label,youtube_id,time_start,time_end,split
testifying,---QUuC4vJs,84,94,train
eating spaghetti,--3ouPhoy2A,20,30,train
dribbling basketball,--4-0ihtnBU,58,68,train
playing tennis,--56QUhyDQM,185,195,train
tap dancing,--6q_33gNew,132,142,train
climbing a rope,--EaS9P7ZdQ,13,23,train
brushing teeth,--IPbe5ZMCI,2,12,train
balloon blowing,--Ntf6n-j9Q,17,27,train
feeding birds,--PyMoD3_eg,20,30,train
...
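
If you want to catch this kind of mismatch before starting a long training run, a small pre-check along the following lines might help. This is only a sketch under some assumptions: `read_labels`, `missing_words`, and `toy_vocab` are hypothetical names that are not part of the repository, the vocabulary is a plain Python set standing in for the word2vec model, and class names are split on spaces only, whereas the repository applies its own preprocessing. With a real gensim model, the membership test would presumably be `word in wv_model`.

```python
import csv
import io

# A few rows copied from the train.csv excerpt above.
SAMPLE_CSV = """label,youtube_id,time_start,time_end,split
testifying,---QUuC4vJs,84,94,train
eating spaghetti,--3ouPhoy2A,20,30,train
dribbling basketball,--4-0ihtnBU,58,68,train
"""


def read_labels(csv_text, label_column="label"):
    """Return the sorted unique values of the named CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted({row[label_column] for row in reader})


def missing_words(class_names, vocabulary):
    """Map each class name to the words that are absent from `vocabulary`."""
    missing = {}
    for name in class_names:
        absent = [w for w in name.split() if w not in vocabulary]
        if absent:
            missing[name] = absent
    return missing


# Hypothetical stand-in for the word2vec vocabulary.
toy_vocab = {"testifying", "eating", "spaghetti", "dribbling", "basketball"}

good = read_labels(SAMPLE_CSV, "label")
bad = read_labels(SAMPLE_CSV, "youtube_id")  # simulates reading the wrong column

print(missing_words(good, toy_vocab))  # {}
print(missing_words(bad, toy_vocab))   # every YouTube ID is reported as missing
```

An empty result for the label column and a full list of misses for the ID column would point to the same cause as the KeyError above: the class names were being read from the wrong column.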