albanie / collaborative-experts

Video embeddings for retrieval with natural language queries
https://www.robots.ox.ac.uk/~vgg/research/collaborative-experts/
Apache License 2.0

Text features used in collaborative experts #28

Closed lingjunzhao closed 3 years ago

lingjunzhao commented 3 years ago

Hi, I saw that the paper describes the text features as pretrained word2vec word embeddings that are then passed through a pretrained OpenAI-GPT model. I usually see people use word2vec or GPT alone rather than combining them. Could you please explain how you obtain the final text feature?

I also downloaded and checked the text features for the MSRVTT dataset; they are 768-dimensional, the same as GPT. Does your final model use only GPT embeddings, or does it combine them with word2vec somehow? It would be really helpful if you could point to the code that generates the aggregated text features the model uses directly.

Tortoise17 commented 3 years ago

@ioanacroi can you explain how misc/prepare_text_embeddings.py is supposed to be run? I tried running it both for an already available dataset and for custom text from my own dataset, and I get the error 'Module model not found'. Any guidance would be appreciated.

albanie commented 3 years ago

@cathvoilet, thanks very much for flagging this issue - that's a mistake in the paper writing, which will need to be corrected!

The inputs to GPT do not depend on word2vec (they are processed independently). The text embeddings are generated with misc/prepare_text_embeddings.py.
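As a rough illustration of what "processed independently" means, here is a toy sketch in which the same caption is fed to two separate encoders and neither encoder sees the other's output. Both encoder functions are hypothetical stand-ins that return zero-filled vectors; they are not taken from misc/prepare_text_embeddings.py, and the whitespace tokenization is a simplification. The 768 width matches the GPT features mentioned above, while 300 is only the conventional word2vec size and is an assumption here.

```python
from typing import Dict, List

W2V_DIM = 300  # assumption: the usual word2vec width, not stated in this thread
GPT_DIM = 768  # matches the feature width reported above


def toy_w2v_encode(tokens: List[str]) -> List[List[float]]:
    """Hypothetical stand-in: one W2V_DIM vector per token."""
    return [[0.0] * W2V_DIM for _ in tokens]


def toy_gpt_encode(tokens: List[str]) -> List[List[float]]:
    """Hypothetical stand-in: one GPT_DIM vector per token."""
    return [[0.0] * GPT_DIM for _ in tokens]


def embed_caption(caption: str) -> Dict[str, List[List[float]]]:
    tokens = caption.lower().split()
    # Each stream starts from the raw tokens; the GPT stream never
    # receives word2vec vectors as input.
    return {"w2v": toy_w2v_encode(tokens), "gpt": toy_gpt_encode(tokens)}


features = embed_caption("a man is playing guitar")
# Report (number of tokens, vector width) for each stream.
print({name: (len(vecs), len(vecs[0])) for name, vecs in features.items()})
```

The two entries are kept under separate keys, so a downstream model can pick either stream (for example, only the 768-dimensional one) without any dependency between them.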

lingjunzhao commented 3 years ago

Thanks @albanie !

ioanacroi commented 3 years ago

@Tortoise17 the error you are seeing comes from PYTHONPATH not being set correctly. You can set it like this:

    export PYTHONPATH=$PYTHONPATH:/path/to/working/dir

However, there are some problems with the script misc/prepare_text_embeddings.py in the current branch. I will check and update it.
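To illustrate why a missing PYTHONPATH entry leads to a "module not found" error, here is a small self-contained sketch (not code from this repository). It shows that Python only finds a top-level package once the package's parent directory is on the module search path, which is what the PYTHONPATH export arranges at interpreter start-up. The package name toy_model_pkg and the temporary directory are hypothetical and used only for the demonstration.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the repository root.
workdir = Path(tempfile.mkdtemp())
pkg_dir = workdir / "toy_model_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("NAME = 'toy'\n")

# Before the directory is on the search path, the package cannot be found.
found_before = importlib.util.find_spec("toy_model_pkg") is not None

# In-process equivalent of: export PYTHONPATH=$PYTHONPATH:<workdir>
sys.path.append(str(workdir))
importlib.invalidate_caches()

found_after = importlib.util.find_spec("toy_model_pkg") is not None
print(found_before, found_after)
```

Setting PYTHONPATH in the shell is usually preferable to editing sys.path inside a script, since it leaves the script itself untouched.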

ioanacroi commented 3 years ago

@Tortoise17 we have updated the code, and the prepare_text_embeddings script should now work. You can use it like this:

    python misc/prepare_text_embeddings.py --dataset MSRVTT --embedding_name w2v

I will close this issue. Feel free to reopen it if necessary! Cheers

Tortoise17 commented 3 years ago

Thank you so much, @ioanacroi!

Tortoise17 commented 3 years ago

@ioanacroi Can you explain what this error means?

File "misc/prepare_text_embeddings.py", line 149, in extract_embeddings
    model = prepare_embedding_model(embedding_name, text_embedding_config)
  File "/home/user/anaconda3/envs/emb/lib/python3.7/site-packages/typeguard/__init__.py", line 927, in wrapper
    retval = func(*args, **kwargs)
  File "misc/prepare_text_embeddings.py", line 62, in prepare_embedding_model
    return cls_map[embedding_name](embedding_name=embedding_name, **conf)
  File "/home/user/anaconda3/envs/emb/lib/python3.7/site-packages/typeguard/__init__.py", line 925, in wrapper
    memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
  File "/home/user/anaconda3/envs/emb/lib/python3.7/site-packages/typeguard/__init__.py", line 128, in __init__
    self.arguments = signature.bind(*args, **kwargs).arguments
  File "/home/user/anaconda3/envs/emb/lib/python3.7/inspect.py", line 3015, in bind
    return args[0]._bind(args[1:], kwargs)
  File "/home/user/anaconda3/envs/emb/lib/python3.7/inspect.py", line 2986, in _bind
    format(arg=param_name)) from None
TypeError: missing a required argument: 'remove_stopwords'
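
Reading the traceback from the bottom up: the class selected through cls_map is called with embedding_name plus whatever keys are in conf, and the typeguard wrapper fails while binding those arguments because nothing supplies remove_stopwords. In other words, the constructor appears to require a remove_stopwords argument that the configuration passed as conf does not contain. The sketch below reproduces the same kind of failure with inspect.signature. ToyEmbedding, its dim parameter, and the contents of both conf dictionaries are hypothetical and only mirror the call shape seen in the traceback.

```python
import inspect


class ToyEmbedding:
    """Hypothetical class; only the parameter names matter here."""

    def __init__(self, embedding_name: str, dim: int, remove_stopwords: bool):
        self.embedding_name = embedding_name
        self.dim = dim  # hypothetical extra parameter
        self.remove_stopwords = remove_stopwords


def try_bind(conf: dict) -> str:
    """Bind keyword arguments against the constructor signature and report the result."""
    signature = inspect.signature(ToyEmbedding)
    try:
        # Same call shape as in the traceback: embedding_name plus **conf.
        signature.bind(embedding_name="w2v", **conf)
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "ok"


incomplete_conf = {"dim": 300}  # no 'remove_stopwords' key
complete_conf = {"dim": 300, "remove_stopwords": False}

print(try_bind(incomplete_conf))
print(try_bind(complete_conf))
```

If the same pattern holds in the real script, the first thing to check would be whether the text embedding configuration for the chosen embedding includes a remove_stopwords entry.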