Closed lingjunzhao closed 3 years ago
@ioanacroi can you explain how misc/prepare_text_embeddings.py is supposed to run? I tried running it both on an already available dataset and on custom text for my own dataset, and in both cases I get the error 'Module model not found'. Any guidance would be appreciated.
@cathvoilet, thanks very much for flagging this issue - that's a mistake in the paper writing, which will need to be corrected!
The inputs to GPT do not depend on word2vec (they are processed independently). The text embeddings are generated with misc/prepare_text_embeddings.py
Thanks @albanie !
@Tortoise17 the error you are having comes from not having the PYTHONPATH set accordingly. You can set the path like this:
export PYTHONPATH=$PYTHONPATH:/path/to/working/dir
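To illustrate why the export above helps, here is a minimal sketch of how a directory on the module search path changes whether Python can find a module. Directories listed in PYTHONPATH are added to `sys.path` at interpreter startup; the sketch appends to `sys.path` directly to show the same effect. The module name `demo_model_pkg` and the temporary directory are hypothetical stand-ins for the repository's own modules and working directory, not names from this repo.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate `module_name` on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical stand-in for the working directory: a temp dir holding one module.
with tempfile.TemporaryDirectory() as repo_root:
    Path(repo_root, "demo_model_pkg.py").write_text("VALUE = 1\n")

    # Not on the search path yet, so the lookup fails (a "module not found" case).
    before = is_importable("demo_model_pkg")

    # Appending here mirrors what a PYTHONPATH entry does at interpreter startup.
    sys.path.append(repo_root)
    importlib.invalidate_caches()
    after = is_importable("demo_model_pkg")

    # Clean up so the sketch leaves sys.path unchanged.
    sys.path.remove(repo_root)

print(before, after)
```

If the second lookup succeeds only after the directory is added, the original error was most likely a search-path problem rather than a missing file.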
However, there are some problems in the current branch with the script misc/prepare_text_embeddings.py. I will check and update it.
@Tortoise17 we have updated the code. Now the prepare_text_embeddings script should work. You can use it like this:
python misc/prepare_text_embeddings.py --dataset MSRVTT --embedding_name w2v
I will close this issue. Feel free to reopen it if necessary! Cheers
infinite thank you @ioanacroi
@ioanacroi Can you explain what this error means?
  File "misc/prepare_text_embeddings.py", line 149, in extract_embeddings
    model = prepare_embedding_model(embedding_name, text_embedding_config)
  File "/home/user/anaconda3/envs/emb/lib/python3.7/site-packages/typeguard/__init__.py", line 927, in wrapper
    retval = func(*args, **kwargs)
  File "misc/prepare_text_embeddings.py", line 62, in prepare_embedding_model
    return cls_map[embedding_name](embedding_name=embedding_name, **conf)
  File "/home/user/anaconda3/envs/emb/lib/python3.7/site-packages/typeguard/__init__.py", line 925, in wrapper
    memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
  File "/home/user/anaconda3/envs/emb/lib/python3.7/site-packages/typeguard/__init__.py", line 128, in __init__
    self.arguments = signature.bind(*args, **kwargs).arguments
  File "/home/user/anaconda3/envs/emb/lib/python3.7/inspect.py", line 3015, in bind
    return args[0]._bind(args[1:], kwargs)
  File "/home/user/anaconda3/envs/emb/lib/python3.7/inspect.py", line 2986, in _bind
    format(arg=param_name)) from None
TypeError: missing a required argument: 'remove_stopwords'
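Reading the traceback, the embedding class is called with `embedding_name=embedding_name, **conf`, and `signature.bind` then complains that `remove_stopwords` is missing. One plausible reading is that the config dictionary passed as `conf` has no `remove_stopwords` key. The sketch below reproduces that failure mode with the standard library only. The class `DemoEmbedding` and the helper `missing_arguments` are hypothetical and are not taken from the repository.

```python
import inspect


class DemoEmbedding:
    """Hypothetical stand-in for an embedding class with a required flag."""

    def __init__(self, embedding_name: str, remove_stopwords: bool):
        self.embedding_name = embedding_name
        self.remove_stopwords = remove_stopwords


def missing_arguments(func, **kwargs):
    """List required parameters of `func` that `kwargs` does not supply."""
    missing = []
    for name, param in inspect.signature(func).parameters.items():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # *args / **kwargs never count as missing
        if param.default is param.empty and name not in kwargs:
            missing.append(name)
    return missing


# A config without the key, analogous to the `conf` seen in the traceback.
conf = {}

# Binding the arguments the way the traceback does raises the same kind of error.
try:
    inspect.signature(DemoEmbedding).bind(embedding_name="w2v", **conf)
    bind_error = None
except TypeError as exc:
    bind_error = str(exc)

print(missing_arguments(DemoEmbedding, embedding_name="w2v", **conf))
print(bind_error)
```

Under this reading, adding a `remove_stopwords` entry to the text embedding config for the chosen embedding would let the call bind. Whether that is the right value for a given dataset is something the maintainers would need to confirm.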
Hi, the paper says that the text features are pretrained word2vec word embeddings which are then passed through a pretrained OpenAI-GPT model. I usually see people use word2vec or GPT alone rather than combining them, so could you please explain how you obtain the final text feature?
I also downloaded and checked the text features for the MSRVTT dataset: they have 768 dimensions, the same as GPT. Does your final model use only GPT embeddings, or does it combine them with word2vec somehow? It would be really helpful if you could point to the code that generates the aggregated text features the model uses directly.