Another one (404 not found):
wget https://multimedia-commons.s3-us-west-2.amazonaws.com/data/images/ac8/f9f/ac8f9ff369308ac4d3643d3114c6718b.jpg -P data/yfcc_images/
@klshuster Otherwise, could you please let me know where in the code I can ignore these hashes to reproduce the baseline results? I have been trying to use the released pre-trained model and run it in the eval phase as:
python examples/eval_model.py -mf zoo:image_chat/transresnet_multimodal/model -t image_chat --yfcc_path data/yfcc_images/ -dt valid
Hi @shubhamagarwal92, my apologies for the delayed response.
The image representation model we used is similar to resnext101_32x48d_wsl; however, it is not specifically that one. Regarding your question about ignoring the bad hashes: one way to fix this would be to compile the list of missing hashes, and then in the _setup_data function for the teacher (here) just iterate through and remove examples with image hashes for which you do not have the image.
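For instance, a minimal sketch of such a filter (assuming the teacher keeps its examples in self.data and each example carries an image_hash field; the real teacher code may differ):

import os

def _remove_missing_images(self, yfcc_path):
    # Keep only examples whose image file actually exists on disk
    # (drops the 404'd / bad-hash images).
    self.data = [
        ex
        for ex in self.data
        if os.path.isfile(os.path.join(yfcc_path, ex['image_hash'] + '.jpg'))
    ]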
Hope that answers your questions.
Hey @shubhamagarwal92! More info on the Faster R-CNN features: we used the Pythia script with their Visual Genome pre-trained ResNeXt-backbone Faster R-CNN. You can download that model from the Pythia GitHub page (please see their v0.4 branch for more information if you need it). The difference in performance between the ResNeXt and ResNet backbone Faster R-CNNs is negligible.
@klshuster Thank you for your detailed response and all the pointers!
I am guessing that even with resnext101_32x48d_wsl I should be able to reproduce almost the same baseline results? In any case, I can try different models from here.
For the teacher function in agents, would this trick support examples/interactive.py, eval_model.py, train_model.py, and display_model.py? Currently, everything was breaking because of the missing hashes, and I am not sure which of these APIs call the teacher. Is there any other place where the data is loaded? Thank you again for pointing me to this code. :)
@dexterju Many thanks! :)
You should at the very least get results that are no worse than the ResNet152 results listed here.
That trick should solve most of those issues - if you find you're running into another issue please let me know.
@klshuster I followed your trick and tried to ignore some of the hashes in agents, running:
python examples/eval_model.py -mf zoo:image_chat/transresnet_multimodal/model -t image_chat --yfcc_path data/yfcc_images/ -dt valid
However, I think my code is not getting called:
# Load the list of bad hashes and drop the corresponding examples.
print("Ignoring hash list now")
ignore_hash_list = self.get_ignore_hash_list(data_path, ignore_hash_list_filename)
self.data = self.ignore_hash_json(self.data, ignore_hash_list)
Do you have any suggestions? Could you please verify the args for examples/eval_model.py? Am I missing anything?
UPDATE:
Sorry, please ignore this comment. I had two versions of the ParlAI repository, and the parlai package was installed (via python setup.py develop in my conda environment) from a different repo than my current working repo, so the script was being called from the other repository.
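For anyone who hits the same thing, a quick way to check which checkout is actually being imported:
python -c "import parlai; print(parlai.__file__)"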
Could you please paste the exact error you are getting, and perhaps give more context on where you put the code above?
Please ignore the above message. I was able to ignore the hashes in the agent as you suggested and successfully run the code in eval mode as:
python examples/eval_model.py -mf zoo:image_chat/transresnet_multimodal/model -t image_chat --yfcc_path data/yfcc_images/ -dt test
This was able to successfully download the pre-trained model:
[ downloading: http://parl.ai/downloads/_models/image_chat/transresnet_multimodal.tgz to /scratch/shubham/projects/image_chat/pvt/data/models/image_chat/transresnet_multimodal/transresnet_multimodal.tgz ]
Downloading transresnet_multimodal.tgz: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.27G/1.27G [01:33<00:00, 13.5MB/s]
unpacking transresnet_multimodal.tgz
Could you please let me know how to interpret these results compared to Table 2 in the arXiv paper:
[ Finished evaluating tasks ['image_chat'] using datatype test ]
{'exs': 29982, 'accuracy': 0.015342538856647322, 'f1': 0.07288192608310763, 'bleu-4': 0.014628667324294827,
 'hits@1': 0.015342538856647322, 'hits@5': 0.06610633046494563, 'hits@10': 0.12224001067307051, 'hits@100': 1.0,
 'first_round': {'hits@1/100': 0.01351, 'loss': -1.0, 'med_rank': 48.0},
 'second_round': {'hits@1/100': 0.01571, 'loss': -1.0, 'med_rank': 46.0},
 'third_round+': {'hits@1/100': 0.01681, 'loss': -1.0, 'med_rank': 47.0}}
Thanks again. Sorry for troubling you with all the questions.
Hope that helps answer your questions.
Thanks for asking the questions @shubhamagarwal92, it's great for the community to have these sorts of clarifications.
Closing since everything looks finished here, but don't hesitate with any follow ups.
Hi @stephenroller @klshuster
Could you please let me know the hyperparameters used for the baseline models?
I used the following command to train the model:
python parlai/scripts/train_model.py \
-m projects:image_chat:transresnet_multimodal \
-t image_chat \
--yfcc_path ${YFCC_DIR} \
-bs 512 \
-mf ${MODEL_SAVE_DIR} > ${MODEL_SAVE_DIR}/logs.txt
Please see the logs here: logs.txt
The default model seems to use 2 layers and 2 heads as specified in the attached logs.
I also tried to reproduce the results with the command:
python examples/eval_model.py -mf zoo:image_chat/transresnet_multimodal/model -t image_chat --yfcc_path data/yfcc_images/ -dt test
Is this default configuration the reason that the results reported in my previous comment were Turn 1 R@1 = 1.3, Turn 2 R@1 = 1.57, and Turn 3 R@1 = 1.6?
Thanks.
Hi @shubhamagarwal92
The hyperparameters for the reference model can be found in data/models/image_chat/transresnet_multimodal/model.opt; however, I will paste the relevant ones (i.e., the ones that may differ from their default values) below to save you the trouble of parsing that JSON dict:
--n-layers 4 \
--embedding-size 300 \
--ffn-size 1200 \
--relu-dropout 0.2 \
--n-heads 6 \
--n-positions 1000 \
--variant aiayn \
--activation relu \
--truncate 64 \
--hidden-dim 500 \
--num-layers-all 2 \
--learningrate 0.005 \
--additional-layer-dropout 0.2 \
--validation-patience 10 \
--validation-every-n-epochs 1 \
--image-mode resnet152
Additionally, due to a slight bug in ParlAI's parameter setup, you'll need to specify --image-mode resnet152 when evaluating the pre-trained model; that should yield the appropriate results (without it, the model sees no images).
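For example, the full eval command from above becomes:
python examples/eval_model.py -mf zoo:image_chat/transresnet_multimodal/model -t image_chat --yfcc_path data/yfcc_images/ -dt test --image-mode resnet152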
Hi @klshuster
Thank you for clarifying and sharing the hyperparams!
Indeed, after adding the --image-mode flag, I am able to reproduce the baseline results. :)
I want to confirm some of my other doubts:
1. Is there any difference between -mf zoo:image_chat/transresnet_multimodal/model and -mf models:image_chat/transresnet_multimodal/model?
2. For train_model.py, am I using the correct flag -m projects:image_chat:transresnet_multimodal? (I couldn't find an image chat agent in the parlai.agents folder.)
3. num_epochs is set to -1 in the model.opt. What is the stopping criterion used? Do you have an estimate of how many epochs the models were trained for?
Some ParlAI-related questions:
4. How can I specify gpu_ids while training (e.g. GPUs 2 and 3)? I couldn't find any gpu arg in model.opt. If I want to add this argument, what would be the right place? Like here?
5. Can I create my own agent in a projects.image_chat.my_model directory, where I can have something like:
# same imports as /projects/image_chat/transresnet_multimodal/transresnet_multimodal.py
class MyAgent(TransresnetMultimodalAgent):
    ...
and call it as -m projects:image_chat:my_model?
- zoo and models return the exact same thing (it just depends on your preference 😄).
- We use --validation-patience 10 --validation-every-n-epochs 1, i.e., if validation accuracy does not improve for 10 epochs, we stop training.
- There is a gpu arg to specify devices; you can add it to the agent in the place you pointed to, if you like.
- You can put your own agent either under projects (called as projects:new_project:my_model) or in the parlai agents directory (where you can then just specify -m my_model).
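As a minimal sketch of the projects route (the file path projects/image_chat/my_model/my_model.py is hypothetical; the loader derives the expected class name from the module name):

# projects/image_chat/my_model/my_model.py (hypothetical path)
from projects.image_chat.transresnet_multimodal.transresnet_multimodal import (
    TransresnetMultimodalAgent,
)

class MyModelAgent(TransresnetMultimodalAgent):
    # Inherits everything from the multimodal baseline; override as needed.
    pass

It can then be invoked with -m projects:image_chat:my_model.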
Thanks a lot for all the help! :)
Hi @klshuster,
Hope you are holding up well! Sorry to bother you again.
Even though the pre-trained model reproduces the results in eval mode, the training command still cannot replicate them. PFA both eval_logs.txt and train_logs.txt.
I tried with the --image-mode flag as well as all the hyperparams you suggested. Also, PFA the command:
export MODEL_SAVE_DIR=${MODEL_DIR}/reproduce/
python parlai/scripts/train_model.py \
-m projects:image_chat:transresnet_multimodal \
-t image_chat \
--yfcc_path ${YFCC_DIR} \
-bs 256 \
--image-mode resnet152 \
--n-layers 4 \
--embedding-size 300 \
--ffn-size 1200 \
--relu-dropout 0.2 \
--n-heads 6 \
--n-positions 1000 \
--variant aiayn \
--activation relu \
--truncate 64 \
--hidden-dim 500 \
--num-layers-all 2 \
--learningrate 0.005 \
--additional-layer-dropout 0.2 \
--validation-patience 10 \
--validation-every-n-epochs 1 \
-mf ${MODEL_SAVE_DIR}/basic_model > ${MODEL_SAVE_DIR}/train_logs.txt
Could you please suggest what I am missing?
PS. A suggestion for the general ParlAI documentation about the naming convention:
a. If we want to create our own agent in the agents directory, the class must be named exactly MyModelNameAgent in agents/my_model_name/my_model_name.py (with a strict directory structure).
b. However, if we want to create it as projects:new_project:my_model_name, we have to follow the directory structure projects/new_project/my_model_name/my_model_name.py, again with the class named exactly MyModelNameAgent.
The loader matches on this exact naming convention here. Even the capitalization of the class name (e.g. MYModelNAmeAgent) could mess things up. This should be explicit in the parrot example.
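For context, the loader derives the expected class name from the module name with logic roughly like this (a paraphrase of the convention described above, not the exact ParlAI source):

def name_to_agent_class(name):
    # 'my_model_name' -> 'MyModelNameAgent'; a class defined with any
    # other capitalization will fail to match.
    words = name.split('_')
    return ''.join(w[0].upper() + w[1:] for w in words) + 'Agent'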
Thanks again for your help!
RE: training...
One important thing to note is that the context and candidate encoders in the pre-trained model were themselves pre-trained (see section 4.1 in the paper for more discussion about the "Dialogue Encoder"). We do not currently have plans to release these specific pre-trained encoders, though there are a number of other pre-trained Transformer encoders you can find in the ParlAI Model zoo.
Your notes on the naming convention are accurate, and the docs should probably reflect them. Note that you can always specify the agent class name explicitly, e.g. -m my_model_name:MyModelNameAgent or -m projects:new_project:my_model_name:MYModelNAmeAgent, which allows you to name your agent whatever you'd like 😄
@klshuster
Thanks for your reply. The difference in performance is too stark:
After training the model, results on test set:
'accuracy': 0.0105396571276099, 'f1': 0.05519516050648507, 'bleu-4': 0.01001851816223969, 'hits@1': 0.0105396571276099, 'hits@5': 0.05203121873123874, 'hits@10': 0.10069374958308318, 'hits@100': 1.0
For running it only in evaluation mode (on test):
'accuracy': 0.4058435061036622, 'f1': 0.44580235298830806, 'bleu-4': 0.39437274252183546, 'hits@1': 0.4058435061036622, 'hits@5': 0.6724701487559203, 'hits@10': 0.779100793809619, 'hits@100': 1.0
Hits@5 is 5.2 when training from scratch, compared to 67.2 when using the pre-trained model in eval mode. Is there any way to train a model and get results in a decent ballpark, say at least 65 Hits@5?
Do you think I am still missing an argument needed to reproduce the results when training? Could you suggest a pre-trained encoder in the zoo and the flags to pass it to the model?
Thanks.
A couple ideas that may help you get better results:
@klshuster Thanks for the suggestions. I am already trying different hyperparams, but it seems like a big gap to cover through hyperparam optimization alone.
But as you suggested earlier, could you show how to use a pre-trained encoder from the zoo in ParlAI?
Any transformer-based model in the zoo would work for these encoders; you would just need to massage the state dicts and load them accordingly (I do not have specific steps).
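A rough sketch of the kind of state-dict massaging meant here, assuming model is an already-constructed transresnet model, and that the checkpoint path and the key prefixes encoder. and context_encoder. are illustrative assumptions (inspect the two state dicts to find the real names):

import torch

# ParlAI checkpoints typically store the weights under the 'model' key.
zoo_sd = torch.load('data/models/some_zoo_model/model', map_location='cpu')['model']
my_sd = model.state_dict()

# Copy over transformer-encoder weights whose names and shapes line up.
for key, weight in zoo_sd.items():
    if key.startswith('encoder.'):
        target = 'context_encoder.' + key[len('encoder.'):]
        if target in my_sd and my_sd[target].shape == weight.shape:
            my_sd[target] = weight

model.load_state_dict(my_sd)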
@klshuster, thank you for merging my PR. I have some questions related to the image_chat project, e.g. whether the image representation model used was resnext101_32x48d_wsl.