e-bug / volta

[TACL 2021] Code and data for the framework in "Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs"
https://aclanthology.org/2021.tacl-1.58/
MIT License

Missing file and unable to load pretrained model #3

Closed: darthgera123 closed this issue 3 years ago

darthgera123 commented 3 years ago

While trying to get the code running, I faced two issues:
  1. Unable to find datasets/refcoco+_unc/annotations/cache/refcoco+_val_20_36.pkl. The link mentioned doesn't have the cache directory, and currently I'm trying to run it with an empty file.
  2. When I try to load pytorch_model_9.bin, it expects one of the pretrained models listed in the dictionary in volta/encoders.py (e.g. bert-base-uncased, roberta, etc.).

Please help @elliottd @e-bug

e-bug commented 3 years ago
  1. The cache file is generated the first time you run a model. Make sure you update datasets/ with your data directory (see the sketch after this list).
  2. Have you checked the examples yet (e.g. for ViLBERT)?
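For point 1, here is a minimal sketch of what "update datasets/ with your data directory" can look like, assuming the directory layout from the error message above; the data path is a placeholder for your own setup:

```python
# Hypothetical setup check: the cache/ folder is created under the data directory the
# first time a model runs, so datasets/ must point at (or link to) your local data.
import os

data_root = "/path/to/your/refcoco+_unc"   # placeholder: wherever your RefCOCO+ data lives
link = "datasets/refcoco+_unc"

if not os.path.exists(link):
    os.symlink(data_root, link)            # let the repo's datasets/ folder see your data

# cache/refcoco+_val_20_36.pkl should then be generated in here on the first run
print(os.path.isdir("datasets/refcoco+_unc/annotations"))
```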
darthgera123 commented 3 years ago

Thanks for responding. I was running the ViLBERT example. This is the error I am getting:

ERROR - volta.utils -   Model name 'checkpoints/conceptual_captions/ctrl_vilbert/ctrl_vilbert_base/pytorch_model_9.bin' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, roberta-base, roberta-large, roberta-large-mnli). We assumed 'checkpoints/conceptual_captions/ctrl_vilbert/ctrl_vilbert_base/pytorch_model_9.bin' was a path or url but couldn't find any file associated to this path or URL.

This is why I raised issue 2. Please help @e-bug

e-bug commented 3 years ago

Are you sure you have the checkpoint exactly at checkpoints/conceptual_captions/ctrl_vilbert/ctrl_vilbert_base/pytorch_model_9.bin? Try passing an absolute path and see if that solves your error.
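If it helps, a quick sanity check along these lines (assuming the checkpoint path from the error message) is to resolve the path and confirm the file exists before handing it to the training script:

```python
# Resolve the checkpoint to an absolute path and verify it exists; if this prints
# False, the relative path is wrong with respect to the current working directory.
import os

ckpt = "checkpoints/conceptual_captions/ctrl_vilbert/ctrl_vilbert_base/pytorch_model_9.bin"
abs_ckpt = os.path.abspath(ckpt)
print(abs_ckpt, os.path.isfile(abs_ckpt))
```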

darthgera123 commented 3 years ago

Thanks for responding. The path was wrong and now it loads. The latest error it shows is this:

THCudaCheck FAIL file=/pytorch/aten/src/ATen/native/cuda/Dropout.cu line=147 error=209 : no kernel image is available for execution on the device

Any ideas, @e-bug?

e-bug commented 3 years ago

It might be the GPU device itself: I got that error when running the code on an older GPU. I have no idea how to fix it. If you find a way, share it :)
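For anyone hitting this later, a general PyTorch diagnostic (not part of volta) can show whether the installed build was compiled for your GPU's compute capability, which is the usual cause of this error:

```python
# "no kernel image is available" usually means the PyTorch wheel was not built for
# this GPU's architecture (sm_XX). Check what the build supports vs. what the card is.
import torch

print(torch.__version__, torch.version.cuda)   # PyTorch version and the CUDA it was built against
print(torch.cuda.get_device_name(0))           # e.g. "GeForce RTX 2080 Ti"
print(torch.cuda.get_device_capability(0))     # e.g. (7, 5) for a 2080 Ti (sm_75)
print(torch.cuda.get_arch_list())              # architectures in this build (recent PyTorch only)
# If your card's sm_XX is missing from that list, reinstalling a PyTorch build that
# matches your CUDA version and GPU generation typically resolves the error.
```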

darthgera123 commented 3 years ago

Is there a way to change batch sizes? Also, what's the CUDA version on your GPU? I'm trying to get this running on a 2080 Ti with CUDA 10.2 and I am still getting the "no kernel image" error.

e-bug commented 3 years ago
  1. Of course, please check the configuration files for the tasks (e.g. this); see the sketch below this list.
  2. CUDA 10.1 or 10.2
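Regarding point 1, here is a purely illustrative sketch of overriding a per-task batch size in a YAML task config; the file path and key names are assumptions, so check the actual config files in the repo:

```python
# Illustrative only: the file path and keys below are assumed, not taken from volta.
import yaml

cfg_path = "config_tasks/vilbert_tasks.yml"      # hypothetical task-config file
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

# e.g. halve a task's batch size if the GPU runs out of memory
cfg["TASK1"]["batch_size"] = cfg["TASK1"]["batch_size"] // 2   # assumed key names

with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f)
```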