antoyang / FrozenBiLM

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
https://arxiv.org/abs/2206.08155
Apache License 2.0

Problematic Tokenizer? #3

Closed cliangyu closed 2 years ago

cliangyu commented 2 years ago

Hi! I am trying zero-shot inference with the command below:

DATA_DIR=data
DATASET=activitynet
DATASET_FILE=ActivityNet-QA
CKPT_PATH=checkpoints/frozenbilm_activitynet.pth

TRANSFORMERS_CACHE=/root/.cache/huggingface/transformers \
CUDA_VISIBLE_DEVICES=4,5,6,7 \
CUDA_LAUNCH_BLOCKING=1 \
python -m torch.distributed.run --nproc_per_node 4 videoqa.py --test --eval \
--combine_datasets $DATASET --combine_datasets_val $DATASET --save_dir=zs${DATASET} \
--ds_factor_ff=8 --ds_factor_attn=8 --suffix="." \
--batch_size_val=32 --max_tokens=256 --load=$CKPT_PATH \
"--${DATASET}_vocab_path"=$DATA_DIR/$DATASET_FILE/vocab1000.json \
"--${DATASET}_train_csv_path"=$DATA_DIR/$DATASET_FILE/train.json "--${DATASET}_test_csv_path"=$DATA_DIR/$DATASET_FILE/test.csv

However, I ran into the following sentencepiece error:

*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
ERROR:root:No token file found. Also make sure that a [prod] section with a 'token = value' assignment exists.
ERROR:root:No token file found. Also make sure that a [prod] section with a 'token = value' assignment exists.
ERROR:root:No token file found. Also make sure that a [prod] section with a 'token = value' assignment exists.
ERROR:root:No token file found. Also make sure that a [prod] section with a 'token = value' assignment exists.
| distributed init (rank 0): env://
| distributed init (rank 3): env://
| distributed init (rank 1): env://
| distributed init (rank 2): env://
Namespace(combine_datasets=['activitynet'], combine_datasets_val=['activitynet'], webvid_features_path='webvid_clipvitl14_features', webvid_train_csv_path='data/WebVid/train_captions.csv', webvid_val_csv_path='data/WebVid/val_captions.csv', lsmdc_features_path='data/LSMDC/clipvitl14.pth', lsmdc_train_csv_path='data/LSMDC/training.csv', lsmdc_val_csv_path='data/LSMDC/val.csv', lsmdc_test_csv_path='data/LSMDC/test.csv', lsmdc_vocab_path='data/LSMDC/vocab.json', lsmdc_subtitles_path='data/LSMDC/subtitles.pkl', ivqa_features_path='data/iVQA/clipvitl14.pth', ivqa_train_csv_path='data/iVQA/train.csv', ivqa_val_csv_path='data/iVQA/val.csv', ivqa_test_csv_path='data/iVQA/test.csv', ivqa_vocab_path='data/iVQA/vocab.json', ivqa_subtitles_path='data/iVQA/subtitles.pkl', msrvtt_features_path='data/MSRVTT-QA/clipvitl14.pth', msrvtt_train_csv_path='data/MSRVTT-QA/train.csv', msrvtt_val_csv_path='data/MSRVTT-QA/val.csv', msrvtt_test_csv_path='data/MSRVTT-QA/test.csv', msrvtt_vocab_path='data/MSRVTT-QA/vocab.json', msrvtt_subtitles_path='data/MSRVTT-QA/subtitles.pkl', msvd_features_path='data/MSVD-QA/clipvitl14.pth', msvd_train_csv_path='data/MSVD-QA/train.csv', msvd_val_csv_path='data/MSVD-QA/val.csv', msvd_test_csv_path='data/MSVD-QA/test.csv', msvd_vocab_path='data/MSVD-QA/vocab.json', msvd_subtitles_path='data/MSVD-QA/subtitles.pkl', activitynet_features_path='data/ActivityNet-QA/clipvitl14.pth', activitynet_train_csv_path='data/ActivityNet-QA/train.json', activitynet_val_csv_path='data/ActivityNet-QA/val.csv', activitynet_test_csv_path='data/ActivityNet-QA/test.csv', activitynet_vocab_path='data/ActivityNet-QA/vocab1000.json', activitynet_subtitles_path='data/ActivityNet-QA/subtitles.pkl', tgif_features_path='data/TGIF-QA/clipvitl14.pth', tgif_frameqa_train_csv_path='data/TGIF-QA/train_frameqa.csv', tgif_frameqa_test_csv_path='data/TGIF-QA/test_frameqa.csv', tgif_vocab_path='data/TGIF-QA/vocab.json', how2qa_features_path='data/How2QA/clipvitl14_split.pth', 
how2qa_train_csv_path='data/How2QA/train.csv', how2qa_val_csv_path='data/How2QA/public_val.csv', how2qa_subtitles_path='data/How2QA/subtitles.pkl', tvqa_features_path='data/TVQA/clipvitl14.pth', tvqa_train_csv_path='data/TVQA/train.csv', tvqa_val_csv_path='data/TVQA/val.csv', tvqa_test_csv_path='data/TVQA/test_public.csv', tvqa_subtitles_path='data/TVQA/subtitles.pkl', vqa_features_path='data/VQA/clipvitl14.pth', vqa_train_pkl_path='data/VQA/train_list.pkl', vqa_val_pkl_path='data/VQA/val_list.csv', vqa_vocab_path='data/VQA/vocab.json', mlm_prob=0.15, lr=0.0003, beta1=0.9, beta2=0.95, batch_size=32, batch_size_val=32, weight_decay=0, epochs=10, lr_drop=10, optimizer='adam', clip_max_norm=0.1, schedule='', fraction_warmup_steps=0.1, eval_skip=1, print_freq=100, freeze_lm=True, model_name='/root/.cache/huggingface/transformers/deberta-v2-xlarge', ds_factor_attn=8, ds_factor_ff=8, ft_ln=True, freeze_mlm=True, dropout=0.1, scratch=False, n_ans=0, freeze_last=True, test=True, save_dir='zsactivitynet', presave_dir='', device='cuda', seed=42, load='checkpoints/frozenbilm_activitynet.pth', resume=False, start_epoch=0, eval=True, num_workers=3, world_size=4, dist_url='env://', max_feats=10, features_dim=768, use_video=True, use_context=True, max_tokens=256, max_atokens=5, prefix='', suffix='.', rank=0, gpu=0, distributed=True, dist_backend='nccl')
Traceback (most recent call last):
  File "/mnt/lustre/lychen/code/sm/FrozenBiLM/videoqa.py", line 530, in <module>
    main(args)
  File "/mnt/lustre/lychen/code/sm/FrozenBiLM/videoqa.py", line 266, in main
    tokenizer = get_tokenizer(args)
  File "/mnt/lustre/lychen/code/sm/FrozenBiLM/model/__init__.py", line 96, in get_tokenizer
    tokenizer = DebertaV2Tokenizer.from_pretrained(
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1777, in from_pretrained
    return cls._from_pretrained(
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1932, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/transformers/models/deberta_v2/tokenization_deberta_v2.py", line 149, in __init__
    self._tokenizer = SPMTokenizer(vocab_file, split_by_punct=split_by_punct, sp_model_kwargs=self.sp_model_kwargs)
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/transformers/models/deberta_v2/tokenization_deberta_v2.py", line 301, in __init__
    spm.load(vocab_file)
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1066196) of binary: /mnt/lustre/anaconda3/envs/dream/bin/python
Traceback (most recent call last):
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/torch/distributed/run.py", line 766, in <module>
    main()
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
videoqa.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2022-11-07_10:48:31
  host      : localhost.vm
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 1066197)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2022-11-07_10:48:31
  host      : localhost.vm
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 1066198)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
  time      : 2022-11-07_10:48:31
  host      : localhost.vm
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 1066199)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-11-07_10:48:31
  host      : localhost.vm
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1066196)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

This issue is the same as the one below. It looks like a problem with the vocab file. How can we fix it?

sentencepiece\sentencepiece\src\sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())] · Issue #20011 · huggingface/transformers https://github.com/huggingface/transformers/issues/20011
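
To separate the tokenizer problem from the distributed launch, the SentencePiece file can be loaded on its own in a single process. The following is a minimal sketch, not part of the FrozenBiLM code: the helper name `try_load_spm` is hypothetical, the default path is assembled from the `model_name` shown in the log above plus `spm.model` (the vocabulary file name used by the DeBERTa-v2 tokenizer in transformers), and sentencepiece is assumed to be installed as in the traceback.

```python
import sys

# Assumed location: model_name from the logged Namespace plus the
# DeBERTa-v2 vocabulary file name. Adjust to your own setup.
DEFAULT_MODEL_FILE = "/root/.cache/huggingface/transformers/deberta-v2-xlarge/spm.model"


def try_load_spm(model_file):
    """Try to parse model_file with SentencePiece and return (ok, message)."""
    try:
        import sentencepiece as spm  # third-party package, assumed installed
    except ImportError:
        return False, "sentencepiece is not installed"

    processor = spm.SentencePieceProcessor()
    try:
        # Same call that fails at the bottom of the traceback above.
        processor.Load(model_file)
    except Exception as err:
        # The log shows a RuntimeError for an unparsable file; other
        # failures (for example a missing file) may raise other types.
        return False, f"{type(err).__name__}: {err}"
    return True, f"loaded {processor.GetPieceSize()} pieces"


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_MODEL_FILE
    ok, message = try_load_spm(path)
    print(("OK: " if ok else "FAILED: ") + message)
```

If this fails with the same `ParseFromArray` message, the file at that path is probably not a valid SentencePiece model, independently of `torch.distributed`.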

antoyang commented 2 years ago

Did you download the tokenizer from Hugging Face?

cliangyu commented 2 years ago

Yes I did


antoyang commented 2 years ago

It seems the tokenizer weights are not loaded properly. I would double-check that the environment variable TRANSFORMERS_CACHE matches the directory where the tokenizer was downloaded.
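
The check suggested above could be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions rather than a definitive diagnosis: the helper name `diagnose_tokenizer_dir` is hypothetical, the fallback cache path and the `deberta-v2-xlarge` directory are copied from the command and log in this thread, and `spm.model` is the vocabulary file name used by the DeBERTa-v2 tokenizer in transformers. The Git LFS pointer test covers one common way a download ends up unparsable; it is not confirmed to be the cause here.

```python
import os
from pathlib import Path

# First bytes of a Git LFS pointer file, which is small text rather than
# the real binary model (assumption: one possible cause, not confirmed).
LFS_MAGIC = b"version https://git-lfs.github.com/spec/v1"


def diagnose_tokenizer_dir(model_dir, vocab_name="spm.model"):
    """Return a short diagnosis for the SentencePiece file in model_dir."""
    path = Path(model_dir) / vocab_name
    if not path.is_file():
        return f"missing: {path}"
    size = path.stat().st_size
    if size == 0:
        return f"empty: {path}"
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_MAGIC))
    if head == LFS_MAGIC:
        return f"git-lfs pointer, not the real model: {path} ({size} bytes)"
    return f"looks plausible: {path} ({size} bytes)"


if __name__ == "__main__":
    # Fallback value taken from the command in this thread.
    cache = os.environ.get(
        "TRANSFORMERS_CACHE", "/root/.cache/huggingface/transformers"
    )
    print(diagnose_tokenizer_dir(Path(cache) / "deberta-v2-xlarge"))
```

A "missing" result suggests that TRANSFORMERS_CACHE does not point at the download location, while an "empty" or "git-lfs pointer" result suggests the download itself is incomplete.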