MCLAB-OCR / KnowledgeMiningWithSceneText


Can't load wiki archive knowbert_wiki_wordnet_model.tar.gz. #4

Closed: JingjunYi closed this issue 9 months ago.

JingjunYi commented 1 year ago

When I run the training code, I get the following error. Can you help me solve it? Thanks a lot.

json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 381 column 5 (char 15191)

Error log:

(scenetext) [yjj23@gpu2 KnowledgeMiningWithSceneText-main]$ CUDA_VISIBLE_DEVICES=0 python main.py -c configs/train_knowbert_attention_bottle.toml
[2023-09-14 23:54:00,256][RANK=00][I]: unknown_args=[] [main.py:114]
[2023-09-14 23:54:03,193][RANK=00][I]: Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex . [/home/yjj23/anaconda3/envs/scenetext/lib/python3.8/site-packages/pytorch_pretrained_bert/modeling.py:230]
/home/yjj23/anaconda3/envs/scenetext/lib/python3.8/site-packages/sklearn/utils/linear_assignment_.py:18: FutureWarning: The linear_assignment_ module is deprecated in 0.21 and will be removed from 0.23. Use scipy.optimize.linear_sum_assignment instead.
  warnings.warn(
[2023-09-14 23:54:04,196][RANK=00][I]: instantiating class <class 'allennlp.data.token_indexers.token_indexer.TokenIndexer'> from params {'type': 'characters_tokenizer', 'tokenizer': {'type': 'word', 'word_splitter': {'type': 'just_spaces'}}, 'namespace': 'entity'} and extras set() [/home/yjj23/SceneText/allennlp-master/allennlp/common/from_params.py:256]
[2023-09-14 23:54:04,196][RANK=00][I]: type = characters_tokenizer [/home/yjj23/SceneText/allennlp-master/allennlp/common/params.py:251]
[2023-09-14 23:54:04,196][RANK=00][I]: instantiating class <class 'allennlp.data.tokenizers.tokenizer.Tokenizer'> from params {'type': 'word', 'word_splitter': {'type': 'just_spaces'}} and extras set() [/home/yjj23/SceneText/allennlp-master/allennlp/common/from_params.py:256]
[2023-09-14 23:54:04,196][RANK=00][I]: tokenizer.type = word [/home/yjj23/SceneText/allennlp-master/allennlp/common/params.py:251]
[2023-09-14 23:54:04,196][RANK=00][I]: instantiating class <class 'allennlp.data.tokenizers.word_tokenizer.WordTokenizer'> from params {'word_splitter': {'type': 'just_spaces'}} and extras set() [/home/yjj23/SceneText/allennlp-master/allennlp/common/from_params.py:256]
[2023-09-14 23:54:04,197][RANK=00][I]: instantiating class <class 'allennlp.data.tokenizers.word_splitter.WordSplitter'> from params {'type': 'just_spaces'} and extras set() [/home/yjj23/SceneText/allennlp-master/allennlp/common/from_params.py:256]
[2023-09-14 23:54:04,197][RANK=00][I]: tokenizer.word_splitter.type = just_spaces [/home/yjj23/SceneText/allennlp-master/allennlp/common/params.py:251]
[2023-09-14 23:54:04,197][RANK=00][I]: instantiating class <class 'allennlp.data.tokenizers.word_splitter.JustSpacesWordSplitter'> from params {} and extras set() [/home/yjj23/SceneText/allennlp-master/allennlp/common/from_params.py:256]
[2023-09-14 23:54:04,197][RANK=00][I]: tokenizer.start_tokens = None [/home/yjj23/SceneText/allennlp-master/allennlp/common/params.py:251]
[2023-09-14 23:54:04,197][RANK=00][I]: tokenizer.end_tokens = None [/home/yjj23/SceneText/allennlp-master/allennlp/common/params.py:251]
[2023-09-14 23:54:04,197][RANK=00][I]: instantiating class <class 'allennlp.data.token_indexers.token_characters_indexer.TokenCharactersIndexer'> from params {'namespace': 'entity'} and extras set() [/home/yjj23/SceneText/allennlp-master/allennlp/common/from_params.py:256]
[2023-09-14 23:54:04,197][RANK=00][I]: namespace = entity [/home/yjj23/SceneText/allennlp-master/allennlp/common/params.py:251]
[2023-09-14 23:54:04,197][RANK=00][I]: start_tokens = None [/home/yjj23/SceneText/allennlp-master/allennlp/common/params.py:251]
[2023-09-14 23:54:04,197][RANK=00][I]: end_tokens = None [/home/yjj23/SceneText/allennlp-master/allennlp/common/params.py:251]
[2023-09-14 23:54:04,197][RANK=00][I]: min_padding_length = 0 [/home/yjj23/SceneText/allennlp-master/allennlp/common/params.py:251]
/home/yjj23/SceneText/allennlp-master/allennlp/data/token_indexers/token_characters_indexer.py:47: UserWarning: You are using the default value (0) of min_padding_length, which can cause some subtle bugs (more info see https://github.com/allenai/allennlp/issues/1954). Strongly recommend to set a value, usually the maximum size of the convolutional layer size when using CnnEncoder.
  warnings.warn("You are using the default value (0) of min_padding_length, "
[2023-09-14 23:54:04,281][RANK=00][I]: start logging [/home/yjj23/SceneText/KnowledgeMiningWithSceneText-main/train_knowbert.py:230]
[2023-09-14 23:54:04,281][RANK=00][I]: OUTPUT_DIR: ./outputs/vit_knowbert_bottle_0914235404 [/home/yjj23/SceneText/KnowledgeMiningWithSceneText-main/train_knowbert.py:231]
[2023-09-14 23:54:04,281][RANK=00][I]: TB_DIR: ./outputs/vit_knowbert_bottle_0914235404/others/tb_logs [/home/yjj23/SceneText/KnowledgeMiningWithSceneText-main/train_knowbert.py:232]
{'ACCUM_ITERS': 32, 'ATTENTION_DEV': False, 'BATCH_SIZE_PERGPU': 8, 'BERT_BOTTLE_CHECKPOINT_PATH': 'pretrained/BERT_pretrained_on_bottle.pth', 'BERT_MLM_CHECKPOINT_PATH': 'pretrained/BERT_pretrained_mlm.pth', 'DATASET_TYPE': 'bottle', 'DEBUG': False, 'DEVICE': 'cuda', 'DISTRIBUTED': False, 'DYNACONF_INCLUDE': ['train_base.toml', 'bottle.toml'], 'EFFECTIVE_BATCH_SIZE': 256, 'EMBEDDING_PATH': '', 'EMBEDDING_PATH_FASTTEXT': '/data1/yjj/SceneText/bottle/fasttext', 'EMBEDDING_PATH_GLOVE': '/data1/yjj/SceneText/bottle/glove/glove_300', 'FREEZE_VIT_KNOWBERT': False, 'GOOGLE_OCR_PATH': '/data1/yjj/SceneText/bottle/google_ocr', 'HEAD_TYPE': 18, 'IMG_ONLY': False, 'INTERACTION': {}, 'INTERACTION_MODEL': True, 'IS_TESTING_LR_RANGE': False, 'LOAD_DOTENV': True, 'LOCAL_RANK': 0, 'LOG_EVERY_STEP': 100, 'LR': 3e-05, 'LR_COSINE_T0': 1000, 'LR_COSINE_T_MULT': 1, 'LR_NO_RESTARTS': False, 'LR_WARMUP_STEP': 1000, 'MASTER_ADDR': '127.0.0.1', 'NUM_CLASS': 20, 'NUM_EPOCHS': 50, 'NUM_EPOCH_FREEZE': 40, 'NUM_T': 25, 'NUM_WORKERS': 8, 'OUTPUT_DIR': './outputs/vit_knowbert_bottle_0914235404', 'POSITION_EMBEDDING': False, 'PRETRAINED_BERT': 'Wikipedia', 'PRETRAINED_VISION': 'ImageNet', 'PRETRAINED_WHOLE_MODEL': 'None', 'RANK': 0, 'ROOT_PATH': '/data1/yjj/SceneText/bottle', 'SAVE_MODEL_EVERY_STEP': 16000, 'SEED': 42, 'SGD_MOMENTUM': 0.9, 'SGD_WEIGHT_DECAY': 0, 'TB_DIR': './outputs/vit_knowbert_bottle_0914235404/others/tb_logs', 'TEST_ONLY': False, 'TEXT_BACKBONE': 'knowbert', 'TEXT_PATH': '/data1/yjj/SceneText/bottle/texts', 'TOKEN_ONLY': False, 'UNFREEZE_ALL_STEP': 3080, 'USE_AMP': True, 'USE_BBOX_EMBEDDING': False, 'USE_CATEGORY': False, 'USE_GOOGLE_OCR': True, 'USE_MULTISTEP': False, 'USE_NUM_T': True, 'USE_PADDLE_OCR': False, 'USE_TIMM': True, 'VALIDATION_EVERY_STEP': 400, 'VISION_BACKBONE': 'vit', 'VISION_BOTTLE_CHECKPOINT_PATH': 'best_trained_model.pth', 'VIT_IMAGENET_CHECKPOINT_PATH': 'pretrained/ViT-B_16.npz', 'VKAC_DROPOUT': 0.0, 'WIKIDATA_PATH': '/data1/yjj/SceneText/bottle/wikidata_result', 'WORLD_SIZE': 1} [/home/yjj23/SceneText/KnowledgeMiningWithSceneText-main/train_knowbert.py:233]
[2023-09-14 23:54:04,283][RANK=00][I]: cfg.local_rank=0, cfg.rank=0, cfg.world_size=1, cfg.distributed=False [/home/yjj23/SceneText/KnowledgeMiningWithSceneText-main/train_knowbert.py:234]
[2023-09-14 23:54:04,284][RANK=00][I]: loading datasets... [/home/yjj23/SceneText/KnowledgeMiningWithSceneText-main/train_knowbert.py:107]
[2023-09-14 23:54:04,332][RANK=00][I]: len(trainset): 12325 [/home/yjj23/SceneText/KnowledgeMiningWithSceneText-main/train_knowbert.py:141]
[2023-09-14 23:54:04,332][RANK=00][I]: len(valset): 6163 [/home/yjj23/SceneText/KnowledgeMiningWithSceneText-main/train_knowbert.py:142]
[2023-09-14 23:54:04,332][RANK=00][I]: len(train_loader): 1541 [/home/yjj23/SceneText/KnowledgeMiningWithSceneText-main/train_knowbert.py:179]
[2023-09-14 23:54:04,332][RANK=00][I]: len(val_loader): 771 [/home/yjj23/SceneText/KnowledgeMiningWithSceneText-main/train_knowbert.py:180]
[2023-09-14 23:54:04,333][RANK=00][I]: new cfg.unfreeze_all_step = 62640 [/home/yjj23/SceneText/KnowledgeMiningWithSceneText-main/train_knowbert.py:289]
[2023-09-14 23:54:04,333][RANK=00][I]: new cfg.LR_COSINE_T0 = 3042 [/home/yjj23/SceneText/KnowledgeMiningWithSceneText-main/train_knowbert.py:290]
[2023-09-14 23:54:08,174][RANK=00][I]: archive_file = https://allennlp.s3-us-west-2.amazonaws.com/knowbert/models/knowbert_wiki_wordnet_model.tar.gz [/home/yjj23/SceneText/allennlp-master/allennlp/common/params.py:251]
[2023-09-14 23:54:08,175][RANK=00][I]: overrides = None [/home/yjj23/SceneText/allennlp-master/allennlp/common/params.py:251]
[2023-09-14 23:54:09,389][RANK=00][I]: https://allennlp.s3-us-west-2.amazonaws.com/knowbert/models/knowbert_wiki_wordnet_model.tar.gz not found in cache, downloading to /tmp/tmpxakfp5fi [/home/yjj23/SceneText/allennlp-master/allennlp/common/file_utils.py:222]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1935373195/1935373195 [03:18<00:00, 9755375.98B/s]
[2023-09-14 23:57:29,130][RANK=00][I]: copying /tmp/tmpxakfp5fi to cache at /home/yjj23/.allennlp/cache/f9ae390d324418b5fd7be2cdd2344e53aa911e6f442647664e60409fe3997116.aa3ff7c3c096d56d836ce729ee5ca504b205dd51ab08b4523a4f87edcc2e6cc7 [/home/yjj23/SceneText/allennlp-master/allennlp/common/file_utils.py:235]
[2023-09-14 23:57:33,610][RANK=00][I]: creating metadata file for /home/yjj23/.allennlp/cache/f9ae390d324418b5fd7be2cdd2344e53aa911e6f442647664e60409fe3997116.aa3ff7c3c096d56d836ce729ee5ca504b205dd51ab08b4523a4f87edcc2e6cc7 [/home/yjj23/SceneText/allennlp-master/allennlp/common/file_utils.py:239]
[2023-09-14 23:57:33,614][RANK=00][I]: removing temp file /tmp/tmpxakfp5fi [/home/yjj23/SceneText/allennlp-master/allennlp/common/file_utils.py:245]
[2023-09-14 23:57:34,031][RANK=00][I]: loading archive file https://allennlp.s3-us-west-2.amazonaws.com/knowbert/models/knowbert_wiki_wordnet_model.tar.gz from cache at /home/yjj23/.allennlp/cache/f9ae390d324418b5fd7be2cdd2344e53aa911e6f442647664e60409fe3997116.aa3ff7c3c096d56d836ce729ee5ca504b205dd51ab08b4523a4f87edcc2e6cc7 [/home/yjj23/SceneText/allennlp-master/allennlp/models/archival.py:175]
[2023-09-14 23:57:34,032][RANK=00][I]: extracting archive file /home/yjj23/.allennlp/cache/f9ae390d324418b5fd7be2cdd2344e53aa911e6f442647664e60409fe3997116.aa3ff7c3c096d56d836ce729ee5ca504b205dd51ab08b4523a4f87edcc2e6cc7 to temp dir /tmp/tmp6rttolqm [/home/yjj23/SceneText/allennlp-master/allennlp/models/archival.py:182]
[2023-09-14 23:57:50,807][RANK=00][W]: _jsonnet not loaded, treating /tmp/tmp6rttolqm/config.json as json [/home/yjj23/SceneText/allennlp-master/allennlp/common/params.py:21]
Traceback (most recent call last):
  File "main.py", line 121, in <module>
    main()
  File "main.py", line 117, in main
    return train_knowbert.main()
  File "/home/yjj23/SceneText/KnowledgeMiningWithSceneText-main/train_knowbert.py", line 294, in main
    model = NetWithAttention(cfg)
  File "/home/yjj23/SceneText/KnowledgeMiningWithSceneText-main/model/vit_knowbert_interaction_timm.py", line 56, in __init__
    self.knowbert = ModelArchiveFromParams.from_params(params=params)
  File "/home/yjj23/SceneText/kb-master/kb/include_all.py", line 50, in from_params
    archive = load_archive(archive_file)
  File "/home/yjj23/SceneText/allennlp-master/allennlp/models/archival.py", line 214, in load_archive
    config = Params.from_file(os.path.join(serialization_dir, CONFIG_NAME), overrides)
  File "/home/yjj23/SceneText/allennlp-master/allennlp/common/params.py", line 459, in from_file
    file_dict = json.loads(evaluate_file(params_file, ext_vars=ext_vars))
  File "/home/yjj23/anaconda3/envs/scenetext/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/home/yjj23/anaconda3/envs/scenetext/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/yjj23/anaconda3/envs/scenetext/lib/python3.8/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 381 column 5 (char 15191)
[2023-09-14 23:57:50,820][RANK=00][I]: removing temporary unarchived model dir at /tmp/tmp6rttolqm [/home/yjj23/SceneText/allennlp-master/allennlp/models/archival.py:237]

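The warning `_jsonnet not loaded, treating /tmp/tmp6rttolqm/config.json as json` points to one possible cause. This is a hypothesis that the thread does not confirm. Without the `jsonnet` Python package, this AllenNLP version passes the raw file to strict `json.loads`, as the traceback shows, and strict JSON rejects Jsonnet-only syntax such as trailing commas or comments. On Python 3.8, a comma directly before a closing `}` raises exactly "Expecting property name enclosed in double quotes". A column of 5 would also fit a closing brace indented by four spaces. The sketch below tests whether removing trailing commas is enough to make a config parse. The helper names are hypothetical, and the regex is a rough heuristic that would also rewrite a `,}` or `,]` inside a string value. If this turns out to be the cause, installing the package (`pip install jsonnet`) so that `_jsonnet` can be imported may be the cleaner fix, because the file would then be read by the Jsonnet evaluator instead of plain `json`.

```python
import json
import re

# Heuristic (assumption): drop any comma followed only by whitespace and then
# a closing brace or bracket. It does not understand string literals, so a
# string value containing ",}" or ",]" would also be altered.
_TRAILING_COMMA = re.compile(r",(\s*[}\]])")


def loads_allowing_trailing_commas(text: str):
    """Parse JSON text after stripping trailing commas (hypothetical helper)."""
    return json.loads(_TRAILING_COMMA.sub(r"\1", text))


def has_only_trailing_comma_problem(text: str) -> bool:
    """True if strict json.loads fails but the comma-stripped text parses."""
    try:
        json.loads(text)
        return False  # already valid strict JSON, nothing to diagnose
    except json.JSONDecodeError:
        pass
    try:
        loads_allowing_trailing_commas(text)
        return True
    except json.JSONDecodeError:
        return False  # something other than trailing commas is wrong
```

Calling `has_only_trailing_comma_problem` on the text of the extracted `config.json` tells you whether trailing commas are the whole story. A `False` result means the file needs a closer look, or a fresh download.
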
github-actions[bot] commented 1 year ago

Hi! This is your first issue. Welcome!

Leojc commented 1 year ago

It seems the error occurs when decoding /tmp/tmp6rttolqm/config.json. You can open that file and see what's wrong, or try downloading the archive again manually.
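To follow the suggestion to open the file: `load_archive` deletes `/tmp/tmp6rttolqm` after the failure, so the copy that survives is the cached archive under `/home/yjj23/.allennlp/cache/` whose full name appears in the log. The sketch below reads `config.json` straight out of that archive with `tarfile` and prints the lines around the position reported by `JSONDecodeError` (its `lineno` and `colno` attributes), which makes line 381 easy to inspect. The helper names and the script name `inspect_config.py` are hypothetical. The sketch also assumes the archive is a compressed tarball with a file named `config.json` in it, which is consistent with the log showing it extracted to a temp dir that contains `config.json`. If the file turns out to be truncated or damaged rather than merely unusual, deleting the cached file (and the metadata file created next to it, per the "creating metadata file" log line) should make AllenNLP download the archive again on the next run.

```python
import json
import os
import sys
import tarfile


def read_config_from_archive(archive_path: str) -> str:
    """Return the text of config.json from a compressed model archive."""
    # "r:*" lets tarfile detect the compression, so the cache file's lack of
    # a .tar.gz extension does not matter.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar.getmembers():
            # Match on the base name so "config.json" and "./config.json" both work.
            if member.isfile() and os.path.basename(member.name) == "config.json":
                return tar.extractfile(member).read().decode("utf-8")
    raise FileNotFoundError(f"no config.json in {archive_path}")


def show_json_error(text: str, context: int = 2):
    """Return None if text is valid JSON, else (lineno, colno, snippet)."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        lines = text.splitlines()
        first = max(err.lineno - 1 - context, 0)
        last = min(err.lineno + context, len(lines))
        out = []
        for i in range(first, last):
            out.append(f"{i + 1:5d}: {lines[i]}")
            if i + 1 == err.lineno:
                # 7 = width of the "NNNNN: " prefix; colno is 1-based.
                out.append(" " * (7 + err.colno - 1) + "^")
        return err.lineno, err.colno, "\n".join(out)


# Run as a script only when an archive path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    result = show_json_error(read_config_from_archive(sys.argv[1]))
    if result is None:
        print("config.json parses as strict JSON")
    else:
        lineno, colno, snippet = result
        print(f"JSONDecodeError at line {lineno}, column {colno}:")
        print(snippet)
```

Run it as `python inspect_config.py` followed by the cached file path from the "copying ... to cache at" log line. A caret under column 5 of line 381 then shows which character the parser rejected.
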