GabrielXie opened this issue 1 year ago
Your checkpoint download may have failed. Try deleting it and downloading it again.
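An interrupted download can leave a truncated file in the cache. Here is a minimal sketch of clearing the cached pangu checkpoints so the demo re-downloads them on the next run; it assumes the default jittor cache layout seen in the log below (`~/.cache/jittor/.../checkpoints/pangu`), and since the per-machine subdirectory name varies, it searches for the directory rather than hard-coding the path:

```python
# Minimal sketch, assuming the default jittor cache layout from the log below;
# adjust cache_root if your cache lives somewhere else.
import shutil
from pathlib import Path

cache_root = Path.home() / ".cache" / "jittor"

# The per-machine subdirectory name varies, so search for the pangu cache.
for pangu_dir in cache_root.rglob("checkpoints/pangu"):
    print("removing", pangu_dir)
    shutil.rmtree(pangu_dir)  # web_demo.py re-downloads on the next run
```

Then re-run `python web_demo.py pangualpha` and let it fetch a fresh copy.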
Still not working. I deleted the pangu directory under checkpoints, but it still fails. Here is the log:
(jittor) PS F:\test-code\JittorLLMs> python web_demo.py pangualpha
Downloading https://cg.cs.tsinghua.edu.cn/jittor/pangu/assets/build/checkpoints/model_optim_rng.pth to C:\Users\xgp\.cache\jittor\jt1.3.7\cl\py3.8.16\Windows-10-10.x52\AMDRyzen75800Xxc8\default\cu11.2.67\checkpoints\pangu/model_optim_rng.pth
4.89GB [01:47, 49.0MB/s]
WARNING: APEX is not installed, multi_tensor_applier will not be available.
WARNING: APEX is not installed, using torch.nn.LayerNorm instead of apex.normalization.FusedLayerNorm!
F:\test-code\JittorLLMs\models\pangualpha
using world size: 1 and model-parallel size: 1
using torch.float32 for parameters ...
WARNING: overriding default arguments for tokenizer_type:GPT2BPETokenizer with tokenizer_type:GPT2BPETokenizer
-------------------- arguments --------------------
adlr_autoresume ................. False
adlr_autoresume_interval ........ 1000
apply_query_key_layer_scaling ... False
apply_residual_connection_post_layernorm False
attention_dropout ............... 0.1
attention_softmax_in_fp32 ....... False
batch_size ...................... 1
bert_load ....................... None
bias_dropout_fusion ............. False
bias_gelu_fusion ................ False
block_data_path ................. None
checkpoint_activations .......... False
checkpoint_num_layers ........... 1
clip_grad ....................... 1.0
data_impl ....................... infer
data_path ....................... None
DDP_impl ........................ local
distribute_checkpointed_activations False
distributed_backend ............. nccl
dynamic_loss_scale .............. True
eod_mask_loss ................... False
eval_interval ................... 1000
eval_iters ...................... 100
exit_interval ................... None
faiss_use_gpu ................... False
finetune ........................ True
fp16 ............................ False
fp16_lm_cross_entropy ........... False
fp32_allreduce .................. False
genfile ......................... None
greedy .......................... False
hidden_dropout .................. 0.1
hidden_size ..................... 2560
hysteresis ...................... 2
ict_head_size ................... None
ict_load ........................ None
indexer_batch_size .............. 128
indexer_log_interval ............ 1000
init_method_std ................. 0.02
layernorm_epsilon ............... 1e-05
lazy_mpu_init ................... None
load ............................ C:\Users\xgp\.cache\jittor\jt1.3.7\cl\py3.8.16\Windows-10-10.x52\AMDRyzen75800Xxc8\default\cu11.2.67\checkpoints\pangu\Pangu-alpha_2.6B_fp16_mgt
local_rank ...................... None
log_interval .................... 100
loss_scale ...................... None
loss_scale_window ............... 1000
lr .............................. None
lr_decay_iters .................. None
lr_decay_style .................. linear
make_vocab_size_divisible_by .... 1
mask_prob ....................... 0.15
max_position_embeddings ......... 1024
merge_file ...................... None
min_lr .......................... 0.0
min_scale ....................... 1
mmap_warmup ..................... False
model_parallel_size ............. 1
no_load_optim ................... False
no_load_rng ..................... False
no_save_optim ................... False
no_save_rng ..................... False
num_attention_heads ............. 32
num_layers ...................... 31
num_samples ..................... 0
num_unique_layers ............... None
num_workers ..................... 2
onnx_safe ....................... None
openai_gelu ..................... False
out_seq_length .................. 50
override_lr_scheduler ........... False
param_sharing_style ............. grouped
params_dtype .................... torch.float32
query_in_block_prob ............. 0.1
rank ............................ 0
recompute ....................... False
report_topk_accuracies .......... []
reset_attention_mask ............ False
reset_position_ids .............. False
sample_input_file ............... None
sample_output_file .............. None
save ............................ None
save_interval ................... None
scaled_upper_triang_masked_softmax_fusion False
seed ............................ 1234
seq_length ...................... 1024
short_seq_prob .................. 0.1
split ........................... 969, 30, 1
temperature ..................... 1.0
tensorboard_dir ................. None
titles_data_path ................ None
tokenizer_type .................. GPT2BPETokenizer
top_k ........................... 2
top_p ........................... 0.0
train_iters ..................... None
use_checkpoint_lr_scheduler ..... False
use_cpu_initialization .......... False
use_one_sent_docs ............... False
vocab_file ...................... models/pangualpha/megatron/tokenizer/bpe_4w_pcl/vocab
warmup .......................... 0.01
weight_decay .................... 0.01
world_size ...................... 1
---------------- end of arguments ----------------
> building GPT2BPETokenizer tokenizer ...
> padded vocab (size: 40000) with 0 dummy tokens (new size: 40000)
torch distributed is already initialized, skipping initialization ...
> initializing model parallel with size 1
> setting random seeds to 1234 ...
> initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
building GPT2 model ...
> number of parameters on model parallel rank 0: 2625295360
global rank 0 is loading checkpoint C:\Users\xgp\.cache\jittor\jt1.3.7\cl\py3.8.16\Windows-10-10.x52\AMDRyzen75800Xxc8\default\cu11.2.67\checkpoints\pangu\Pangu-alpha_2.6B_fp16_mgt\iter_0001000\mp_rank_00\model_optim_rng.pth
could not load the checkpoint
I moved it over from outside the folder, and it still doesn't work.
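One quick way to tell whether the moved file itself is intact is to try deserializing it directly. This is only a corruption probe, not how JittorLLMs actually loads the model, and the path below is a placeholder you would replace with the full path printed in your log:

```python
# Minimal sketch: probe the downloaded .pth for corruption. The path is a
# placeholder (hypothetical); substitute the real path from your own log.
import os
import torch

ckpt = r"C:\Users\xgp\.cache\jittor\...\checkpoints\pangu\model_optim_rng.pth"

# A truncated download is the usual cause; the downloader reported ~4.89GB.
print("size on disk:", os.path.getsize(ckpt), "bytes")

try:
    state = torch.load(ckpt, map_location="cpu")
    print("deserialized OK; top-level keys:", list(state)[:5])
except Exception as e:
    print("unreadable checkpoint (likely a truncated download):", e)
```

If the probe fails or the size is well short of what the downloader reported, delete the file and download it again; if it loads cleanly, the problem is more likely the directory layout the loader expects (note the log loads from `...\pangu\Pangu-alpha_2.6B_fp16_mgt\iter_0001000\mp_rank_00\`).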