Open laozhanghahaha opened 1 year ago
--data data-bin: I want to know where I can get data-bin.
@wxthu mkdir, then put the data in that folder
> @wxthu mkdir, then put the data in that folder

A dataset such as GLUE? I am new to NLP ...
@wxthu your dataset should look like this https://github.com/facebookresearch/metaseq/blob/b47f8d115516b539ba0e5002aa3ab707ad10a792/metaseq/tasks/streaming_language_modeling.py#L287
❓ Questions and Help
Hey, I downloaded the 1.3B checkpoint from https://github.com/facebookresearch/metaseq/tree/main/projects/OPT
and tried to start fine-tuning with this command:
opt-baselines -n 2 -g 4 -p test_v0 --model-size 1.3b --restore-file 1.3b/reshard.pt --data data-bin/ --checkpoints-dir checkpoints/ --no-save-dir --no-wandb --azure --local
but the log tells me: No existing checkpoint found 1.3b/reshard-model_part-0-shard0.pt
I tried convert_to_singleton.py, but I only get restored.pt. How can I get the *****shard0.pt file?
Here is the log:
2023-02-17 07:04:55 | INFO | metaseq.utils | CUDA enviroments for all 4 workers
2023-02-17 07:04:55 | INFO | metaseq.cli.train | training on 4 devices (GPUs/TPUs)
2023-02-17 07:04:55 | INFO | metaseq.cli.train | max tokens per GPU = None and batch size per GPU = 32
2023-02-17 07:04:55 | WARNING | metaseq.checkpoint_utils | Proceeding without metaseq-internal installed! Please check if you need this!
2023-02-17 07:04:55 | WARNING | metaseq.checkpoint_utils | Proceeding without metaseq-internal installed! Please check if you need this!
2023-02-17 07:04:55 | WARNING | metaseq.checkpoint_utils | Proceeding without metaseq-internal installed! Please check if you need this!
2023-02-17 07:04:55 | INFO | metaseq.cli.train | nvidia-smi stats: {'gpu_0_mem_used_gb': 6.5791015625, 'gpu_1_mem_used_gb': 12.6201171875, 'gpu_2_mem_used_gb': 3.76953125, 'gpu_3_mem_used_gb': 12.6591796875, 'gpu_4_mem_used_gb': 9.486328125, 'gpu_5_mem_used_gb': 9.619140625, 'gpu_6_mem_used_gb': 9.728515625, 'gpu_7_mem_used_gb': 9.572265625}
2023-02-17 07:04:55 | WARNING | metaseq.checkpoint_utils | Proceeding without metaseq-internal installed! Please check if you need this!
2023-02-17 07:04:55 | INFO | metaseq.checkpoint_utils | attempting to load checkpoint from: 1.3b/reshard-model_part-0-shard0.pt
2023-02-17 07:04:55 | INFO | metaseq.trainer | No existing checkpoint found 1.3b/reshard-model_part-0-shard0.pt
2023-02-17 07:04:55 | INFO | metaseq.trainer | loading train data for epoch 1
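For anyone hitting the same message: the log shows that `--restore-file 1.3b/reshard.pt` is not opened under that name. The loader looks for `1.3b/reshard-model_part-0-shard0.pt` instead. Below is a minimal Python sketch for checking which files are missing before launching. The naming rule is only inferred from the log lines above, not taken from the metaseq source, and `expected_shard_path` and `missing_shards` are hypothetical helper names, not part of metaseq.

```python
from pathlib import Path
from typing import List


def expected_shard_path(restore_file: str, model_part: int = 0, shard: int = 0) -> Path:
    """Build the per-shard filename the log reports for a given --restore-file.

    Assumption (inferred from the log only): the suffix
    "-model_part-<i>-shard<j>" is inserted before the file extension,
    e.g. reshard.pt -> reshard-model_part-0-shard0.pt.
    """
    path = Path(restore_file)
    new_name = f"{path.stem}-model_part-{model_part}-shard{shard}{path.suffix}"
    return path.with_name(new_name)


def missing_shards(restore_file: str, num_model_parts: int = 1, num_shards: int = 1) -> List[Path]:
    """Return every expected shard file that does not exist on disk."""
    missing = []
    for part in range(num_model_parts):
        for shard in range(num_shards):
            candidate = expected_shard_path(restore_file, part, shard)
            if not candidate.is_file():
                missing.append(candidate)
    return missing


if __name__ == "__main__":
    # Same value as --restore-file in the command above.
    for path in missing_shards("1.3b/reshard.pt"):
        print(f"missing: {path}")
```

This only reports whether files exist under the names seen in the log. It does not create them, and the shard counts to pass in depend on how the checkpoint was saved.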