A-baoYang / alpaca-7b-chinese

Finetune LLaMA-7B with Chinese instruction datasets
Creative Commons Zero v1.0 Universal

TypeError: not a string #3

Status: Open · opened by SeekPoint 1 year ago

SeekPoint commented 1 year ago

```
(gh_alpaca-7b-chinese) ub2004@ub2004-B85M-A0:~/llm_dev/alpaca-7b-chinese/finetune$ python3 finetune.py \
  --base_model bigscience/bloomz-7b1-mt \
  --data_dir /home/ub2004/llm_dev/alpaca-7b-chinese/data/general/alpaca-en-zh.json \
  --output_dir ../finetuned/bloomz-7b1-mt_alpaca-en-zh \
  --lora_target_modules '["query_key_value"]'
```

```
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to:
https://github.com/TimDettmers/bitsandbytes/issues

/home/ub2004/.local/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/home/ub2004/anaconda3/envs/gh_alpaca-7b-chinese/lib')}
  warn(msg)
/home/ub2004/.local/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /home/ub2004/anaconda3/envs/gh_alpaca-7b-chinese did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
/home/ub2004/.local/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('@/tmp/.ICE-unix/2101,unix/ub2004-B85M-A0'), PosixPath('local/ub2004-B85M-A0')}
  warn(msg)
/home/ub2004/.local/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/etc/xdg/xdg-ubuntu')}
  warn(msg)
/home/ub2004/.local/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('1'), PosixPath('0')}
  warn(msg)
/home/ub2004/.local/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/org/gnome/Terminal/screen/9452ad9e_f9da_4eba_aff1_19fe374cdc1e')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 117
/home/ub2004/.local/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
  warn(msg)
CUDA SETUP: Loading binary /home/ub2004/.local/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda117_nocublaslt.so...
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
Finetune parameters:
  base_model: bigscience/bloomz-7b1-mt
  model_type: llama
  data_dir: /home/ub2004/llm_dev/alpaca-7b-chinese/data/general/alpaca-en-zh.json
  output_dir: ../finetuned/bloomz-7b1-mt_alpaca-en-zh
  batch_size: 128
  micro_batch_size: 1
  num_epochs: 20
  learning_rate: 0.0003
  cutoff_len: 512
  val_set_size: 2000
  lora_r: 8
  lora_alpha: 16
  lora_dropout: 0.05
  lora_target_modules: ['query_key_value']
  train_on_inputs: True
  group_by_length: True
```
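The parameter dump above already shows the mismatch: `base_model` points at a BLOOM checkpoint while `model_type` is still the default `llama`. A minimal sketch of the dispatch (the class names in the table are illustrative stand-ins for the repo's real `_MODEL_CLASSES` in finetune.py):

```python
# Illustrative sketch of the model_type dispatch in finetune.py.
# The real _MODEL_CLASSES maps model_type to tokenizer/model classes;
# the string values below are assumptions for illustration only.

_MODEL_CLASSES = {
    "llama": "LlamaTokenizer",      # expects a SentencePiece tokenizer.model file
    "bloom": "BloomTokenizerFast",  # BLOOM checkpoints ship tokenizer.json instead
}

def pick_tokenizer(model_type: str) -> str:
    """Return the tokenizer class name the script would instantiate."""
    return _MODEL_CLASSES[model_type]

# With the default model_type "llama", the BLOOM checkpoint is handed to
# LlamaTokenizer; its SentencePiece vocab file is absent, so downstream
# sp_model.Load(None) raises "TypeError: not a string".
print(pick_tokenizer("llama"))
# Selecting the matching family loads the tokenizer the checkpoint ships:
print(pick_tokenizer("bloom"))
```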

```
-1 -1 -1 False
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'BloomTokenizerFast'.
The class this function is called from is 'LlamaTokenizer'.

Traceback (most recent call last):
  /home/ub2004/llm_dev/alpaca-7b-chinese/finetune/finetune.py:259 in <module>
    main()
  /home/ub2004/.local/lib/python3.8/site-packages/click/core.py:1130 in __call__
    return self.main(*args, **kwargs)
  /home/ub2004/.local/lib/python3.8/site-packages/click/core.py:1055 in main
    rv = self.invoke(ctx)
  /home/ub2004/.local/lib/python3.8/site-packages/click/core.py:1404 in invoke
    return ctx.invoke(self.callback, **ctx.params)
  /home/ub2004/.local/lib/python3.8/site-packages/click/core.py:760 in invoke
    return __callback(*args, **kwargs)
  /home/ub2004/llm_dev/alpaca-7b-chinese/finetune/finetune.py:188 in main
    tokenizer, model = decide_model(args=local_args, device_map=device_map)
  /home/ub2004/llm_dev/alpaca-7b-chinese/finetune/finetune.py:66 in decide_model
    tokenizer = _MODEL_CLASSES[model_type].tokenizer.from_pretrained(args.base_model
  /home/ub2004/.local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py:1811 in from_pretrained
    return cls._from_pretrained(
  /home/ub2004/.local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py:1965 in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  /home/ub2004/.local/lib/python3.8/site-packages/transformers/models/llama/tokenization_llama.py:96 in __init__
    self.sp_model.Load(vocab_file)
  /home/ub2004/.local/lib/python3.8/site-packages/sentencepiece/__init__.py:905 in Load
    return self.LoadFromFile(model_file)
  /home/ub2004/.local/lib/python3.8/site-packages/sentencepiece/__init__.py:310 in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
(gh_alpaca-7b-chinese) ub2004@ub2004-B85M-A0:~/llm_dev/alpaca-7b-chinese/finetune$
```
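The traceback points at the root cause: `LlamaTokenizer` tries to `sp_model.Load()` a SentencePiece vocab file (`tokenizer.model`) that a BLOOM checkpoint does not ship, so `vocab_file` resolves to `None` and sentencepiece raises `TypeError: not a string`. Assuming finetune.py exposes the `model_type` shown in its parameter dump as a CLI flag (the `--model_type` spelling below is an assumption), rerunning with the matching model family should avoid the crash:

```shell
# Hedged fix: pass a model_type matching the BLOOM checkpoint instead of
# the default "llama" (flag name assumed from the parameter dump).
python3 finetune.py \
  --base_model bigscience/bloomz-7b1-mt \
  --model_type bloom \
  --data_dir /home/ub2004/llm_dev/alpaca-7b-chinese/data/general/alpaca-en-zh.json \
  --output_dir ../finetuned/bloomz-7b1-mt_alpaca-en-zh \
  --lora_target_modules '["query_key_value"]'
```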