TigerResearch / TigerBot

TigerBot: A multi-language multi-task LLM
https://www.tigerbot.com
Apache License 2.0

Questions about fine-tuning #40

Closed 631068264 closed 1 year ago

631068264 commented 1 year ago

What does a dataset look like? What format is it in, and what does each column mean?

mint-vip commented 1 year ago

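The pretrain dataset is mostly plain text; only the content field is used.

The sft dataset uses the standard alpaca data format:

{
  "instruction": "Combine the following words to form a sensible sentence.",   (the instruction)
  "input": "world/highest/mountain/is/what?",   (the input)
  "output": "The highest mountain in the world is Mount Everest."   (the output)
}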
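
To make the two record shapes above concrete, here is a minimal sketch that checks SFT records for the three alpaca keys and serializes them. The helper names (`is_valid_sft_record`, `to_json_lines`) and the one-JSON-object-per-line layout are assumptions for illustration only; the thread does not say how TigerBot expects the file to be laid out on disk, so check the repository's data loader before relying on this.

```python
import json

# The three keys of an alpaca-style SFT record, as described in the reply above.
REQUIRED_KEYS = ("instruction", "input", "output")


def is_valid_sft_record(record):
    """Return True if record has string values for all three alpaca keys.

    "input" may be an empty string; "instruction" and "output" must be non-empty.
    (Hypothetical helper, not part of TigerBot.)
    """
    if not isinstance(record, dict):
        return False
    if not all(isinstance(record.get(key), str) for key in REQUIRED_KEYS):
        return False
    return bool(record["instruction"].strip()) and bool(record["output"].strip())


def to_json_lines(records):
    """Serialize records as one JSON object per line (an assumed on-disk layout)."""
    bad = [i for i, rec in enumerate(records) if not is_valid_sft_record(rec)]
    if bad:
        raise ValueError(f"invalid SFT records at positions {bad}")
    # ensure_ascii=False keeps Chinese text readable in the output file.
    return "\n".join(json.dumps(rec, ensure_ascii=False) for rec in records)


# SFT record: instruction / input / output.
sft_example = {
    "instruction": "Combine the following words to form a sensible sentence.",
    "input": "world/highest/mountain/is/what?",
    "output": "The highest mountain in the world is Mount Everest.",
}

# Pretrain record: plain text in the content field (the text itself is made up).
pretrain_example = {"content": "Mount Everest is the highest mountain in the world."}
```

Calling `to_json_lines([sft_example])` returns a string that can be written to a file; a record missing any of the three keys raises `ValueError` instead of being silently written.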

631068264 commented 1 year ago

After fine-tuning locally, how do I use the new model?

631068264 commented 1 year ago

I got an error during fine-tuning:

{'loss': 0.5684, 'learning_rate': 1e-05, 'epoch': 0.13}
  3%|▎         | 10/395 [06:42<4:11:50, 39.25s/Traceback (most recent call last):
  File "/data/home/yaokj5/dl/apps/TigerBot/train/./train_sft.py", line 127, in <module>
    main()
  File "/data/home/yaokj5/dl/apps/TigerBot/train/./train_sft.py", line 121, in main
    trainer.train()
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/transformers/trainer.py", line 2006, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/transformers/trainer.py", line 2287, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/transformers/trainer.py", line 2993, in evaluate
    output = eval_loop(
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/transformers/trainer.py", line 3281, in evaluation_loop
    metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
  File "/data/home/yaokj5/dl/apps/TigerBot/train/./train_sft.py", line 43, in compute_metrics
    metric = evaluate.load("accuracy")
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/evaluate/loading.py", line 731, in load
    evaluation_module = evaluation_module_factory(
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/evaluate/loading.py", line 680, in evaluation_module_factory
    raise e1 from None
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/evaluate/loading.py", line 639, in evaluation_module_factory
    ).get_module()
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/evaluate/loading.py", line 479, in get_module
    local_path = self.download_loading_script(revision)
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/evaluate/loading.py", line 469, in download_loading_script
    return cached_path(file_path, download_config=download_config)
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/evaluate/utils/file_utils.py", line 224, in cached_path
    output_path = get_from_cache(
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/evaluate/utils/file_utils.py", line 614, in get_from_cache
    http_get(
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/evaluate/utils/file_utils.py", line 395, in http_get
    response = _request_with_retry(
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/evaluate/utils/file_utils.py", line 360, in _request_with_retry
    raise err
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/evaluate/utils/file_utils.py", line 356, in _request_with_retry
    response = requests.request(method=method.upper(), url=url, timeout=timeout, **params)
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/requests/adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
  3%|▎         | 10/395 [08:06<5:12:02, 48.63s/it]
                                               Traceback (most recent call last):
  File "/data/home/yaokj5/dl/apps/TigerBot/train/./train_sft.py", line 127, in <module>
    main()
  File "/data/home/yaokj5/dl/apps/TigerBot/train/./train_sft.py", line 121, in main
    trainer.train()
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/transformers/trainer.py", line 2006, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/transformers/trainer.py", line 2287, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/transformers/trainer.py", line 2993, in evaluate
    output = eval_loop(
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/transformers/trainer.py", line 3281, in evaluation_loop
    metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
  File "/data/home/yaokj5/dl/apps/TigerBot/train/./train_sft.py", line 43, in compute_metrics
    metric = evaluate.load("accuracy")
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/evaluate/loading.py", line 731, in load
    evaluation_module = evaluation_module_factory(
  File "/data/home/yaokj5/anaconda3/envs/tigerbot/lib/python3.10/site-packages/evaluate/loading.py", line 681, in evaluation_module_factory
    raise FileNotFoundError(
FileNotFoundError: Couldn't find a module script at /data/home/yaokj5/dl/apps/TigerBot/train/accuracy/accuracy.py. Module 'accuracy' doesn't exist on the Hugging Face Hub either.
[2023-06-09 17:23:38,808] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2911
[2023-06-09 17:23:38,809] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2912

Vivicai1005 commented 1 year ago

Try reinstalling evaluate: pip install evaluate

Then run the following in a script and see whether it raises an error: import evaluate; metric = evaluate.load("accuracy")
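
The first traceback ends in a ConnectionError raised while evaluate.load("accuracy") was downloading the metric script, so the failure looks network-related rather than a bug in the training loop. If the download keeps failing, one possible workaround is to compute accuracy locally inside compute_metrics instead. The sketch below is an assumption, not the code in train_sft.py: the function name `token_accuracy` is hypothetical, and it assumes the predictions are already token ids aligned position by position with the labels, with -100 marking positions to ignore (the usual convention in transformers).

```python
def token_accuracy(predictions, labels, ignore_index=-100):
    """Fraction of non-ignored positions where prediction equals label.

    predictions and labels are 2-D sequences (lists or arrays) of token ids
    with the same shape. Positions whose label equals ignore_index are skipped.
    Returns a dict shaped like the result of the "accuracy" metric in evaluate.
    (Hypothetical helper; alignment/shifting of predictions is assumed done.)
    """
    correct = 0
    total = 0
    for pred_row, label_row in zip(predictions, labels):
        for pred, label in zip(pred_row, label_row):
            if label == ignore_index:
                continue  # padding or prompt tokens excluded from the metric
            total += 1
            correct += int(pred == label)
    # Avoid division by zero when every position is ignored.
    return {"accuracy": correct / total if total else 0.0}
```

Because this needs no download, evaluation no longer depends on network access at the step where the traceback occurred.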

631068264 commented 1 year ago

"After fine-tuning locally, how do I use the new model?"

What about this one?

i4never commented 1 year ago

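"After fine-tuning locally, how do I use the new model?"

"What about this one?"

Just load the model from the checkpoint files generated once training completes.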
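
As a rough sketch of that answer: the Hugging Face Trainer seen in the traceback writes checkpoints into subdirectories named checkpoint-<step> under its output directory. The helper below picks the most recent one, and `load_finetuned` passes it to `from_pretrained`. Both function names are hypothetical, and the sketch assumes the checkpoint directory holds weights (and tokenizer files) in the standard Hugging Face format; a DeepSpeed run may save sharded state that needs converting first.

```python
import os
import re

# Trainer checkpoints are saved as "<output_dir>/checkpoint-<global_step>".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Return the path of the highest-numbered checkpoint-<step> directory, or None."""
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        if match and os.path.isdir(path) and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


def load_finetuned(checkpoint_dir):
    """Load tokenizer and model from a checkpoint directory (assumed HF format)."""
    # Imported lazily so latest_checkpoint() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # If the tokenizer was not saved with the checkpoint, load it from the
    # base model path instead.
    tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
    model = AutoModelForCausalLM.from_pretrained(checkpoint_dir)
    return tokenizer, model
```

Typical use would be `ckpt = latest_checkpoint(output_dir)` followed by `tokenizer, model = load_finetuned(ckpt)`, where `output_dir` is whatever directory was passed to the training script.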