Adapting to the InternLM2.5 base model: could the documentation be improved?

SmartFlowAI / EmoLLM

Mental health large language model, LLM, The Big Model of Mental Health, Finetune, InternLM2, InternLM2.5, Qwen, ChatGLM, Baichuan, DeepSeek, Mixtral, LLama3, GLM4, Qwen2, LLama3.1

https://openxlab.org.cn/apps/detail/chg0901/EmoLLMV3.0

MIT License

Adapting to the InternLM2.5 base model: could the documentation be improved? #268

Closed RyanOvO closed 2 months ago

RyanOvO commented 2 months ago

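Following up on the issue.

Could you publish the list of dataset files used for the InternLM2.5 fine-tuning run, and explain how they were converted to JSONL?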

aJupyter commented 2 months ago

OK, we'll update this later.

hi-pengyu commented 2 months ago

For now, here is a Python script that converts JSON to JSONL; we'll improve the documentation later.


import json


def json_array_to_jsonl(json_file_path, jsonl_file_path):
    """
    Convert a file containing a JSON array to JSONL format.

    Args:
    - json_file_path: path to the input JSON file
    - jsonl_file_path: path to the output JSONL file
    """
    with open(json_file_path, 'r', encoding='utf-8') as json_file:
        # Load the entire JSON array into memory
        data = json.load(json_file)

    with open(jsonl_file_path, 'w', encoding='utf-8') as jsonl_file:
        # Write one JSON object per line; keep non-ASCII text readable
        for obj in data:
            jsonl_file.write(json.dumps(obj, ensure_ascii=False) + '\n')


# Usage example
json_array_to_jsonl('aa.json', 'output2.jsonl')
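
A quick sanity check on the converted file can catch malformed lines before training starts. The following is only a minimal sketch, not part of the EmoLLM repository: the function name `count_jsonl_records` is hypothetical, and it assumes the output is UTF-8 with one JSON value per line, as produced by the script above.

```python
import json


def count_jsonl_records(jsonl_file_path):
    """Return the number of records in a JSONL file.

    Hypothetical helper for illustration only. Raises ValueError
    if any non-blank line is not valid JSON.
    """
    count = 0
    with open(jsonl_file_path, 'r', encoding='utf-8') as jsonl_file:
        for line_number, line in enumerate(jsonl_file, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"line {line_number} is not valid JSON: {exc}"
                ) from exc
            count += 1
    return count
```

Calling `count_jsonl_records('output2.jsonl')` after the conversion should return the same number as the length of the original JSON array.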

RyanOvO commented 2 months ago

OK, thanks.

chg0901 commented 1 month ago

[EmoLLM][InternLM2.5] EmoLLM V3.0 Preview: Full-Parameter Fine-Tuning Practice Based on InternLM2.5-7B-Chat - Zhihu https://zhuanlan.zhihu.com/p/708931911

You can refer to this document.

Also take a look at the replies in the open issue about 爹系男友 (the "father-figure boyfriend" role).