Closed yrg5101 closed 1 year ago
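The current approach is exactly the one we recommend as the best option. Nice work. We look forward to hearing how it performs; if you run into any problems, further feedback or a PR is welcome.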
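We used ernie-gram-zh for ranking but found it slow. On a machine with 32 GB of RAM and a 4-core CPU, 300 samples take 1-2 minutes, so we have the following questions:
1. On a multi-core CPU, should the speedup be done at the inference level of the ernie-gram-zh model? How?
2. Can the ernie-gram-zh model be pruned and quantized? How? Should we follow the pruning and quantization procedure for the ernie3.0 model? https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-3.0
3. Can ernie-gram-zh be replaced with ernie-3.0-medium-zh for ranking?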
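For question 1, here is a minimal sketch of how the reported numbers translate into per-sample latency, together with one way CPU threads might be configured through Paddle Inference. The helper names, batch size, and file paths are assumptions made for illustration and do not come from the ernie_matching code. The `paddle.inference` calls are only reached when `make_cpu_predictor` is called with PaddlePaddle installed, and whether MKL-DNN or extra threads actually helps on this 4-core machine would have to be measured.

```python
from typing import Iterator, List, Sequence, Tuple


def per_sample_latency_ms(total_seconds: float, num_samples: int) -> float:
    """Average wall-clock time per sample, in milliseconds."""
    return total_seconds * 1000.0 / num_samples


def batches(pairs: Sequence[Tuple[str, str]],
            batch_size: int) -> Iterator[List[Tuple[str, str]]]:
    """Yield consecutive slices so each forward pass scores several pairs."""
    for start in range(0, len(pairs), batch_size):
        yield list(pairs[start:start + batch_size])


def make_cpu_predictor(model_file: str, params_file: str, num_threads: int = 4):
    """Build a CPU predictor from an exported static model (hypothetical paths)."""
    import paddle.inference as paddle_infer  # lazy import: needs PaddlePaddle

    config = paddle_infer.Config(model_file, params_file)
    config.disable_gpu()
    config.enable_mkldnn()  # use oneDNN kernels on x86 CPUs
    config.set_cpu_math_library_num_threads(num_threads)  # e.g. one per core
    return paddle_infer.create_predictor(config)


if __name__ == "__main__":
    # 300 samples in 1-2 minutes, as reported in the question.
    print(per_sample_latency_ms(60, 300), per_sample_latency_ms(120, 300))
    dummy_pairs = [("query", f"title {i}") for i in range(300)]
    print(sum(1 for _ in batches(dummy_pairs, 32)))
```

By this arithmetic, 1-2 minutes for 300 samples is roughly 200-400 ms per sample, which gives a baseline to compare against after changing thread counts or batch size.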
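We also tested this replacement, but it raised the error shown below.
We rank with PaddlePaddle/PaddleNLP/tree/develop/applications/neural_search/ranking/ernie_matching/ and now want to use ernie-3.0-medium-zh in place of the original ernie-gram-zh model. export_model.py converts the dynamic-graph model to a static-graph model, and we changed it as follows: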
Original:
if __name__ == "__main__":
    # tokenizer = ppnlp.transformers.ErnieTokenizer.from_pretrained('ernie-1.0')
    # pretrained_model = ppnlp.transformers.ErnieModel.from_pretrained("ernie-1.0")
    pretrained_model = ppnlp.transformers.ErnieGramModel.from_pretrained(
        'ernie-gram-zh')
    tokenizer = ppnlp.transformers.ErnieGramTokenizer.from_pretrained(
        'ernie-gram-zh')
    model = PairwiseMatching(pretrained_model)
After the change:
if __name__ == "__main__":
    # tokenizer = ppnlp.transformers.ErnieTokenizer.from_pretrained('ernie-1.0')
    # pretrained_model = ppnlp.transformers.ErnieModel.from_pretrained("ernie-1.0")
    pretrained_model = ppnlp.transformers.ErnieGramModel.from_pretrained(
        'ernie-gram-zh')
    tokenizer = ppnlp.transformers.ErnieGramTokenizer.from_pretrained(
        'ernie-3.0-medium-zh')
    model = PairwiseMatching(pretrained_model)
Error:
[2022-06-20 22:57:05,782] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/community/ernie-3.0-medium-zh\model_state.pdparams and saved to C:\Users\Administrator.paddlenlp\models\ernie-3.0-medium-zh
[2022-06-20 22:57:05,783] [ INFO] - Downloading model_state.pdparams from https://bj.bcebos.com/paddlenlp/models/community/ernie-3.0-medium-zh\model_state.pdparams
[2022-06-20 22:57:05,998] [ ERROR] - Downloading from https://bj.bcebos.com/paddlenlp/models/community/ernie-3.0-medium-zh\model_state.pdparams failed with code 404!
Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\tx\PaddleNLP\paddlenlp\transformers\model_utils.py", line 253, in from_pretrained
    file_path, default_root)
  File "C:\Users\Administrator\Desktop\tx\PaddleNLP\paddlenlp\utils\downloader.py", line 164, in get_path_from_url
    fullpath = _download(url, root_dir, md5sum)
  File "C:\Users\Administrator\Desktop\tx\PaddleNLP\paddlenlp\utils\downloader.py", line 201, in _download
    "{}!".format(url, req.status_code))
RuntimeError: Downloading from https://bj.bcebos.com/paddlenlp/models/community/ernie-3.0-medium-zh\model_state.pdparams failed with code 404!
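One detail visible in the log is the backslash before `model_state.pdparams` in the requested URL. A possible but unconfirmed explanation is that the URL was assembled with a Windows-style path join. The sketch below only reproduces that effect with the standard library; the function names are hypothetical and it does not claim to show PaddleNLP's actual code. The 404 may equally mean that no community model exists at that address.

```python
import ntpath      # Windows flavour of os.path, importable on any platform
import posixpath   # POSIX flavour of os.path, always joins with "/"

# Prefix, model name, and file name exactly as they appear in the log.
BASE = "https://bj.bcebos.com/paddlenlp/models/community/ernie-3.0-medium-zh"
WEIGHTS = "model_state.pdparams"


def join_like_windows(base: str, name: str) -> str:
    """What os.path.join would return on Windows for these two parts."""
    return ntpath.join(base, name)


def join_as_url(base: str, name: str) -> str:
    """Join with forward slashes, which is what a URL needs."""
    return posixpath.join(base, name)


if __name__ == "__main__":
    print(join_like_windows(BASE, WEIGHTS))
    print(join_as_url(BASE, WEIGHTS))
```

The first call yields the same mixed-separator string seen in the log, while the second keeps forward slashes throughout.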
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/Administrator/Desktop/tx/PaddleNLP/applications/neural_search/ranking/ernie_matching/export_model.py", line 40, in
    pretrained_model = ppnlp.transformers.ErnieModel.from_pretrained(
        'ernie-3.0-medium-zh')
    tokenizer = ppnlp.transformers.ErnieTokenizer.from_pretrained(
        'ernie-3.0-medium-zh')
    model = PairwiseMatching(pretrained_model)
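When swapping checkpoints it is easy to end up with a model class, tokenizer class, and checkpoint name that do not belong together. The following is a small sketch, under the assumption that the pairings used in the snippets in this thread are the intended ones (`ErnieGramModel`/`ErnieGramTokenizer` for ernie-gram-zh, `ErnieModel`/`ErnieTokenizer` for ernie-3.0-medium-zh). `CLASS_PAIRS`, `resolve_classes`, and `load_backbone` are hypothetical helpers, and whether a given checkpoint name can be downloaded depends on the installed PaddleNLP version.

```python
from typing import Dict, Tuple

# Checkpoint name -> (model class name, tokenizer class name), taken from the
# snippets in this thread; extend only with pairings you have verified.
CLASS_PAIRS: Dict[str, Tuple[str, str]] = {
    "ernie-gram-zh": ("ErnieGramModel", "ErnieGramTokenizer"),
    "ernie-3.0-medium-zh": ("ErnieModel", "ErnieTokenizer"),
}


def resolve_classes(model_name: str) -> Tuple[str, str]:
    """Return the class names to use for a checkpoint, or fail loudly."""
    try:
        return CLASS_PAIRS[model_name]
    except KeyError:
        raise ValueError(f"no class pairing recorded for {model_name!r}")


def load_backbone(model_name: str):
    """Load model and tokenizer from the same checkpoint name."""
    import paddlenlp as ppnlp  # lazy import: needs PaddleNLP installed

    model_cls_name, tokenizer_cls_name = resolve_classes(model_name)
    model_cls = getattr(ppnlp.transformers, model_cls_name)
    tokenizer_cls = getattr(ppnlp.transformers, tokenizer_cls_name)
    return (model_cls.from_pretrained(model_name),
            tokenizer_cls.from_pretrained(model_name))
```

Passing a single `model_name` to both `from_pretrained` calls keeps the weights and the tokenizer tied to the same checkpoint.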
This issue is stale because it has been open for 60 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.
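Thank you for reporting a PaddleNLP usage problem and for contributing to PaddleNLP! When filing your question, please also provide the following information: 1) PaddleNLP 2.3.0.dev, PaddlePaddle 2.3.0 2) System environment: Windows, Python 3.7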
Our current plan is:
1. Train SimCSE on top of ernie3.0: change every from_pretrained call below to ernie-3.0, then produce a SimCSE model.
pretrained_model = ppnlp.transformers.ErnieModel.from_pretrained(
    args.model_name_or_path,
    hidden_dropout_prob=args.dropout,
    attention_probs_dropout_prob=args.dropout)
print("loading model from {}".format(args.model_name_or_path))
tokenizer = ppnlp.transformers.ErnieTokenizer.from_pretrained('ernie-1.0')
2. Train the in_batch_negative model, starting from the SimCSE model produced in step 1.
pretrained_model = ppnlp.transformers.ErnieModel.from_pretrained(
    'path to the SimCSE model')
3. Build the ranking model: after obtaining embeddings from the models above, still rank with ernie-gram-zh.
pretrained_model = ppnlp.transformers.ErnieGramModel.from_pretrained(
    'ernie-gram-zh')
tokenizer = ppnlp.transformers.ErnieGramTokenizer.from_pretrained(
    'ernie-gram-zh')
Is there anything wrong with this plan, or are there any better optimizations? Thanks.
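At query time, the three stages in the plan above would typically feed a recall-then-rerank flow: the embedding model narrows the corpus to a few candidates and the ranking model scores only those. The sketch below illustrates that flow with plain Python. The vectors, the `score_fn` callback, and all function names are stand-ins chosen for illustration, not the actual SimCSE, in-batch-negative, or ernie-gram-zh models.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_top_k(query_vec: Vector, corpus_vecs: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k corpus vectors most similar to the query."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(corpus_vecs)]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best first, stable on ties
    return [idx for _, idx in scored[:k]]


def rerank(query: str, candidates: List[str],
           score_fn: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Score each recalled candidate with the ranking model, highest first."""
    scored = [(text, score_fn(query, text)) for text in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    corpus = ["doc a", "doc b", "doc c"]
    corpus_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embeddings
    top = recall_top_k([1.0, 0.0], corpus_vecs, k=2)
    # Stand-in scorer; a real pairwise ranking model would replace it.
    print(rerank("query", [corpus[i] for i in top], lambda q, d: float(len(d))))
```

Because the ranking model only sees the `k` recalled candidates, its cost per query scales with `k` rather than with the size of the corpus.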