PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Bug]: Unable to run the sentiment_analysis/ASO_analysis demo #7671

Closed evan-qianjh closed 6 months ago

evan-qianjh commented 11 months ago

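Please describe your issue

[Environment]

[Problem description]

  1. After running sh run_demo.sh and entering "蛋糕味道不错,很好吃,店家很耐心,服务也很好,很棒" ("The cake tastes good and is delicious, the shop staff are very patient, the service is good too, great"), the process crashes;
  2. After creating ./data/test.txt as described in the documentation and running sh run_predict.sh, the process crashes.

[Output of sh run_demo.sh]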

➜  ASO_analysis git:(v2.6.1) sh run_demo.sh
/Users/qianjh/Library/Python/3.9/lib/python/site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
[2023-12-15 21:19:46,890] [    INFO] - Already cached /Users/qianjh/.paddlenlp/models/skep_ernie_1.0_large_ch/skep_ernie_1.0_large_ch.vocab.txt
[2023-12-15 21:19:46,894] [    INFO] - tokenizer config file saved in /Users/qianjh/.paddlenlp/models/skep_ernie_1.0_large_ch/tokenizer_config.json
[2023-12-15 21:19:46,894] [    INFO] - Special tokens file saved in /Users/qianjh/.paddlenlp/models/skep_ernie_1.0_large_ch/special_tokens_map.json
label dict loaded.
[2023-12-15 21:19:48,719] [    INFO] - Already cached /Users/qianjh/.paddlenlp/models/skep_ernie_1.0_large_ch/model_state.pdparams
[2023-12-15 21:19:48,720] [    INFO] - Loading weights file model_state.pdparams from cache at /Users/qianjh/.paddlenlp/models/skep_ernie_1.0_large_ch/model_state.pdparams
[2023-12-15 21:19:50,517] [    INFO] - Loaded weights file from disk, setting weights to model.
[2023-12-15 21:19:55,241] [    INFO] - All model checkpoint weights were used when initializing SkepForTokenClassification.

[2023-12-15 21:19:55,241] [ WARNING] - Some weights of SkepForTokenClassification were not initialized from the model checkpoint at skep_ernie_1.0_large_ch and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
extraction model loaded.
[2023-12-15 21:19:57,622] [    INFO] - Already cached /Users/qianjh/.paddlenlp/models/skep_ernie_1.0_large_ch/model_state.pdparams
[2023-12-15 21:19:57,623] [    INFO] - Loading weights file model_state.pdparams from cache at /Users/qianjh/.paddlenlp/models/skep_ernie_1.0_large_ch/model_state.pdparams
[2023-12-15 21:19:59,645] [    INFO] - Loaded weights file from disk, setting weights to model.
[2023-12-15 21:20:05,207] [    INFO] - All model checkpoint weights were used when initializing SkepForSequenceClassification.

[2023-12-15 21:20:05,209] [ WARNING] - Some weights of SkepForSequenceClassification were not initialized from the model checkpoint at skep_ernie_1.0_large_ch and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
classification model loaded.
input text: 
蛋糕味道不错,很好吃,店家很耐心,服务也很好,很棒
/Users/qianjh/Library/Python/3.9/lib/python/site-packages/paddlenlp/transformers/tokenizer_utils_base.py:2293: FutureWarning: The `max_seq_len` argument is deprecated and will be removed in a future version, please use `max_length` instead.
  warnings.warn(
/Users/qianjh/Library/Python/3.9/lib/python/site-packages/paddlenlp/transformers/tokenizer_utils_base.py:1865: UserWarning: Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
  warnings.warn(
run_demo.sh: line 23:  5140 Bus error: 10           python demo.py --ext_model_path "./checkpoints/ext_checkpoints/best.pdparams" --cls_model_path "./checkpoints/cls_checkpoints/best.pdparams" --ext_label_path "./data/ext_data/label.dict" --cls_label_path "./data/cls_data/label.dict" --ext_max_seq_len 512 --cls_max_seq_len 256

[Output of sh run_predict.sh]

➜  ASO_analysis git:(v2.6.1) sh run_predict.sh
/Users/qianjh/Library/Python/3.9/lib/python/site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
[2023-12-15 21:36:02,046] [ WARNING] - Detected that datasets module was imported before paddlenlp. This may cause PaddleNLP datasets to be unavalible in intranet. Please import paddlenlp before datasets module to avoid download issues
[2023-12-15 21:36:03,573] [    INFO] - Already cached /Users/qianjh/.paddlenlp/models/skep_ernie_1.0_large_ch/skep_ernie_1.0_large_ch.vocab.txt
[2023-12-15 21:36:03,577] [    INFO] - tokenizer config file saved in /Users/qianjh/.paddlenlp/models/skep_ernie_1.0_large_ch/tokenizer_config.json
[2023-12-15 21:36:03,578] [    INFO] - Special tokens file saved in /Users/qianjh/.paddlenlp/models/skep_ernie_1.0_large_ch/special_tokens_map.json
test data loaded.
[2023-12-15 21:36:05,738] [    INFO] - Already cached /Users/qianjh/.paddlenlp/models/skep_ernie_1.0_large_ch/model_state.pdparams
[2023-12-15 21:36:05,741] [    INFO] - Loading weights file model_state.pdparams from cache at /Users/qianjh/.paddlenlp/models/skep_ernie_1.0_large_ch/model_state.pdparams
[2023-12-15 21:36:08,101] [    INFO] - Loaded weights file from disk, setting weights to model.
[2023-12-15 21:36:17,196] [    INFO] - All model checkpoint weights were used when initializing SkepForTokenClassification.

[2023-12-15 21:36:17,213] [ WARNING] - Some weights of SkepForTokenClassification were not initialized from the model checkpoint at skep_ernie_1.0_large_ch and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
extraction model loaded.
run_predict.sh: line 26:  5542 Segmentation fault: 11  python predict.py --ext_model_path "./checkpoints/ext_checkpoints/best.pdparams" --cls_model_path "./checkpoints/cls_checkpoints/best.pdparams" --test_path "./data/test.txt" --ext_label_path "./data/ext_data/label.dict" --cls_label_path "./data/cls_data/label.dict" --save_path "./data/sentiment_results.json" --batch_size 8 --ext_max_seq_len 512 --cls_max_seq_len 256
/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
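
Both runs above end in a bare signal (Bus error: 10 and Segmentation fault: 11) with no Python traceback, which makes it hard to tell which call crashed. One way to get more detail, sketched here and not taken from the PaddleNLP repository, is to start the script under Python's built-in `faulthandler`, which dumps the Python stack to stderr when the interpreter receives a fatal signal. The helper name `run_with_faulthandler` is hypothetical.

```python
import signal
import subprocess
import sys


def run_with_faulthandler(script_args):
    """Run `python -X faulthandler <script_args>` in a child process.

    Returns (returncode, signal_name, stderr_text). signal_name is None
    when the child exited on its own instead of being killed by a signal.
    """
    cmd = [sys.executable, "-X", "faulthandler", *script_args]
    proc = subprocess.run(cmd, capture_output=True, text=True)

    signal_name = None
    if proc.returncode < 0:
        # On POSIX, a negative return code -N means "terminated by signal N".
        signal_name = signal.Signals(-proc.returncode).name

    # If the child crashed, stderr now also holds the Python-level stack
    # that faulthandler wrote just before the process died.
    return proc.returncode, signal_name, proc.stderr
```

For the case in this issue, the call would be `run_with_faulthandler(["predict.py", "--ext_model_path", "./checkpoints/ext_checkpoints/best.pdparams", ...])` with the remaining arguments copied from run_predict.sh. The same effect is available without a wrapper by running `python -X faulthandler predict.py ...` directly.
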
github-actions[bot] commented 8 months ago

This issue is stale because it has been open for 60 days with no activity.

w5688414 commented 6 months ago

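How much memory does your machine have? This model is fairly large. Also, we do not recommend running large models on a Mac; we recommend running on Linux with a GPU: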

https://github.com/PaddlePaddle/PaddleNLP/blob/de149078f883a2e278047c5c06f77b76d5a52431/applications/sentiment_analysis/ASO_analysis/demo.py#L112C19-L112C42
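
As a rough illustration of the "use a GPU when there is one" advice, the sketch below picks a Paddle device at startup and falls back to CPU otherwise. It is not the code at the linked line of demo.py, whose contents are not reproduced in this thread; the function names `pick_device` and `configure_paddle_device` are hypothetical, and the sketch assumes the installed Paddle build provides `paddle.is_compiled_with_cuda`, `paddle.device.cuda.device_count` and `paddle.set_device`.

```python
def pick_device(cuda_build: bool, gpu_count: int) -> str:
    """Return "gpu" only when Paddle was built with CUDA and a GPU is visible."""
    return "gpu" if cuda_build and gpu_count > 0 else "cpu"


def configure_paddle_device() -> str:
    """Select the device for subsequent Paddle operations and return its name."""
    # Imported lazily so pick_device() can be used and tested without Paddle.
    import paddle

    cuda_build = paddle.is_compiled_with_cuda()
    # Only query the GPU count on CUDA builds (for example, not on macOS wheels).
    gpu_count = paddle.device.cuda.device_count() if cuda_build else 0

    device = pick_device(cuda_build, gpu_count)
    paddle.set_device(device)
    return device
```

Calling `configure_paddle_device()` once before the models are loaded makes the choice explicit, so a CPU-only machine is detected up front.
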

evan-qianjh commented 6 months ago

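> How much memory does your machine have? This model is fairly large. Also, we do not recommend running large models on a Mac; we recommend running on Linux with a GPU:
>
> https://github.com/PaddlePaddle/PaddleNLP/blob/de149078f883a2e278047c5c06f77b76d5a52431/applications/sentiment_analysis/ASO_analysis/demo.py#L112C19-L112C42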

16 GB of memory.
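
To put the memory question in numbers, a minimal sketch (standard library only, not part of PaddleNLP) can compare the size of the cached weights file with the machine's physical RAM. The file path comes from the logs in this issue, and `copies=2` reflects that the demo log shows two models being loaded from the same weights. The helper names are hypothetical, and the on-disk size is only a lower bound: activations for sequences of up to 512 tokens and framework overhead come on top.

```python
import os
from pathlib import Path

GIB = 1024 ** 3


def weights_size_gib(path, copies=1):
    """Size of a weights file in GiB, multiplied by the number of loaded copies."""
    return Path(path).stat().st_size * copies / GIB


def total_ram_gib():
    """Physical RAM in GiB, or None if the platform does not expose it via sysconf."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GIB
    except (ValueError, OSError, AttributeError):
        return None


if __name__ == "__main__":
    # Cache path as printed in the logs above; adjust if your cache lives elsewhere.
    weights = Path.home() / ".paddlenlp/models/skep_ernie_1.0_large_ch/model_state.pdparams"
    if weights.exists():
        print(f"weights for two models: {weights_size_gib(weights, copies=2):.2f} GiB")
    print(f"physical RAM: {total_ram_gib()} GiB")
```
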