binary-husky / gpt_academic

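Provides a practical interaction interface for GPT/GLM and other large language models (LLMs), with particular optimization for reading, polishing, and writing papers. Modular design; supports custom shortcut buttons and function plugins; supports project analysis and self-explanation for Python, C++, and other codebases; PDF/LaTeX paper translation and summarization; parallel querying of multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, moss, and others.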
https://github.com/binary-husky/gpt_academic/wiki/online
GNU General Public License v3.0
65.51k stars, 8.05k forks

[Bug]: The Docker image build does not call warm_up_vectordb to pre-warm nltk.download("punkt") #1971

Open: awwaawwa opened this issue 1 month ago

awwaawwa commented 1 month ago

Installation Method

Others (Please Describe)

Version

Latest

OS

Docker

Describe the bug

The build only warms up the modules, not the vector DB. See, for example, docs/GithubAction+NoLocal+Latex: RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'

Screenshot

If the network connection is poor, running the Docker container hangs here. [Screenshot: CleanShot 2024-09-20 at 12 50 46@2x]

Terminal Traceback & Material to Help Reproduce Bugs

No response

awwaawwa commented 1 month ago

Changing that line to the following should fix it:

RUN python3 -c 'from check_proxy import warm_up_modules, warm_up_vectordb; warm_up_modules(); warm_up_vectordb()'
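
The idea of pre-warming at build time, so that nothing is fetched when the container starts, can be sketched independently of the project's own warm_up_vectordb (whose implementation is not shown in this thread). The helper names below (punkt_is_cached, ensure_punkt) are hypothetical, and the tokenizers/punkt layout is an assumption based on how NLTK unpacks tokenizers/punkt.zip. This is a rough illustration, not the project's actual code:

```python
from pathlib import Path
from typing import Callable, Optional


def punkt_is_cached(download_dir: str) -> bool:
    """Return True if an unpacked tokenizers/punkt directory exists under download_dir."""
    return (Path(download_dir) / "tokenizers" / "punkt").is_dir()


def ensure_punkt(download_dir: str,
                 download: Optional[Callable[[str], None]] = None) -> bool:
    """Fetch 'punkt' into download_dir only if it is missing.

    Returns True if a download was triggered, False if the data was already there.
    `download` is a hypothetical hook so the logic can be exercised without network access.
    """
    if punkt_is_cached(download_dir):
        return False
    Path(download_dir).mkdir(parents=True, exist_ok=True)
    if download is None:
        import nltk  # imported lazily; only needed when a real download happens
        nltk.download("punkt", download_dir=download_dir)
    else:
        download(download_dir)
    return True
```

Running something like ensure_punkt(...) in a Dockerfile RUN step would bake the data into the image, so a slow network at container start no longer matters.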

hongyi-zhao commented 1 month ago

For the case of running directly from source, I ended up using the following approach:

$ proxychains-ng-socks5 python -c "import nltk, os; nltk.download('punkt', download_dir=os.path.expanduser('~') + '/.pyenv/versions/gpt_academic/lib/python3.11/site-packages/llama_index/core/_static/nltk_cache/')"

Or, use Python as follows:

In [1]: import nltk
   ...: import os
   ...: 
   ...: # Set the proxy
   ...: proxy_url = 'http://127.0.0.1:8080'
   ...: os.environ['HTTP_PROXY'] = proxy_url
   ...: os.environ['HTTPS_PROXY'] = proxy_url
   ...: 
   ...: # Set the download directory
   ...: home = os.path.expanduser('~')
   ...: download_dir = f"{home}/.pyenv/versions/gpt_academic/lib/python3.11/site-packages/llama_index/core/_static/nltk_cache/"
   ...: 
   ...: # Make sure the download directory exists
   ...: os.makedirs(download_dir, exist_ok=True)
   ...: 
   ...: # Download the 'punkt' data package
   ...: nltk.download('punkt', download_dir=download_dir, quiet=False)
   ...: 
   ...: print(f"NLTK data downloaded to {download_dir}")
[nltk_data] Downloading package punkt to /home/werner/.pyenv/versions/
[nltk_data]     gpt_academic/lib/python3.11/site-
[nltk_data]     packages/llama_index/core/_static/nltk_cache/...
[nltk_data]   Unzipping tokenizers/punkt.zip.
NLTK data downloaded to /home/werner/.pyenv/versions/gpt_academic/lib/python3.11/site-packages/llama_index/core/_static/nltk_cache/
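
One caveat with the session above is that the proxy variables stay set in os.environ for the rest of the process. If that is not wanted, a small context manager can scope them to the download only. This is a minimal sketch; temporary_proxy is a hypothetical helper name, not part of nltk or gpt_academic:

```python
import os
from contextlib import contextmanager
from typing import Iterator

PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY")


@contextmanager
def temporary_proxy(proxy_url: str) -> Iterator[None]:
    """Set HTTP_PROXY/HTTPS_PROXY inside the block and restore the old values afterwards."""
    saved = {name: os.environ.get(name) for name in PROXY_VARS}
    for name in PROXY_VARS:
        os.environ[name] = proxy_url
    try:
        yield
    finally:
        for name, old in saved.items():
            if old is None:
                os.environ.pop(name, None)  # variable did not exist before
            else:
                os.environ[name] = old      # put the previous value back


# Intended usage (requires nltk and a reachable proxy):
# with temporary_proxy("http://127.0.0.1:8080"):
#     nltk.download("punkt", download_dir=download_dir)
```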