Facico / Chinese-Vicuna

Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model — a low-resource Chinese llama+lora approach, with structure modeled on alpaca
https://github.com/Facico/Chinese-Vicuna
Apache License 2.0
4.14k stars 421 forks

Question about pure C++ inference #57

Closed BUPTccy closed 1 year ago

BUPTccy commented 1 year ago

Thanks for sharing this project. Due to my limited C++ skills, I ran into the following problem while trying CPU inference, and I'd appreciate your help~

The file checkpoint-final/ggml-model-f16.bin has already been generated. The following commands produce an error:

cd tools/Vicuna.cpp
make chat

The error message is:

In file included from /usr/local/gcc/include/c++/5.2.0/random:35:0,
                 from utils.h:8,
                 from chat.cpp:3:
/usr/local/gcc/include/c++/5.2.0/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support is currently experimental, and must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
 #error This file requires compiler and library support for the \

There are also many other C++-related errors like the following:

utils.h:17:53: error: ‘std::thread’ has not been declared
     int32_t n_threads = std::min(16, (int32_t) std::thread::hardware_concurrency());
                                                     ^
utils.h:44:31: error: ‘mt19937’ is not a member of ‘std’
 std::string gpt_random_prompt(std::mt19937 & rng);

BUPTccy commented 1 year ago

@Facico Could you share the required cmake and gcc versions, or a link to the corresponding environment setup?~

Facico commented 1 year ago

Sorry for not replying right away; the C++ inference part isn't something I work on directly. The gcc version is as follows.

>gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.4.0-1ubuntu1~20.04.1' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-Av3uEd/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)

For the cmake environment you can refer to the link here. Since the C++ inference part was brought in from someone else's repository, see alpaca.cpp for more detailed configuration.

BUPTccy commented 1 year ago

Got it~ Thanks for the information. We've solved it by reinstalling a newer gcc. One more question: if we want to turn the (CPU/GPU) inference into an API for input and output, do you have any suggestions?

Facico commented 1 year ago

It depends on the application scenario you have in mind. 1. On the application side, you can write some fixed up-front instructions to pin down the task, e.g. translation or text generation. 2. As for the concrete API format, it's all a text stream anyway, so the format shouldn't matter much. The simplest and most stable option is definitely non-streaming processing.

BUPTccy commented 1 year ago

Thanks a lot. I'll look into how to turn this code into an API OvO