Open shrikrishnaholla opened 1 year ago
The model loads fine for me. The named tensor should be recognized and loaded. Did you get any compilation warnings?
@klosax One thing that might be relevant: I received this error when I ran python examples/replit/convert-h5-to-ggml.py ../teknium-Replit-v2-CodeInstruct-3B/Replit-v2-CodeInstruct-3B/ 0
Traceback (most recent call last):
File "~/ggml/examples/replit/convert-h5-to-ggml.py", line 7, in <module>
import sentencepiece.sentencepiece_model_pb2 as model
File "~/.local/lib/python3.10/site-packages/sentencepiece/sentencepiece_model_pb2.py", line 34, in <module>
_descriptor.EnumValueDescriptor(
File "/opt/miniconda3/conda/envs/textgen/lib/python3.10/site-packages/google/protobuf/descriptor.py", line 796, in __new__
_message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
1. Downgrade the protobuf package to 3.20.x or lower.
2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
So I rephrased the command like this:
PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python python examples/replit/convert-h5-to-ggml.py ../teknium-Replit-v2-CodeInstruct-3B/Replit-v2-CodeInstruct-3B/ 0
and the conversion completed successfully.
Could this have anything to do with it?
Could this have anything to do with it?
I guess not if the model file was converted successfully.
Any compilation warnings when compiling the inference binary?
Nothing stood out to me in particular...
cmake .. && make -j4 replit replit-quantize
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Linux detected
-- x86 detected
-- Linux detected
-- Configuring done (0.1s)
-- Generating done (0.3s)
-- Build files have been written to: ~/ggml/build
[ 25%] Building CXX object examples/CMakeFiles/common.dir/common.cpp.o
[ 25%] Building C object src/CMakeFiles/ggml.dir/ggml.c.o
In file included from /usr/include/string.h:535,
from ~/ggml/src/ggml.c:21:
In function ‘memcpy’,
inlined from ‘ggml_set_op_params’ at ~/ggml/src/ggml.c:4642:5,
inlined from ‘ggml_conv_1d’ at ~/ggml/src/ggml.c:6883:5:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ offset [0, 11] is out of the bounds [0, 0] [-Warray-bounds]
29 | return __builtin___memcpy_chk (__dest, __src, __len,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
30 | __glibc_objsize0 (__dest));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘memcpy’,
inlined from ‘ggml_set_op_params’ at ~/ggml/src/ggml.c:4642:5,
inlined from ‘ggml_conv_2d’ at ~/ggml/src/ggml.c:6923:5:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ offset [0, 23] is out of the bounds [0, 0] [-Warray-bounds]
29 | return __builtin___memcpy_chk (__dest, __src, __len,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
30 | __glibc_objsize0 (__dest));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘memcpy’,
inlined from ‘ggml_set_op_params’ at ~/ggml/src/ggml.c:4642:5,
inlined from ‘ggml_conv_1d’ at ~/ggml/src/ggml.c:6883:5,
inlined from ‘ggml_conv_1d_ph’ at ~/ggml/src/ggml.c:6942:12:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ offset [0, 11] is out of the bounds [0, 0] [-Warray-bounds]
29 | return __builtin___memcpy_chk (__dest, __src, __len,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
30 | __glibc_objsize0 (__dest));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘memcpy’,
inlined from ‘ggml_set_op_params’ at ~/ggml/src/ggml.c:4642:5,
inlined from ‘ggml_pool_2d’ at ~/ggml/src/ggml.c:7015:5:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ offset [0, 27] is out of the bounds [0, 0] [-Warray-bounds]
29 | return __builtin___memcpy_chk (__dest, __src, __len,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
30 | __glibc_objsize0 (__dest));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘memcpy’,
inlined from ‘ggml_set_op_params’ at ~/ggml/src/ggml.c:4642:5,
inlined from ‘ggml_win_part’ at ~/ggml/src/ggml.c:7183:5:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ offset [0, 11] is out of the bounds [0, 0] [-Warray-bounds]
29 | return __builtin___memcpy_chk (__dest, __src, __len,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
30 | __glibc_objsize0 (__dest));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
[ 37%] Linking CXX static library libcommon.a
[ 37%] Built target common
[ 50%] Linking C static library libggml.a
[ 50%] Built target ggml
[ 62%] Building CXX object examples/CMakeFiles/common-ggml.dir/common-ggml.cpp.o
[ 75%] Linking CXX static library libcommon-ggml.a
[ 75%] Built target common-ggml
[ 87%] Building CXX object examples/replit/CMakeFiles/replit.dir/main.cpp.o
[100%] Linking CXX executable ../../bin/replit
[100%] Built target replit
[ 37%] Built target common
[ 50%] Built target ggml
[ 75%] Built target common-ggml
[ 87%] Building CXX object examples/replit/CMakeFiles/replit-quantize.dir/quantize.cpp.o
[100%] Linking CXX executable ../../bin/replit-quantize
[100%] Built target replit-quantize
@klosax
string_fortified.h:29:10: warning: ‘__builtin_memcpy’ offset [0, 11] is out of the bounds [0, 0] [-Warray-bounds]
The model file seems to be fine, since the tensor transformer.blocks.0.norm_1.weight is in it. The inference binary should recognize the tensor and load it. My guess is that something is wrong with your compiler, since you get warnings that could be related to the problem. The binary does string comparisons to recognize the tensor names.
Try updating or reinstalling the compiler.
This is my version. Should it be upgraded?
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.3.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04)
I think it should work with your compiler.
But you could try changing this line https://github.com/ggerganov/ggml/blob/244776a089ebed7f0332f9c8bdc38d2d40464493/examples/replit/main.cpp#L379 to
if (model.tensors.find(name) == model.tensors.end()) {
and compile again.
This worked! @klosax , thanks for your time and for the help. I was lost without you :pray:
Would this change be useful to others as well? Should I commit and raise a PR?
Great!
Then all references to name.data() should be changed to name, and in lines with fprintf or printf it should be changed to name.c_str().
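The two substitutions described above can be sketched as follows. This is a minimal illustration, not the code from examples/replit/main.cpp: TensorTable, has_tensor, and report_unknown are hypothetical names, and the table maps names to int where the real example maps them to ggml tensors.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Hypothetical stand-in for the example's tensor table (assumption: the real
// table is keyed by std::string and holds ggml tensor pointers).
using TensorTable = std::map<std::string, int>;

// Look the tensor up with the std::string itself rather than name.data().
bool has_tensor(const TensorTable & tensors, const std::string & name) {
    return tensors.find(name) != tensors.end();
}

// printf-style functions take a C string, so pass name.c_str() for "%s".
void report_unknown(const std::string & name) {
    std::fprintf(stderr, "unknown tensor '%s' in model file\n", name.c_str());
}
```

A loader would call has_tensor(tensors, name) for each name read from the model file and call report_unknown(name) when the lookup fails.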
Would this change be useful to others as well? Should I commit and raise a PR?
It looks like this error can also be found in other examples and all of them should be fixed.
Wouldn't that be breaking compilation of other models as well? Would you like me to try and reproduce for other classes of models before making a fix?
Because if what you say is true, then wouldn't this be a huge change? :thinking:
name is a std::string and should be accessed as such; the contents should not be accessed directly via data(), as is done here.
All examples compile and work fine for me using gcc 9, so my guess is that your gcc 11 handles this differently from the older compilers, and that is why it won't work for you.
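One general way in which a std::string and its data() pointer can differ is sketched below. Whether this is what happened in this issue is not established; the thread only attributes the failure to a compiler difference. The helper names from_data_pointer and same_key are hypothetical.

```cpp
#include <string>

// Rebuild a string the way an implicit conversion from name.data() would:
// the const char* constructor stops at the first NUL byte, whereas the
// std::string itself carries an explicit length.
std::string from_data_pointer(const std::string & name) {
    return std::string(name.data());
}

// True when a key built from name.data() equals the original std::string.
bool same_key(const std::string & name) {
    return from_data_pointer(name) == name;
}
```

For an ordinary name such as "transformer.wte.weight" the two keys are equal. For a string that contains a NUL byte, the key built from data() is shorter than the original, so a map lookup could behave differently depending on which form is used.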
Understood. So if I'm understanding correctly, even if name is accessed directly, since it is a std::string it won't break for older compilers like the one you use, correct?
Apologies for asking what might be basic questions. My C++ is rusty, so I don't want to be creating a regression and getting angry emails :sweat_smile:
Yes, the changes won't break anything for older compilers. I will make a PR to change all the examples.
Would you like me to try and reproduce for other classes of models before making a fix?
If you like you could test one other example to see if the same error is there and if it is fixed by this change.
This issue is along similar lines to https://github.com/ggerganov/ggml/issues/248, but concerns replit-v2 models, not replit-v1.
I am using ggml@a30107764ca5544e3a1ead4b318e06d83ed5b14c and am having trouble loading teknium/Replit-v2-CodeInstruct-3B.
I used examples/replit/convert-h5-to-ggml.py to convert to ggml f32. I also created both q4_1 and q8_0 quantized versions using replit-quantize.
However, when trying to load any of the f32, q4_1 or q8_0 versions of the model with replit (e.g., ./bin/replit -m Replit-v2-CodeInstruct-3B-f32.bin -p "def hello_world():") I get:
Any ideas?