Cannot load teknium/Replit-v2-CodeInstruct-3B

shrikrishnaholla commented 1 year ago

This issue is similar to https://github.com/ggerganov/ggml/issues/248, but concerns replit-v2 models rather than replit-v1.

I am using ggml at commit a30107764ca5544e3a1ead4b318e06d83ed5b14c and am having trouble loading teknium/Replit-v2-CodeInstruct-3B.

I used examples/replit/convert-h5-to-ggml.py to convert the model to ggml f32, and I also created q4_1 and q8_0 quantized versions with replit-quantize.

However, when I try to load any of the f32, q4_1, or q8_0 versions of the model with replit (e.g., ./bin/replit -m Replit-v2-CodeInstruct-3B-f32.bin -p "def hello_world():"), I get:

replit_model_load: unknown tensor 'transformer.blocks.0.norm_1.weight' in model file

Any ideas?

klosax commented 1 year ago

The model loads fine for me. The named tensor should be recognized and loaded. Did you get any compilation warnings?

shrikrishnaholla commented 1 year ago

@klosax One possible factor: I got this error when I ran python examples/replit/convert-h5-to-ggml.py ../teknium-Replit-v2-CodeInstruct-3B/Replit-v2-CodeInstruct-3B/ 0

Traceback (most recent call last):
  File "~/ggml/examples/replit/convert-h5-to-ggml.py", line 7, in <module>
    import sentencepiece.sentencepiece_model_pb2 as model
  File "~/.local/lib/python3.10/site-packages/sentencepiece/sentencepiece_model_pb2.py", line 34, in <module>
    _descriptor.EnumValueDescriptor(
  File "/opt/miniconda3/conda/envs/textgen/lib/python3.10/site-packages/google/protobuf/descriptor.py", line 796, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

So I reran the command like this:

PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python python examples/replit/convert-h5-to-ggml.py ../teknium-Replit-v2-CodeInstruct-3B/Replit-v2-CodeInstruct-3B/ 0

and the conversion completed successfully.

Could this have anything to do with it?

klosax commented 1 year ago

Could this have anything to do with it?

I'd guess not, if the model file was converted successfully.

Were there any warnings when compiling the inference binary?

shrikrishnaholla commented 1 year ago

Nothing stood out to me in particular...

cmake .. && make -j4 replit replit-quantize
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Linux detected
-- x86 detected
-- Linux detected
-- Configuring done (0.1s)
-- Generating done (0.3s)
-- Build files have been written to: ~/ggml/build
[ 25%] Building CXX object examples/CMakeFiles/common.dir/common.cpp.o
[ 25%] Building C object src/CMakeFiles/ggml.dir/ggml.c.o
In file included from /usr/include/string.h:535,
                 from ~/ggml/src/ggml.c:21:
In function ‘memcpy’,
    inlined from ‘ggml_set_op_params’ at ~/ggml/src/ggml.c:4642:5,
    inlined from ‘ggml_conv_1d’ at ~/ggml/src/ggml.c:6883:5:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ offset [0, 11] is out of the bounds [0, 0] [-Warray-bounds]
   29 |   return __builtin___memcpy_chk (__dest, __src, __len,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   30 |                                  __glibc_objsize0 (__dest));
      |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘memcpy’,
    inlined from ‘ggml_set_op_params’ at ~/ggml/src/ggml.c:4642:5,
    inlined from ‘ggml_conv_2d’ at ~/ggml/src/ggml.c:6923:5:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ offset [0, 23] is out of the bounds [0, 0] [-Warray-bounds]
   29 |   return __builtin___memcpy_chk (__dest, __src, __len,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   30 |                                  __glibc_objsize0 (__dest));
      |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘memcpy’,
    inlined from ‘ggml_set_op_params’ at ~/ggml/src/ggml.c:4642:5,
    inlined from ‘ggml_conv_1d’ at ~/ggml/src/ggml.c:6883:5,
    inlined from ‘ggml_conv_1d_ph’ at ~/ggml/src/ggml.c:6942:12:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ offset [0, 11] is out of the bounds [0, 0] [-Warray-bounds]
   29 |   return __builtin___memcpy_chk (__dest, __src, __len,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   30 |                                  __glibc_objsize0 (__dest));
      |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘memcpy’,
    inlined from ‘ggml_set_op_params’ at ~/ggml/src/ggml.c:4642:5,
    inlined from ‘ggml_pool_2d’ at ~/ggml/src/ggml.c:7015:5:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ offset [0, 27] is out of the bounds [0, 0] [-Warray-bounds]
   29 |   return __builtin___memcpy_chk (__dest, __src, __len,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   30 |                                  __glibc_objsize0 (__dest));
      |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘memcpy’,
    inlined from ‘ggml_set_op_params’ at ~/ggml/src/ggml.c:4642:5,
    inlined from ‘ggml_win_part’ at ~/ggml/src/ggml.c:7183:5:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ offset [0, 11] is out of the bounds [0, 0] [-Warray-bounds]
   29 |   return __builtin___memcpy_chk (__dest, __src, __len,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   30 |                                  __glibc_objsize0 (__dest));
      |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~
[ 37%] Linking CXX static library libcommon.a
[ 37%] Built target common
[ 50%] Linking C static library libggml.a
[ 50%] Built target ggml
[ 62%] Building CXX object examples/CMakeFiles/common-ggml.dir/common-ggml.cpp.o
[ 75%] Linking CXX static library libcommon-ggml.a
[ 75%] Built target common-ggml
[ 87%] Building CXX object examples/replit/CMakeFiles/replit.dir/main.cpp.o
[100%] Linking CXX executable ../../bin/replit
[100%] Built target replit
[ 37%] Built target common
[ 50%] Built target ggml
[ 75%] Built target common-ggml
[ 87%] Building CXX object examples/replit/CMakeFiles/replit-quantize.dir/quantize.cpp.o
[100%] Linking CXX executable ../../bin/replit-quantize
[100%] Built target replit-quantize

@klosax

klosax commented 1 year ago

string_fortified.h:29:10: warning: ‘__builtin_memcpy’ offset [0, 11] is out of the bounds [0, 0] [-Warray-bounds]

The model file seems to be fine, since the tensor transformer.blocks.0.norm_1.weight is in it, so the inference binary should recognize the tensor and load it. My guess is that something is wrong with your compiler, since you get warnings that could be related to the problem. The binary uses string comparisons to recognize the tensor names.

Try updating or reinstalling the compiler.

shrikrishnaholla commented 1 year ago

This is my gcc version. Should it be upgraded?

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.3.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04)

klosax commented 1 year ago

I think it should work with your compiler.

But you could try to change this line https://github.com/ggerganov/ggml/blob/244776a089ebed7f0332f9c8bdc38d2d40464493/examples/replit/main.cpp#L379 to

if (model.tensors.find(name) == model.tensors.end()) {

and compile again.

shrikrishnaholla commented 1 year ago

This worked! @klosax, thanks for your time and help. I was lost without you :pray:

Would this change be useful to others as well? Should I commit and raise a PR?

klosax commented 1 year ago

Great!

Then all references to name.data() should be changed to name, except in lines with fprintf or printf, where it should be changed to name.c_str().

Would this change be useful to others as well? Should I commit and raise a PR?

It looks like the same pattern appears in other examples too, and all of them should be fixed.

shrikrishnaholla commented 1 year ago

Wouldn't that break compilation of other models as well? Would you like me to try to reproduce this for other classes of models before making a fix?

If what you say is true, wouldn't this be a huge change? :thinking:

klosax commented 1 year ago

name is a std::string and should be used as one; its contents should not be accessed directly through data() as is done here.
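One concrete way the two forms can differ: when name.data() is passed where a std::string is expected, the const char* is converted back into a std::string by reading up to the first '\0', whereas passing name keeps its full stored length. The sketch below (helper names are hypothetical) shows that the two keys only disagree when the name contains an embedded NUL byte. It illustrates the general C++ behavior and is not a confirmed diagnosis of what happened with this model file.

```cpp
#include <string>

// Returns the key a lookup via name.data() would actually search for:
// constructing a std::string from a const char* stops at the first '\0'.
std::string key_from_data(const std::string &name) {
    return std::string(name.data());
}

// True when looking up name.data() searches for the same key as name itself.
bool data_roundtrips(const std::string &name) {
    return key_from_data(name) == name;
}
```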

All examples compile and work fine for me with gcc 9, so my guess is that your gcc 11 handles this differently than older compilers, and that is why it won't work for you.

shrikrishnaholla commented 1 year ago

Understood. So if I understand correctly, accessing name directly as a std::string won't break anything for older compilers like the one you use, correct?

Apologies for asking what might be basic questions. My C++ is rusty, and I don't want to create a regression and get angry emails :sweat_smile:

klosax commented 1 year ago

Yes, the changes won't break anything for older compilers. I will make a PR that changes all examples.

klosax commented 1 year ago

Would you like me to try and reproduce for other classes of models before making a fix?

If you like, you could test one other example to see whether the same error occurs there and whether this change fixes it.

ggerganov / ggml

Cannot load teknium/Replit-v2-CodeInstruct-3B #436