Alenush / rugpt3simplification_rsse

Missing keys in state_dict #2

Open kuraga opened 3 years ago

kuraga commented 3 years ago

Hello, @Alenush!

According to the error, some weights are missing from the .pth checkpoint (Missing keys in state_dict: ...). Could the model have changed since the checkpoint was saved?

Thanks!
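
For anyone hitting the same message, one way to narrow it down is to compare the parameter names the model expects with the names stored in the checkpoint. The snippet below is only a minimal sketch of that comparison: the helper `diff_state_dict_keys` and the key names are hypothetical and are not taken from this repository.

```python
from typing import Iterable, List, Tuple


def diff_state_dict_keys(
    model_keys: Iterable[str], checkpoint_keys: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) parameter names, each sorted."""
    model_set = set(model_keys)
    checkpoint_set = set(checkpoint_keys)
    # Names the model expects but the checkpoint does not contain.
    missing = sorted(model_set - checkpoint_set)
    # Names the checkpoint contains but the model does not define.
    unexpected = sorted(checkpoint_set - model_set)
    return missing, unexpected


# Hypothetical key names, for illustration only.
model_keys = ["wte.weight", "h.0.attn.bias", "h.0.attn.c_attn.weight"]
checkpoint_keys = ["wte.weight", "h.0.attn.c_attn.weight", "extra.weight"]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

With PyTorch, the two arguments would typically be `model.state_dict().keys()` and the keys of the dict returned by `torch.load(path, map_location="cpu")`. If the checkpoint nests the weights under a wrapper key, that dict would need to be unwrapped first.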

kuraga commented 3 years ago

Well, applying the fix from https://github.com/sberbank-ai/ru-gpts/issues/42#issuecomment-839830391 let me load the model.

But when I run the model, I get the error from https://github.com/sberbank-ai/ru-gpts/issues/42#issue-813829451 itself:

<...>

/usr/local/lib/python3.8/dist-packages/deepspeed/ops/sparse_attention/matmul.py in _sdd_matmul(a, b, trans_a, trans_b, trans_c, spdims, block, luts, num_locks, widths, packs, bench, time)
    202             total = 0 if bench else None
    203             for off_width in range(0, width, max_width):
--> 204                 current = kernel(
    205                     a.data_ptr(),
    206                     b.data_ptr(),

/usr/local/lib/python3.8/dist-packages/triton/kernel.py in __call__(self, grid, *args)
    114         # pack parameters into a byte buffer
    115         params = struct.pack(self.tys, *args)
--> 116         kernel = self.fn.autotune(params, grid, self.stream)
    117         # run kernel
    118         grid = grid(kernel.opt)

RuntimeError: CUDA: Error- invalid ptx

Alenush commented 2 years ago

Hi!

My only guess is that something went wrong while the checkpoint was being uploaded. I have uploaded the checkpoint again to another location. Please try it from here: https://sc.link/Dk6Y

kuraga commented 2 years ago

I'm now facing https://github.com/sberbank-ai/ru-gpts/issues/68. After applying the fix from https://github.com/sberbank-ai/ru-gpts/issues/68#issuecomment-894067337, that error disappears, but I then get an out-of-memory error.

Thanks!