IndexError in run_encoding.py

amansahu278 commented 4 months ago

Hello, While working from the main branch in my fork, I first generate the KV Cache and then I run the encoding where in I get the error

Traceback (most recent call last): File "/home/am_sahu/./CacheGenExp/run_encoding.py", line 132, in encode_function( File "/home/am_sahu/./CacheGenExp/run_encoding.py", line 119, in encode_function encode_input[l:l+1, i].to(torch.int16) ) IndexError: index 6478 is out of bounds for dimension 1 with size 6478

Here is the relevant output (which i consider is relevant)

Loading extension module torchac_backend... You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 100%|██████████| 2/2 [00:51<00:00, 25.93s/it] Model and tokenizer loaded Length of input: torch.Size([1, 6479]) TTFT: 1.7883199411444366 KV Cache generated Using /home/am_sahu/.cache/torch_extensions/py310_cu121 as PyTorch extensions root... Emitting ninja build file /home/am_sahu/.cache/torch_extensions/py310_cu121/torchac_backend/build.ninja... Building extension module torchac_backend... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module torchac_backend... 75%|███████▌ | 3/4 [04:33<01:31, 91.03s/it] Done with layer 0 Done with layer 1 Done with layer 2 Done with layer 3 Done with layer 4 Done with layer 5 Done with layer 6 Done with layer 7 Done with layer 8 Done with layer 9 Done with layer 10 Done with layer 11 Done with layer 12 Done with layer 13 Done with layer 14 Done with layer 15 Done with layer 16 Done with layer 17 Done with layer 18 Done with layer 19 Done with layer 20 Done with layer 21 Done with layer 22 Done with layer 23 Done with layer 24 Done with layer 25 Done with layer 26 Done with layer 27 Done with layer 28 Done with layer 29 Done with layer 30 Done with layer 31 Done with layer 32 Done with layer 33 Done with layer 34 Done with layer 35 Done with layer 36 Done with layer 37 Done with layer 38 Done with layer 39 Done with layer 40 Done with layer 41 Done with layer 42 Done with layer 43 Done with layer 44 Done with layer 45 Done with layer 46 Done with layer 47 Done with layer 48 Done with layer 49 Done with layer 50 Done with layer 51 Done with layer 52 Done with layer 53 Done with layer 54 Done with layer 55 Done with layer 56 Done with layer 57 Done with layer 58 Done with layer 59 Done with layer 60 Done with layer 61 Done with layer 62 Done with layer 63 Done with layer 0 Done with layer 1 Done with layer 2 Done with layer 3 Done with layer 4 Done with layer 5 Done with layer 6 Done with layer 7 Done with layer 8 Done with layer 9 Done with layer 10 Done with layer 11 Done with layer 12 Done with layer 13 Done with layer 14 Done with layer 15 Done with layer 16 Done with layer 17 Done with layer 18 Done with layer 19 Done with layer 20 Done with layer 21 Done with layer 22 Done with layer 23 Done with layer 24 Done with layer 25 Done with layer 26 Done with layer 27 Done with layer 28 Done with layer 29 Done with layer 30 Done with layer 31 Done with layer 32 Done with layer 33 Done with layer 34 Done with layer 35 Done with layer 36 Done with layer 37 Done with layer 38 Done with layer 39 Done with layer 40 Done with layer 41 Done with layer 42 Done with layer 43 Done with layer 44 Done with layer 45 Done with layer 46 Done with layer 47 Done with layer 48 Done with layer 49 Done with layer 50 Done with layer 51 Done with layer 52 Done with layer 53 Done with layer 54 Done with layer 55 Done with layer 56 Done with layer 57 Done with layer 58 Done with layer 59 Done with layer 60 Done with layer 61 Done with layer 62 Done with layer 63 Done with layer 0 Done with layer 1 Done with layer 2 Done with layer 3 Done with layer 4 Done with layer 5 Done with layer 6 Done with layer 7 Done with layer 8 Done with layer 9 Done with layer 10 Done with layer 11 Done with layer 12 Done with layer 13 Done with layer 14 Done with layer 15 Done with layer 16 Done with layer 17 Done with layer 18 Done with layer 19 Done with layer 20 Done with layer 21 Done with layer 22 Done with layer 23 Done with layer 24 Done with layer 25 Done with layer 26 Done with layer 27 Done with layer 28 Done with layer 29 Done with layer 30 Done with layer 31 Done with layer 32 Done with layer 33 Done with layer 34 Done with layer 35 Done with layer 36 Done with layer 37 Done with layer 38 Done with layer 39 Done with layer 40 Done with layer 41 Done with layer 42 Done with layer 43 Done with layer 44 Done with layer 45 Done with layer 46 Done with layer 47 Done with layer 48 Done with layer 49 Done with layer 50 Done with layer 51 Done with layer 52 Done with layer 53 Done with layer 54 Done with layer 55 Done with layer 56 Done with layer 57 Done with layer 58 Done with layer 59 Done with layer 60 Done with layer 61 Done with layer 62 Done with layer 63 Done with layer 0 Traceback (most recent call last): File "/home/am_sahu/./CacheGenExp/run_encoding.py", line 132, in encode_function( File "/home/am_sahu/./CacheGenExp/run_encoding.py", line 119, in encode_function encode_input[l:l+1, i].to(torch.int16) ) IndexError: index 6478 is out of bounds for dimension 1 with size 6478

The following are the commands and configurations i used For generating the KV cache python $GENCACHE_PATH/main.py \ --save_dir $GENCACHE_DATA_PATH/kvcache \ --path_to_context $GENCACHE_PATH/7k_prompts/1.txt \ && echo "KV Cache generated"

For Encoding python $GENCACHE_PATH/run_encoding.py \ --output_path $GENCACHE_DATA_PATH/encoded \ --path_to_kv $GENCACHE_DATA_PATH/kvcache/test_kv_0.pkl \ --quantization_config $GENCACHE_PATH/config/quantization_7b.json \ && echo "Encoding done"

leomem commented 4 months ago

Same problem here.

Done with layer 60
Done with layer 61
Done with layer 62
Done with layer 63
Done with layer 0
Traceback (most recent call last):
  File "/data/alphax/CacheGen/run_encoding.py", line 132, in <module>
    encode_function(
  File "/data/alphax/CacheGen/run_encoding.py", line 119, in encode_function
    encode_input[l:l+1, i].to(torch.int16) )
IndexError: index 2524 is out of bounds for dimension 1 with size 2524

YuhanLiu11 commented 4 months ago

Hi @amansahu278 and @leomem, to run encoding, you need to make sure the value passed to "--chunk_size" * the value passed to "--num_chunks" is smaller than the total number of tokens in your context/KV cache.

For example, in Aman's case, please pass in "--num_chunks 3" since your context is 6478 (if i understand correctly).

Thanks!

leomem commented 4 months ago

Thanks @YuhanLiu11 . I changed chunk_size to 500 to make it work. However, I got into another problem when running run_decoding_disk.py. I can't find the expected model config file for Mistral-7B-v0.2, So I created one based on the huggingface model file. I think the channels should be 4096.

{
  "layers": 32,
  "channels":  4096
}

It got the error,

Traceback (most recent call last):
  File "/data/alphax/CacheGen/run_decoding_disk.py", line 114, in <module>
    decoded  = decode_function(inference_cdf,
  File "/data/alphax/CacheGen/src/decode_interface.py", line 88, in decode_function
    out = output.reshape((2, max_tensors_k.shape[0], CHUNK_SIZE, 1024))
RuntimeError: shape '[2, 32, 500, 1024]' is invalid for input of size 131072000

If "channels" in the model config file is changed to 1024 or if 1024 is replaced with 4096 in decode_function, I got the following errors

/data/alphax/CacheGen/src/decode_interface.py:78: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  start_indices = torch.tensor(start_indices).int().cuda()
kernel computation time:  0.04337750934064388
Traceback (most recent call last):
  File "/data/alphax/CacheGen/run_decoding_disk.py", line 112, in <module>
    decoded  = decode_function(inference_cdf,
  File "/data/alphax/CacheGen/src/decode_interface.py", line 90, in decode_function
    key = out[0].half()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

After adding "CUDA_LAUNCH_BLOCKING=1", the program coredumped

(cachegenenv) CacheGen$ CUDA_LAUNCH_BLOCKING=1 python run_decoding_disk.py --model_config config/mistral_7b.json --path_to_encoded_kv encoded --num_chunks 4 --quantization_config config/quantization_7b.json --model_id mistral-community/Mistral-7B-v0.2 --path_to_context 3k_prompts/0.txt  --chunk_size 500
Loading time:  0.010831324383616447
start
/data/alphax/miniconda3/envs/cachegenenv/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
/data/alphax/miniconda3/envs/cachegenenv/lib/python3.10/site-packages/transformers/modeling_utils.py:4481: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
  warnings.warn(
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 52483.47it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:37<00:00, 12.56s/it]
/data/alphax/CacheGen/src/decode_interface.py:78: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  start_indices = torch.tensor(start_indices).int().cuda()
Segmentation fault (core dumped)

amansahu278 commented 4 months ago

I also ran into the first issue, when the model configuration was "layers": 32, and "channels":4096 (choosing so from previous committed versions). I also get the same error of shape mismatch.

UChi-JCL / CacheGen

IndexError in run_encoding.py #2