amansahu278 opened 4 months ago
Same problem here.
Done with layer 60
Done with layer 61
Done with layer 62
Done with layer 63
Done with layer 0
Traceback (most recent call last):
File "/data/alphax/CacheGen/run_encoding.py", line 132, in <module>
encode_function(
File "/data/alphax/CacheGen/run_encoding.py", line 119, in encode_function
encode_input[l:l+1, i].to(torch.int16) )
IndexError: index 2524 is out of bounds for dimension 1 with size 2524
Hi @amansahu278 and @leomem, to run encoding, you need to make sure that the value passed to "--chunk_size" multiplied by the value passed to "--num_chunks" is smaller than the total number of tokens in your context/KV cache.
For example, in Aman's case, please pass in "--num_chunks 3", since your context is 6478 tokens (if I understand correctly).
Thanks!
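In other words, the largest valid "--num_chunks" is just the floor of the token count divided by the chunk size. A minimal illustration (plain Python, not CacheGen code; the 2000-token chunk size below is an assumption):

def max_num_chunks(num_tokens: int, chunk_size: int) -> int:
    # The encoder indexes chunk_size * num_chunks token positions,
    # so their product must not exceed the number of cached tokens.
    return num_tokens // chunk_size

print(max_num_chunks(6478, 2000))  # -> 3 for a 6478-token context with chunk_size 2000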
Thanks @YuhanLiu11. I changed chunk_size to 500 to make it work. However, I ran into another problem when running run_decoding_disk.py. I couldn't find the expected model config file for Mistral-7B-v0.2, so I created one based on the Hugging Face model file. I think the channels should be 4096:
{
"layers": 32,
"channels": 4096
}
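For reference, the two fields can be read off the Hugging Face config programmatically (an illustrative sketch, not a CacheGen utility; note that Mistral-7B-v0.2 uses grouped-query attention, so its per-token KV-cache width is num_key_value_heads * head_dim = 1024 rather than hidden_size = 4096):

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("mistral-community/Mistral-7B-v0.2")
head_dim = cfg.hidden_size // cfg.num_attention_heads  # 4096 // 32 = 128
print(cfg.num_hidden_layers)                # 32  -> "layers"
print(cfg.hidden_size)                      # 4096 (model hidden size)
print(cfg.num_key_value_heads * head_dim)   # 8 * 128 = 1024 (KV-cache width)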
It fails with the following error:
Traceback (most recent call last):
File "/data/alphax/CacheGen/run_decoding_disk.py", line 114, in <module>
decoded = decode_function(inference_cdf,
File "/data/alphax/CacheGen/src/decode_interface.py", line 88, in decode_function
out = output.reshape((2, max_tensors_k.shape[0], CHUNK_SIZE, 1024))
RuntimeError: shape '[2, 32, 500, 1024]' is invalid for input of size 131072000
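A quick arithmetic check on that size (plain Python, my own reading, assuming the same "--num_chunks 4" used in the command further below): 131072000 factors both as one chunk of 4096 channels and as four chunks of 1024 KV channels, so the hard-coded 1024 in decode_function and the 4096 in my config disagree about which width is meant.

numel = 131072000
assert numel == 2 * 32 * 500 * 4096      # 1 chunk x 4096 "channels"
assert numel == 4 * 2 * 32 * 500 * 1024  # 4 chunks x 1024 KV channels (8 KV heads * 128 head dim)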
If "channels" in the model config file is changed to 1024 or if 1024 is replaced with 4096 in decode_function, I got the following errors
/data/alphax/CacheGen/src/decode_interface.py:78: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
start_indices = torch.tensor(start_indices).int().cuda()
kernel computation time: 0.04337750934064388
Traceback (most recent call last):
File "/data/alphax/CacheGen/run_decoding_disk.py", line 112, in <module>
decoded = decode_function(inference_cdf,
File "/data/alphax/CacheGen/src/decode_interface.py", line 90, in decode_function
key = out[0].half()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
After adding CUDA_LAUNCH_BLOCKING=1, the program core dumped:
(cachegenenv) CacheGen$ CUDA_LAUNCH_BLOCKING=1 python run_decoding_disk.py --model_config config/mistral_7b.json --path_to_encoded_kv encoded --num_chunks 4 --quantization_config config/quantization_7b.json --model_id mistral-community/Mistral-7B-v0.2 --path_to_context 3k_prompts/0.txt --chunk_size 500
Loading time: 0.010831324383616447
start
/data/alphax/miniconda3/envs/cachegenenv/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
/data/alphax/miniconda3/envs/cachegenenv/lib/python3.10/site-packages/transformers/modeling_utils.py:4481: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 52483.47it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:37<00:00, 12.56s/it]
/data/alphax/CacheGen/src/decode_interface.py:78: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
start_indices = torch.tensor(start_indices).int().cuda()
Segmentation fault (core dumped)
I also ran into the first issue when the model configuration was "layers": 32 and "channels": 4096 (taken from previously committed versions); I get the same shape-mismatch error.
Hello, while working from the main branch in my fork, I first generate the KV cache and then run the encoding, at which point I get the error.
Here is the output I consider relevant:
The following are the commands and configurations I used. For generating the KV cache:
python $GENCACHE_PATH/main.py \
    --save_dir $GENCACHE_DATA_PATH/kvcache \
    --path_to_context $GENCACHE_PATH/7k_prompts/1.txt \
    && echo "KV Cache generated"
For encoding:
python $GENCACHE_PATH/run_encoding.py \
    --output_path $GENCACHE_DATA_PATH/encoded \
    --path_to_kv $GENCACHE_DATA_PATH/kvcache/test_kv_0.pkl \
    --quantization_config $GENCACHE_PATH/config/quantization_7b.json \
    && echo "Encoding done"
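A note on my own setup, in case it matters: the encoding command above passes no "--chunk_size"/"--num_chunks", so, per @YuhanLiu11's earlier comment, it may be worth setting them explicitly so that their product stays at or below the prompt's token count. A hedged example, assuming the 7k-token prompt tokenizes to at least 1500 tokens:

python $GENCACHE_PATH/run_encoding.py \
    --output_path $GENCACHE_DATA_PATH/encoded \
    --path_to_kv $GENCACHE_DATA_PATH/kvcache/test_kv_0.pkl \
    --quantization_config $GENCACHE_PATH/config/quantization_7b.json \
    --chunk_size 500 \
    --num_chunks 3 \
    && echo "Encoding done"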