-
### 🐛 Describe the bug
Hi,
We use `torch.compile` to run GPTJ3.6B model training on our GPU platforms, but we got some Dynamo errors and the process aborted. The error happens when runnin…
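The failing setup can be reduced to a minimal `torch.compile` training step. This is an illustrative sketch with a tiny stand-in model, not the GPTJ3.6B configuration from the report; the eager backend is used here so the sketch runs without a C++ toolchain, whereas the report presumably hits the default Inductor/Dynamo path.

```python
import torch
import torch.nn as nn

# Tiny stand-in model; the actual report uses GPTJ3.6B.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# Dynamo traces the module on the first call. backend="eager" skips
# Inductor codegen, keeping this sketch dependency-free.
compiled = torch.compile(model, backend="eager")

x = torch.randn(4, 8)
y = torch.randn(4, 1)

loss = nn.functional.mse_loss(compiled(x), y)
loss.backward()
opt.step()
```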
-
### Terraform Core Version
1.6.5
### AWS Provider Version
5.31.0
### Affected Resource(s)
SageMaker Endpoint config.
### Expected Behavior
When creating a JumpStart endpoint through the SageMak…
-
Very nice work! I am trying to replicate the results of LongLLMLingua on the Natural Questions dataset, but there may be some discrepancies between my results and those in the paper due to unclear valu…
-
## Formal verification
1. [Modular, Compositional, and Executable Formal Semantics for LLVM IR](https://dl.acm.org/doi/pdf/10.1145/3473572)
2. [Alive2: Bounded Translation Validation for LLVM](https…
-
Hello :)
Thank you for the excellent work and for sharing your code. I've learned a lot and have a few questions about the paper and settings:
- In Figures 2 and 3, what specifically do "prompt" …
-
Hi,
I'm running tests that combine chat history (ConversationBufferWindowMemory), local data retrieval, and an LLM (Baichuan2-13B-Chat) to get answers.
I have two tests. Note that all questions are rel…
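For context, the windowing behavior behind `ConversationBufferWindowMemory` is that only the last `k` exchanges are kept in the prompt context. The sketch below is a simplified, self-contained stand-in to illustrate that idea, not the LangChain implementation.

```python
from collections import deque

class WindowMemory:
    """Keeps only the last k question/answer exchanges (a simplified
    stand-in for LangChain's ConversationBufferWindowMemory)."""

    def __init__(self, k: int):
        # deque with maxlen drops the oldest exchange once k is exceeded
        self.buffer = deque(maxlen=k)

    def save(self, question: str, answer: str) -> None:
        self.buffer.append((question, answer))

    def context(self) -> str:
        # Render the retained window as prompt context
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.buffer)

memory = WindowMemory(k=2)
memory.save("Q1", "A1")
memory.save("Q2", "A2")
memory.save("Q3", "A3")  # Q1/A1 falls out of the window
```

With `k=2`, a question that was only answered in the dropped first exchange is no longer visible to the model, which is a common cause of history-dependent answers changing between tests.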
-
### Your current environment
```text
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC ve…
-
**Is your feature request related to a problem? Please describe.**
I would like to request RMM support for CUDA compressed memory, a feature available on the A100 and H100 for both DRAM and the L2 c…
-
## Description
I tried to use the following module directly, tools/pytorch-quantization/pytorch_quantization/calib/histogram.py, and call HistogramCalibrator.compute_amax() to calculate the max…
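For reference, the general idea behind a percentile-based amax from a histogram can be sketched as below. This is a hedged, simplified illustration in the spirit of `HistogramCalibrator.compute_amax(method="percentile")`; the actual pytorch-quantization implementation differs in detail (bin growth, strict/non-strict modes, etc.).

```python
import numpy as np

def percentile_amax(data, percentile=99.9, num_bins=2048):
    """Return the |activation| value below which `percentile` percent
    of the calibration samples fall, read off a histogram."""
    hist, edges = np.histogram(np.abs(data), bins=num_bins)
    cdf = np.cumsum(hist) / hist.sum()
    # first bin whose cumulative count reaches the target percentile
    idx = np.searchsorted(cdf, percentile / 100.0)
    # use the bin's upper edge as the clipping threshold (amax)
    return float(edges[min(idx + 1, num_bins)])

rng = np.random.default_rng(0)
acts = rng.normal(size=100_000)        # stand-in calibration activations
amax = percentile_amax(acts, percentile=99.9)
```

The returned amax then serves as the clipping range for quantization scale computation; choosing a percentile below 100 discards outliers at the cost of clipping error.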
-
I am interested in loading Long Llama with the Mojo framework, as mentioned here: https://github.com/tairov/llama2.mojo, to increase model speed while applying 4-bit quantization for model compression. C…
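As a point of reference, symmetric 4-bit weight quantization can be sketched as below. This is an illustrative Python sketch of the general round-to-int4 idea only; llama2.mojo's actual quantization scheme and any Mojo-side code are not reproduced here.

```python
def quantize_4bit(weights):
    """Symmetric per-tensor int4 quantization: map floats into [-8, 7]."""
    # scale so the largest magnitude maps to the int4 max (7)
    scale = max(abs(w) for w in weights) / 7.0 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate float weights from int4 codes."""
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 0.7]
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)
```

The reconstruction error per weight is bounded by half the scale step, which is why 4-bit schemes in practice quantize per-group or per-channel to keep the scale small.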