-
### 🐛 Describe the bug
Hi,
We use `torch.compile` to run GPTJ3.6B model training on our GPU platforms, but we got some Dynamo errors and the process aborted. The error happens when runnin…
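The failing setup can be reduced to a minimal `torch.compile` training step. This is an illustrative sketch with a tiny stand-in model, not the GPTJ3.6B configuration from the report; the eager backend is used here so the sketch runs without a C++ toolchain, whereas the report presumably hits the default Inductor/Dynamo path.

```python
import torch
import torch.nn as nn

# Tiny stand-in model; the actual report uses GPTJ3.6B.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# Dynamo traces the module on the first call. backend="eager" skips
# Inductor codegen, keeping this sketch dependency-free.
compiled = torch.compile(model, backend="eager")

x = torch.randn(4, 8)
y = torch.randn(4, 1)

loss = nn.functional.mse_loss(compiled(x), y)
loss.backward()
opt.step()
```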
-
### Terraform Core Version
1.6.5
### AWS Provider Version
5.31.0
### Affected Resource(s)
SageMaker Endpoint config.
### Expected Behavior
When creating a JumpStart endpoint through the SageMak…
-
Very nice work! I am trying to replicate the results of LongLLMLingua on the Natural Questions dataset, but there may be some discrepancies between my results and those in the paper due to unclear valu…
-
## Formal verification
1. [Modular, Compositional, and Executable Formal Semantics for LLVM IR](https://dl.acm.org/doi/pdf/10.1145/3473572)
2. [Alive2: Bounded Translation Validation for LLVM](https…
-
Hello :)
Thank you for the excellent work and for sharing your code. I've learned a lot and have a few questions about the paper and settings:
- In Figures 2 and 3, what specifically do "prompt" …
-
Hi,
I'm running tests that combine chat history (ConversationBufferWindowMemory), local data retrieval, and an LLM (Baichuan2-13B-Chat) to get answers.
I have two tests. Note that all questions are rel…
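For context, the windowing behavior behind `ConversationBufferWindowMemory` is that only the last `k` exchanges are kept in the prompt context. The sketch below is a simplified, self-contained stand-in to illustrate that idea, not the LangChain implementation.

```python
from collections import deque

class WindowMemory:
    """Keeps only the last k question/answer exchanges (a simplified
    stand-in for LangChain's ConversationBufferWindowMemory)."""

    def __init__(self, k: int):
        # deque with maxlen drops the oldest exchange once k is exceeded
        self.buffer = deque(maxlen=k)

    def save(self, question: str, answer: str) -> None:
        self.buffer.append((question, answer))

    def context(self) -> str:
        # Render the retained window as prompt context
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.buffer)

memory = WindowMemory(k=2)
memory.save("Q1", "A1")
memory.save("Q2", "A2")
memory.save("Q3", "A3")  # Q1/A1 falls out of the window
```

With `k=2`, a question that was only answered in the dropped first exchange is no longer visible to the model, which is a common cause of history-dependent answers changing between tests.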
-
### Your current environment
```text
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC ve…
-
**Is your feature request related to a problem? Please describe.**
I would like to request RMM support for CUDA compressed memory, a feature available on the A100 and H100 for both DRAM and the L2 c…
-
## Description
I tried to use the following module directly, tools/pytorch-quantization/pytorch_quantization/calib/histogram.py, and call HistogramCalibrator.compute_amax() to calculate the max…
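For reference, the general idea behind a percentile-based amax from a histogram can be sketched as below. This is a hedged, simplified illustration in the spirit of `HistogramCalibrator.compute_amax(method="percentile")`; the actual pytorch-quantization implementation differs in detail (bin growth, strict/non-strict modes, etc.).

```python
import numpy as np

def percentile_amax(data, percentile=99.9, num_bins=2048):
    """Return the |activation| value below which `percentile` percent
    of the calibration samples fall, read off a histogram."""
    hist, edges = np.histogram(np.abs(data), bins=num_bins)
    cdf = np.cumsum(hist) / hist.sum()
    # first bin whose cumulative count reaches the target percentile
    idx = np.searchsorted(cdf, percentile / 100.0)
    # use the bin's upper edge as the clipping threshold (amax)
    return float(edges[min(idx + 1, num_bins)])

rng = np.random.default_rng(0)
acts = rng.normal(size=100_000)        # stand-in calibration activations
amax = percentile_amax(acts, percentile=99.9)
```

The returned amax then serves as the clipping range for quantization scale computation; choosing a percentile below 100 discards outliers at the cost of clipping error.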
-
I am interested in loading Long Llama with the Mojo framework, as mentioned here: https://github.com/tairov/llama2.mojo, to increase model speed while applying 4-bit quantization for model compression. C…
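As a point of reference, symmetric 4-bit weight quantization can be sketched as below. This is an illustrative Python sketch of the general round-to-int4 idea only; llama2.mojo's actual quantization scheme and any Mojo-side code are not reproduced here.

```python
def quantize_4bit(weights):
    """Symmetric per-tensor int4 quantization: map floats into [-8, 7]."""
    # scale so the largest magnitude maps to the int4 max (7)
    scale = max(abs(w) for w in weights) / 7.0 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate float weights from int4 codes."""
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 0.7]
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)
```

The reconstruction error per weight is bounded by half the scale step, which is why 4-bit schemes in practice quantize per-group or per-channel to keep the scale small.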