-
**Describe the bug**
I trained a Llama-like model with NeMo using the model config below:
```
model:
  mcore_gpt: True
  micro_batch_size: 1
  global_batch_size: 512
  tensor_model_parallel_size…
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue y…
-
Hello there,
First, I'd like to express my appreciation for your excellent work on this project.
While experimenting with PPO/RW using this repository, I consistently encounter Out of Memory (OOM) e…
-
System config:
- CPU arch: x86_64
- GPU: H200
- TensorRT-LLM: v0.14.0
- OS: Ubuntu 22.04
- runtime-env: Docker container built from source via the official [build script](https://techcommunity.microsoft.c…
-
**Describe the bug**
When I build a Python demo named testmoe.py from the "get started" code example in the src directory, the terminal shows the following error:
"TypeError: MoiraiMoEM…
-
Although this model also uses the qwen2 architecture, it is not supported.
```
TypeError Traceback (most recent call last)
Cell In[41], line 12
2 prompts = [
3 "Hello, my name is",
4 "The pre…
-
I am working on a project that involves restructuring a network over different phases of training. Key aspects of this involve calls to custom Triton code, which is compiled and autotuned on the fly …
-
## Description
Installation from a repository with LFS content fails with a smudge error, e.g.
```bash
uv venv -p 3.11
uv pip install "git+https://github.com/makarovdi/uplot.git@main"
```
the …
-
> Please provide us with the following information:
> ---------------------------------------------------------------
### This issue is for a: (mark with an `x`)
```
- [X] bug report -> please…