-
**Describe the bug**
I trained a Llama-like model with NeMo using the model config below:
```
model:
  mcore_gpt: True
  micro_batch_size: 1
  global_batch_size: 512
  tensor_model_parallel_size…
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue y…
-
Hello there,
First, I'd like to express my appreciation for your excellent work on this project.
While experimenting with PPO/RW using this repository, I consistently encounter Out of Memory (OOM) e…
-
System config:
- CPU arch: x86_64
- GPU: H200
- TensorRT-LLM: v0.14.0
- OS: ubuntu-22.04
- runtime-env: Docker container built from source via the official [build script](https://techcommunity.microsoft.c…
-
**Describe the bug**
When I create a Python demo named testmoe.py in the src directory using the "get started" example code, the terminal reports the following error:
"TypeError: MoiraiMoEM…
-
I am working on a project that involves restructuring a network over different phases of training. Key aspects of this involve calls to custom Triton code, which is compiled and autotuned on the fly …
-
Although this also uses the qwen2 architecture, it is not supported.
```
TypeError Traceback (most recent call last)
Cell In[41], line 12
2 prompts = [
3 "Hello, my name is",
4 "The pre…
-
### 🐛 Describe the bug
There is no fake implementation or meta kernel for the Communication Operator. If I want to contribute to this feature, what can I do? Are there any examples that I can refer…
-
Hi! I was studying Geometric GNNs and trying to use your network to predict one particular property. My dataset was in **cif** format. However, when I added `--file_format cif` to make the network predict …