virtual-batch-size Search Results

NVIDIA/bionemo-framework #286

Support virtual pipeline parallel in global batch size calcu…

jstjohn updated 1 month ago

NVIDIA/NeMo #11360

Drastic difference between .nemo and HF checkpoint

**Describe the bug** I have trained a llama-like model with nemo using the below model config: ``` model: mcore_gpt: True micro_batch_size: 1 global_batch_size: 512 tensor_model_parallel_size…

rahul-sarvam updated 10 hours ago

InternLM/lmdeploy #2691

Deployment via api_server occasionally hangs

### Checklist - [X] 1. I have searched related issues but cannot get the expected help. - [ ] 2. The bug has not been fixed in the latest version. - [ ] 3. Please note that if the bug-related issue y…

LiYtao updated 1 day ago

openpsi-project/ReaLHF #80

Suggestion for Fine-Grained Batch Control e.g `per_device_tr…

Hello there, First, I'd like to express my appreciation for your excellent work on this project. While experimenting with PPO/RW using this repository, I consistently encounter Out of Memory (OOM) e…

dechunwang updated 1 month ago

NVIDIA/TensorRT-LLM #2422

attempt to run benchmark with batch_size>=512 and input_outp…

System config: - CPU arch x86_64 - GPU: H200 - Tensorrt-LLM:v0.14.0 - OS: ubuntu-22.04 - runtime-env: docker container build from sources via official [build script](https://techcommunity.microsoft.c…

dmonakhov updated 1 week ago

SalesforceAIResearch/uni2ts #144

TypeError: MoiraiMoEModule.init() missing 8 required pos…

**Describe the bug** When I build a python demo name testmoe.py with the "get started codes example " in the src directory, the terminal gives the following error like this: "TypeError: MoiraiMoEM…

Liuwenjing985 updated 5 hours ago

triton-lang/triton #3864

Segmentation fault when DataLoader processes are launched af…

I am working on a project that involves restructuring a network over different phases of training. Key aspects of this involves calls to custom Triton code, which is compiled and autotuned on the fly …

leademeule updated 4 weeks ago

WuNein/vllm4mteb #5

This method has a bug: it cannot support the stella_en_v1.5B_v5 model

Although it also uses the qwen2 architecture, it is not supported ``` TypeError Traceback (most recent call last) Cell In[41], line 12 2 prompts = [ 3 "Hello, my name is", 4 "The pre…

cavities updated 2 weeks ago

pytorch/pytorch #140712

No fake impl or Meta kernel for Communication Operator

### 🐛 Describe the bug There is no fake implementation or meta kernel for the Communication Operator. If I want to contribute to this feature, what can I do? Are there any examples that I can refer…

jo-pillar updated 6 hours ago

usnistgov/alignn #169

'cif2cell' is not recognized as an internal or external comm…

Hi! I was studying Geometric GNN and trying to use your network to predict one particular property. My dataset was in **cif** form. However, when I added `--file_format cif` to make the network predict …

Dalabomba updated 1 week ago

1000+ results for virtual-batch-size
