-
### Search before asking
- [X] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.
### Paimon version
0.9.0
### Compute Engine
Flink
### Minimal repro…
-
Hello:
I noticed that physical pages always end up in Channel 0 and Channel 1 no matter how many channels I actually configure, because `CH_BITS` is set to 1. I think this would hamper paralle…
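To make the concern concrete, here is a minimal sketch of how a fixed channel-bit width caps the number of channels in use. It assumes the channel index is taken from the low `CH_BITS` bits of the physical page number. The helper names `channel_of` and `channels_used` and that bit layout are illustrative assumptions, not taken from the project's code.

```python
CH_BITS = 1  # value reported in this issue


def channel_of(ppn, ch_bits=CH_BITS):
    """Return the channel index for a physical page number.

    Hypothetical layout: the channel is encoded in the low `ch_bits`
    bits of the page number.
    """
    return ppn & ((1 << ch_bits) - 1)


def channels_used(num_pages, ch_bits=CH_BITS):
    """Collect every channel index that the first `num_pages` pages map to."""
    return {channel_of(ppn, ch_bits) for ppn in range(num_pages)}


if __name__ == "__main__":
    # With one channel bit, only two channels are ever addressed.
    print(sorted(channels_used(64)))
    # With three channel bits, eight channels become reachable.
    print(sorted(channels_used(64, ch_bits=3)))
```

Under this assumed layout, the number of reachable channels is `2 ** CH_BITS` regardless of how many channels are configured elsewhere.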
-
Just wondering: does the current PipelineStage API support variable-length input shapes, as in Megatron? https://github.com/NVIDIA/Megatron-LM/blob/e33c8f78a35765d5aa37475a144da60e8a2349d1/megatron/core…
-
When configuring pipeline splitting by specifying exact layers in the config (`--experimental.pipeline_parallel_split_points`), we are unable to assign sub-layers (e.g., `layer.4.attn.qvw`). If we att…
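For illustration only, the following sketch shows split points applied at whole-layer granularity, which is the behavior described above. It is not the project's implementation: `split_into_stages`, the `layers.N` names, and the error message are all hypothetical.

```python
def split_into_stages(layer_names, split_points):
    """Partition an ordered list of layer names into pipeline stages.

    A new stage starts at each name listed in `split_points`.
    Hypothetical helper: only names present in `layer_names`
    (i.e. whole layers) are accepted as split points.
    """
    unknown = [p for p in split_points if p not in layer_names]
    if unknown:
        raise ValueError(f"split points not found among layers: {unknown}")

    stages, current = [], []
    for name in layer_names:
        # Close the running stage when we reach a split point.
        if name in split_points and current:
            stages.append(current)
            current = []
        current.append(name)
    if current:
        stages.append(current)
    return stages


if __name__ == "__main__":
    layers = [f"layers.{i}" for i in range(6)]
    print(split_into_stages(layers, ["layers.2", "layers.4"]))
```

In this sketch a sub-layer name such as `layers.4.attn` is rejected because it is not in the list of whole layers. Supporting it would require the splitter to walk nested module names instead of a flat list.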
-
### 🚀 The feature, motivation and pitch
As we can see, Google Gemini can support up to a million tokens. To serve longer context lengths, we have to do context parallelism, which means splitting the i…
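Context parallelism is commonly described as sharding the sequence dimension across devices. A minimal sketch of that sharding step, using plain Python lists in place of tensors, might look like the following. The function name `split_sequence` and the contiguous-chunk layout are assumptions for illustration, and real implementations may balance the shards differently.

```python
def split_sequence(tokens, world_size):
    """Split `tokens` into `world_size` contiguous chunks, one per rank.

    The last chunks may be shorter (or empty) when the length is not
    divisible by `world_size`.
    """
    chunk = -(-len(tokens) // world_size)  # ceiling division
    return [tokens[r * chunk:(r + 1) * chunk] for r in range(world_size)]


if __name__ == "__main__":
    shards = split_sequence(list(range(10)), world_size=4)
    for rank, shard in enumerate(shards):
        print(rank, shard)
```

Each rank then holds only its slice of the sequence, so attention needs some exchange of keys and values between ranks. That communication is not shown here.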
-
We observed good overlap with FSDP + PGLE:
![Bq7PCuqyJbygSuL](https://github.com/user-attachments/assets/0cff27c4-6499-43d0-b436-ef01a2833ae0)
Turning PGLE on and off makes a big difference here.
…
-
## Build Information
Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=869444
Build error leg or test failing: Microsoft.DotNet.Cli.New.Integration…
-
### Background
In distributed training scenarios, RNG initialization matters for ensuring correct model initialization and, in some cases, for controlling random ops during training (e.g. dropout)…
-
Recently we landed https://github.com/pytorch/ao/pull/939 to support tensor parallelism for int8 weight-only quantization. Another example: https://github.com/pytorch/ao/pull/785
Now we can support…
-
Currently it seems that neither Megatron SP nor DeepSpeed SP is correctly implemented in Megatron-DeepSpeed. Maybe this worked once, but since new features have been added there are conflicts be…