-
Table for locks; LWT to acquire a lock; heartbeat to keep the lock alive; the table is taskdb; option to create a new taskdb once the end of the list is reached, or just keep looping (see the sketch below)
Configurable consistency level on write, read, and a…
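A minimal sketch of the acquire/heartbeat flow with the DataStax Python driver, assuming a `taskdb.locks` table keyed by task id; all names and the 30-second TTL are illustrative:

```python
# Assumed schema:
#   CREATE TABLE taskdb.locks (task_id text PRIMARY KEY, owner text);
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()

def acquire_lock(task_id: str, owner: str, ttl: int = 30) -> bool:
    # LWT insert: succeeds only if no live (non-expired) row holds the lock.
    rs = session.execute(
        f"INSERT INTO taskdb.locks (task_id, owner) VALUES (%s, %s) "
        f"IF NOT EXISTS USING TTL {ttl}",
        (task_id, owner),
    )
    return rs.was_applied

def heartbeat(task_id: str, owner: str, ttl: int = 30) -> bool:
    # Conditional update refreshes the TTL while we still own the lock.
    rs = session.execute(
        f"UPDATE taskdb.locks USING TTL {ttl} SET owner = %s "
        f"WHERE task_id = %s IF owner = %s",
        (owner, task_id, owner),
    )
    return rs.was_applied
```

If the heartbeat ever fails, the row's TTL expired and another worker may already hold the lock, so the holder should stop work rather than re-acquire blindly.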
-
By default, Spark's streaming capabilities follow a "micro-batching" model, where data is collected into a batch for a window of time. At the end of that window, a batch job is launched on the cluster…
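A minimal PySpark sketch of that model, using the built-in `rate` test source; the 10-second trigger window and console sink are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("micro-batch-demo").getOrCreate()

events = (spark.readStream
          .format("rate")                 # built-in test source, one row per tick
          .option("rowsPerSecond", 100)
          .load())

query = (events.writeStream
         .format("console")
         .trigger(processingTime="10 seconds")  # each trigger launches one batch job
         .start())

query.awaitTermination()
```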
-
I have some thoughts about using vLLM for generation. Feel free to correct me if I'm wrong.
1. Batching
It seems that prompts are still passed to vLLM engines in micro rollout batches during `ma…
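For context, a minimal sketch of the offline vLLM `LLM` API, where the whole prompt list is handed to the engine at once and batching/scheduling happens internally; the model name is illustrative:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B")   # illustrative model id
params = SamplingParams(temperature=0.8, max_tokens=128)

prompts = ["Write a haiku about GPUs.", "Explain KV cache in one line."]
# The engine receives the full list and continuously batches requests itself.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```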
-
**Why do we need window functions?**
When working with big data, window functions help slice things out, like removing duplicates with rank/row_number/dense_rank; without these built-in function…
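A minimal PySpark sketch of that dedup pattern; table and column names are illustrative:

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import row_number, col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, "a", 10), (1, "a", 12), (2, "b", 7)], ["id", "key", "ts"]
)

# Rank rows within each (id, key) group, newest first.
w = Window.partitionBy("id", "key").orderBy(col("ts").desc())
deduped = (df.withColumn("rn", row_number().over(w))
             .filter(col("rn") == 1)      # keep only the latest row per group
             .drop("rn"))
deduped.show()
```

Without the window function, the same dedup would need a group-by to find the max timestamp plus a re-join back to the full rows, which is exactly the pain the question describes.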
-
Hi Team, does it make sense to support `minimumBatchSize` together with `maxTimeToWait` in Parallel Consumer? The idea is that I want to use the PC as a micro-batch consumer, but since my consumers are faster …
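To make the proposal concrete, a plain-Python illustration of the intended semantics (not the Parallel Consumer API): emit a batch once `minimumBatchSize` records have arrived, or when `maxTimeToWait` elapses, whichever comes first:

```python
import queue
import time

def micro_batches(q: queue.Queue, minimum_batch_size: int, max_time_to_wait: float):
    """Yield batches that are either full or have waited long enough."""
    while True:
        batch, deadline = [], time.monotonic() + max_time_to_wait
        while len(batch) < minimum_batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break                          # timeout: flush whatever we have
            try:
                batch.append(q.get(timeout=remaining))
            except queue.Empty:
                break
        if batch:
            yield batch
```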
-
use-case: near-real-time sync from an MSSQL db to S3.
I know I can do a while loop calling spark-submit each time, but this will be slow as the JVM needs to start up each time. Is there a way to make metorikk…
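One generic way to avoid the per-run JVM startup (a PySpark sketch, not metorikku-specific) is a single long-running Structured Streaming job; the source below is a placeholder, since streaming CDC from MSSQL needs a dedicated connector, and the bucket paths are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mssql-to-s3").getOrCreate()

# Placeholder source; a real pipeline would read MSSQL change data instead.
stream = spark.readStream.format("rate").load()

(stream.writeStream
       .format("parquet")
       .option("path", "s3a://my-bucket/sync/")            # assumed bucket
       .option("checkpointLocation", "s3a://my-bucket/chk/")
       .trigger(processingTime="1 minute")                 # micro-batch cadence
       .start()
       .awaitTermination())
```

The JVM and the Spark session stay up for the life of the job, so each micro-batch pays no startup cost.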
-
Hi,
I want to know whether I could use pippy's pp capability with deepspeed's zero3 config, so that together they lead to 3D parallelism?
Thx
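For reference, a minimal sketch of the ZeRO-3 side in a DeepSpeed config; whether PiPPy's pipeline stages compose with it is exactly the open question here, so only the ZeRO part is shown and the values are illustrative:

```python
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,            # partition params, grads, and optimizer states
    },
    "bf16": {"enabled": True},
}

# model and optimizer defined elsewhere:
# engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, optimizer=optimizer, config=ds_config)
```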
-
Retraining from a checkpoint works perfectly with tokenization on the fly, but breaks when using nanoset: training restarts with a different lr, which does not match lr_schedule.pt.
We also have…
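For comparison, a minimal PyTorch sketch of what a correct resume should do, assuming `lr_schedule.pt` holds the scheduler's `state_dict`; the optimizer and schedule here are illustrative:

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(4, 4)
optimizer = AdamW(model.parameters(), lr=3e-4)
scheduler = LambdaLR(optimizer, lambda step: min(1.0, (step + 1) / 100))

# Restore the scheduler so the lr continues from the checkpointed step
# rather than restarting from the beginning of the schedule.
state = torch.load("lr_schedule.pt")
scheduler.load_state_dict(state)
print(scheduler.get_last_lr())
```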
-
I tested llama3 continued training with multi-machine tp4 pp2 dp2. If I enable the grad accum operation, training hangs. The experimental environment is: 16x H800, torch 2.1.2+cu121.
checkpoints:…
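For reference, a generic PyTorch DDP sketch of gradient accumulation (not the exact tp4/pp2/dp2 setup): `no_sync()` skips the gradient all-reduce on non-boundary micro-steps, and a hang of this kind is often a symptom of ranks disagreeing on when to run collectives:

```python
import contextlib
import torch

def train_step(ddp_model, optimizer, batches, accum_steps=4):
    optimizer.zero_grad()
    for i, (x, y) in enumerate(batches):
        boundary = (i + 1) % accum_steps == 0
        # Skip the all-reduce except on the boundary step; every rank must
        # agree on `boundary`, otherwise the collectives mismatch and hang.
        ctx = contextlib.nullcontext() if boundary else ddp_model.no_sync()
        with ctx:
            loss = torch.nn.functional.mse_loss(ddp_model(x), y)
            (loss / accum_steps).backward()
        if boundary:
            optimizer.step()
            optimizer.zero_grad()
```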
-
**Is your feature request related to a problem? Please describe.**
I am consuming data from Pulsar through Spark Structured Streaming in micro-batches.
Right now, what happens is that Spark consume…
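For context, a minimal sketch of the micro-batch read path, assuming the StreamNative pulsar-spark connector; URLs and the topic are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

stream = (spark.readStream
          .format("pulsar")
          .option("service.url", "pulsar://localhost:6650")
          .option("admin.url", "http://localhost:8080")
          .option("topic", "persistent://public/default/events")
          .load())

(stream.writeStream
       .format("console")
       .trigger(processingTime="30 seconds")   # micro-batch cadence
       .start()
       .awaitTermination())
```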