-
We are using this library to bootstrap the clients, so it would be nice to use modern versions of its dependencies.
Upgrading `rustls` is more difficult than I would have hoped.
-
### 🐛 Describe the bug
Running all-reduce on the same data on H20 with TP8 produces different results.
The problem can be reproduced on the image: nvcr.io/nvidia/pytorch:24.09-py3
Demo code:
``` python
…
```
-
### Bug summary
Version 1, with a semaphore:
```
import asyncio
from prefect import task, flow

sem = asyncio.Semaphore(100)

@task
async def print_value(value):
    async with sem:
        a…
```
-
**Describe the bug**
This occurs only when the cluster restarts for some reason. The error below appears whenever the cluster restarts:
ERROR com.netflix.conductor.locking.redis.RedisLock - Failed to acquireL…
-
## Problem
We are facing challenges with the current database migration strategy for Fleet device management in a complex deployment environment. Our infrastructure requires that services remain on…
-
Is there any guidance on the optimal [EFS settings](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-efs-filesystem.html) to maximize throughput with LanceDB? In particular:…
-
The Lock abstraction is inconsistent across distributed backends, mostly because etcd expects the key not to exist for the mechanism to work.
This is due to the custom Lock implementation using atomic…
-
env:
win11 + wsl2 + ubuntu22.04
log:
2024-07-14 10:15:10,023 DEBUG TRAIN Batch 0/49100 loss 1.675823 acc 0.226168 lr 0.00031911 grad_norm 3.311060 rank 0
WARNING:torch.distributed.elastic.rend…
-
I have an ONAP use case that needs to block/wait on the processing of dependent messages between load-balanced queue group consumers.
I am trying to build a correlation messaging system, with complex business logic…
-
### 🚀 The feature, motivation and pitch
Currently, distributed inference (TP) in vLLM relies on Ray to orchestrate the GPU workers. I briefly checked the code, and it seems the core distributed communica…