distributed-work Search Results

1000+ results for distributed-work


paritytech/polkadot-sdk #4139

Distributed validator infrastructure for Polkadot

I'm starting a bit of research and looking for advice/insight into expanding this issue into a full-featured spec for creating a distributed polkadot validator cluster. This would be similar to what […

drewstone updated 7 months ago
2
priestjim/gen_rpc #72

Custom cookie handling / authentication

I would like to be able to specify the cookie used when connecting to a certain remote node. My use case: I have a LAN setup with two separate, distributed applications running. I am currently usin…

meyercm updated 5 years ago
1
grafana/grafana #73709

Centralize information about how to make use of Tempo for di…

**Why is this needed**: Community questions are becoming more frequent about how to make use of Tempo in their distributed tracing system, and specifically the integrations that exist within Grafa…

zalegrala updated 1 week ago
5
dask/distributed #3341

dask-distributed client socks proxies

My scheduler is not reachable by the public web, I actually use a SOCKS5 proxy to reach it. The reason is, i'm limited by the number of public IPs I can have at one time. To perform my task, I'm using…

deepio updated 4 years ago
14
pytorch/pytorch #140563

NCCL hangs cause timeout

### 🐛 Describe the bug I'm training a vqgan model and there is a forward operation which do allreduce across batch to get an estimation of the data distribution. It successfully ran for hours and han…

Jason3900 updated 3 hours ago
7
dask/dask-drmaa #27

Start cluster over ssh

I work on a server with a Jupyterhub and have access to a pbs cluster, both machines have the same Python environments. Right now I do the following (manual work): 1. I start workers on the cluste…

basnijholt updated 6 years ago
3
pytorch/pytorch #58005

torch.distributed.nn.all_reduce incorrectly scales the gradi…

## 🐛 Bug `torch.distributed.nn.all_reduce` computes different gradient values from `torch.distributed.all_reduce`. In particular, it seems to scale the gradients by `world_size` incorrectly. ## …

DrJimFan updated 11 hours ago
30
ZiggyCreatures/FusionCache #111

[FEATURE] Backplane for SQL Server distributed cache

Currently the Backplane feature is only available for Redis cache. Would it be much work to get the same setup available when using SQL Server as the distributed cache?

RemarkLima updated 8 months ago
35
NVIDIA/nccl #573

NCCL WARN Could not find real path of...

When I try to run data parallel on single machine with 2 GPUs, the following error happened. ``` NCCL version 2.7.8+cuda11.0 xxxxx:2573:2612 [1] graph/xml.cc:332 NCCL WARN Could not find real pat…

ljz756245026 updated 7 months ago
21
pytorch/pytorch #97469

Improve collectives fingerprinting

### 🚀 The feature, motivation and pitch When using `TORCH_DISTRIBUTED_DEBUG=DETAIL` we collect collectives fingerprints and those are quite helpful when troubleshooting issues like stragglers. One…

kumpera updated 1 year ago
5

