-
Apple Feedback Assistant ID: FB15619021
---
**Is your feature request related to a problem? Please describe.**
When rendering, the CPU only reaches 200–250% utilization even though I have 8 performance cores remaining…
-
I am trying to fine-tune the `lmsys/vicuna-7b-v1.3` model.
I have a server with 8 NVIDIA RTX A4500 GPUs (20 GB each), so about 160 GB of GPU memory in total.
When I try to train with `mem` I hit an OOM in the middle o…
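For context, a back-of-envelope memory estimate (a sketch assuming full fine-tuning with Adam in mixed precision, not necessarily the reporter's exact setup) already lands close to the 160 GB budget before any activations are counted:

```python
# Rough memory estimate for full fine-tuning of a 7B-parameter model
# with Adam in fp16/fp32 mixed precision (illustrative numbers only).
params = 7e9
bytes_weights_fp16 = params * 2   # fp16 weights
bytes_grads_fp16 = params * 2     # fp16 gradients
bytes_master_fp32 = params * 4    # fp32 master copy of the weights
bytes_adam_states = params * 8    # fp32 Adam first and second moments
total_gb = (bytes_weights_fp16 + bytes_grads_fp16 +
            bytes_master_fp32 + bytes_adam_states) / 1024**3
print(f"~{total_gb:.0f} GB before activations")  # ~104 GB
```

Activations, fragmentation, and framework overhead come on top, which makes a mid-training OOM on 160 GB plausible without sharding or offloading.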
-
Datadog does not automatically connect the event bus's producer and consumer traces. If we want this sort of distributed tracing, we'll need to add it ourselves.
## Implementation notes
[Datadog Sup…
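One hypothetical way to wire this up by hand, sketched with plain dicts rather than any specific ddtrace API (the header names mirror Datadog's `x-datadog-trace-id` / `x-datadog-parent-id` propagation convention; `produce` and `consume` are placeholder names):

```python
# Hypothetical sketch of manual trace propagation across an event bus:
# the producer injects its trace context into message headers, and the
# consumer extracts it so its span joins the same distributed trace.
import uuid

def produce(payload, current_trace_id=None):
    """Attach trace headers to an outgoing event-bus message."""
    trace_id = current_trace_id or uuid.uuid4().hex
    return {
        "headers": {
            "x-datadog-trace-id": trace_id,
            "x-datadog-parent-id": uuid.uuid4().hex,
        },
        "payload": payload,
    }

def consume(message):
    """Read the propagated trace id on the consumer side."""
    return message["headers"]["x-datadog-trace-id"]

msg = produce({"event": "order_created"}, current_trace_id="abc123")
print(consume(msg))  # abc123 -- producer and consumer now share a trace id
```

In a real setup the inject/extract steps would go through the tracing library's propagator instead of hand-built dicts, but the shape of the solution is the same.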
-
### Describe the Bug
Scene shape is `(300, 4, 41, 2048, 2048)`. I call
```python
ti = im.get_image_dask_data("TCZYX", T=slice(start_t, end_t))  # start_t = 0, end_t = 4
ti = ti.persist()
```
A…
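The same slice-then-persist pattern can be reproduced on a synthetic array (a small sketch with an assumed toy shape in place of the full 300×4×41×2048×2048 scene; requires `dask` installed):

```python
# Stand-in for the (T, C, Z, Y, X) scene, much smaller than the original.
import dask.array as da

arr = da.zeros((10, 4, 5, 64, 64), chunks=(1, 4, 5, 64, 64))
start_t, end_t = 0, 4
sub = arr[slice(start_t, end_t)]  # select the first 4 timepoints
sub = sub.persist()               # materialize those chunks in memory
print(sub.shape)  # (4, 4, 5, 64, 64)
```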
-
Training with fp16 doesn't work for me on an RTX 3060. I'll look into fixing it, but for future reference, here is the full stack trace.
torch version 1.9.0
INFO:torch.distributed.distributed_c10d:Adde…
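For reference, a minimal mixed-precision training step with the `torch.cuda.amp` API of that era (a sketch with a placeholder model and random data, guarded so it falls back to plain fp32 on CPU):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

model = torch.nn.Linear(8, 1).to(device)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)  # no-op when disabled

x = torch.randn(4, 8, device=device)
y = torch.randn(4, 1, device=device)

with torch.autocast(device_type=device, enabled=use_amp):
    loss = torch.nn.functional.mse_loss(model(x), y)

scaler.scale(loss).backward()  # loss scaling guards against fp16 underflow
scaler.step(opt)               # unscales gradients, skips step on inf/nan
scaler.update()
print(torch.isfinite(loss).item())
```

The usual failure mode with fp16 is overflow/underflow in the loss or gradients; `GradScaler` exists precisely to paper over that, which is why a stack trace from an fp16 run is worth reading against the scaler setup.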
-
I want to use the `VanillaDataManager` (nerfstudio-0.2.2) as a component of my work. Single-GPU training works well, and I am now trying to run it on multiple GPUs for parallel training.
However, I'm not s…
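As a starting point, the usual multi-GPU recipe is to wrap the model in `DistributedDataParallel`; a single-process `gloo` sketch with a toy model (the nerfstudio-specific wiring is not shown, and the model is a placeholder):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process process group on CPU for illustration; a real run would
# launch one process per GPU (e.g. via torchrun) with matching rank/world_size.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(4, 2)
ddp_model = DDP(model)  # gradients are all-reduced across ranks on backward

out = ddp_model(torch.randn(3, 4))
out.sum().backward()
dist.destroy_process_group()
print(out.shape)  # torch.Size([3, 2])
```

The data manager side then needs a per-rank split of the training rays/images so each GPU sees a different shard of the data.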
-
**Describe the issue**:
`LocalCluster` objects don't seem to respect resources configured via environment variables, or via `dask.config`.
**Minimal Complete Verifiable Example**:
Explicitly …
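For comparison, passing `resources` explicitly to `LocalCluster` does take effect, which is the behavior the config/environment-variable route would be expected to match (a small sketch, assuming `distributed` is installed):

```python
from dask.distributed import Client, LocalCluster

# Declare one abstract "GPU" resource on the single worker.
cluster = LocalCluster(n_workers=1, threads_per_worker=1,
                       resources={"GPU": 1}, processes=False)
client = Client(cluster)

# The scheduler's view of the worker should report the resource.
worker = next(iter(client.scheduler_info()["workers"].values()))
print(worker["resources"])

client.close()
cluster.close()
```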
-
In Makefile, we have:
```make
OS := $(shell uname)
[..]
ifeq ($(OS), Darwin)
export CPU_COUNT=$(shell sysctl -n hw.logicalcpu || echo 1)
else
export CPU_COUNT=$(shell nproc || echo 1)
endi…
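The same detection can be mirrored (or sanity-checked) in Python, where `os.cpu_count()` already covers both the Darwin and Linux branches:

```python
import os

# Equivalent of the Makefile's sysctl/nproc logic, with the same
# fall-back-to-1 behavior as `|| echo 1`.
cpu_count = os.cpu_count() or 1
print(cpu_count)
```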
-
Hello,
I currently have a JAX program that is `(p)jit`-ted and running on 8 devices. I want to scale it up to 32 devices by running 4 replicas of this program (and only doing a `lax.pmean` at the end of e…
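The replica-averaging part can be sketched with `jax.pmap` and `lax.pmean` (a toy per-replica computation on 4 forced host devices, not the original 8-device program; the XLA flag must be set before `jax` is imported):

```python
import os
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=4"

import jax
import jax.numpy as jnp
from jax import lax

def replica_step(x):
    local = x * 2.0                                   # per-replica work
    return lax.pmean(local, axis_name="replica")      # average across replicas

step = jax.pmap(replica_step, axis_name="replica")
xs = jnp.arange(4.0)  # one value per replica
out = step(xs)
print(out)  # every replica ends up with the mean of [0, 2, 4, 6] = 3.0
```

Scaling the real program would nest the existing `jit`-ted computation inside the mapped function, with the `pmean` over the replica axis as the only cross-replica communication.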
-
### Description & Motivation
In the Tensor Parallel example [here](https://github.com/Lightning-AI/pytorch-lightning/tree/master/examples/fabric/tensor_parallel) while the proposed model is a Llama…