-
Machines
- dual 4090 ada
- dual A4500
- single A6000
- single A4000
- single 3500 Ada
Concentrate on the A6000 and A4000 machines with 10 Gbps networking
- https://www.tensorflow.org/guide/distributed_trai…
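
As a rough sketch of what the linked guide's multi-worker setup could look like for the A6000 and A4000 machines: each machine sets a `TF_CONFIG` environment variable describing the cluster and its own position in it before creating a `tf.distribute.MultiWorkerMirroredStrategy`. The hostnames, the port, and the `make_tf_config` helper below are assumptions for illustration, not values from these notes.

```python
import json
import os


def make_tf_config(worker_hosts, task_index, port=12345):
    """Build the TF_CONFIG dict for one worker in a multi-worker cluster.

    worker_hosts and port are placeholders; substitute the real addresses
    reachable over the 10 Gbps link.
    """
    if not 0 <= task_index < len(worker_hosts):
        raise ValueError("task_index must select one of worker_hosts")
    return {
        "cluster": {"worker": [f"{host}:{port}" for host in worker_hosts]},
        "task": {"type": "worker", "index": task_index},
    }


# Hypothetical hostnames for the two machines (assumed, not from the notes).
HOSTS = ["a6000-box.local", "a4000-box.local"]

# Use task_index=0 on the A6000 machine and task_index=1 on the A4000 machine.
tf_config = make_tf_config(HOSTS, task_index=0)
os.environ["TF_CONFIG"] = json.dumps(tf_config)

# With TF_CONFIG set, each machine would then run the same training script:
#   import tensorflow as tf
#   strategy = tf.distribute.MultiWorkerMirroredStrategy()
#   with strategy.scope():
#       model = build_model()  # build_model is a hypothetical helper
```

Because this strategy trains synchronously, the slower A4000 will likely set the pace of each step, so per-worker batch sizes may need tuning.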
-
Mine is 18.04 as well, but it didn't work for me. I tried #3, creating the 99fixbadproxy file and
copying in the pull request from #3, but it still gave me the following error:
E: Failed to fetch…
-
It seems like the URL [http://archive.lambdalabs.com/ubuntu](http://archive.lambdalabs.com/ubuntu) is not valid as a source anymore? Unless I'm missing something obvious.
I get the following erro…
-
accelerate launch --config_file 1gpu.yaml test_mvdiffusion_seq.py --config ./configs/mvdiffusion-joint-ortho-6views.yaml
/home/gpu/data/Wonder3D/wonder3d/lib/python3.10/site-packages/numba/np/ufunc/p…
-
Code to reproduce error:
```
from diffusers import StableDiffusionImageVariationPipeline
from PIL import Image
device = "cuda:0"
sd_pipe = StableDiffusionImageVariationPipeline.from_pretraine…
```
-
## Problem
Hello, I'm getting this weird cublasLt error on a Lambda Labs H100 with CUDA 11.8, PyTorch 2.0.1, and Python 3.10 (Miniconda) while trying to fine-tune a 3B-parameter open-llama using LoRA with 8-bit …
-
We're testing fine-tuning on an H100 and a 4090; here are the results:
4090: https://voca.ro/11mtxzLHzzih
h100: https://voca.ro/15QldVjuG7nu
Almost identical fine-tune, but the H100 output is SIGNIF…
-
Hi,
I have 70M training samples and 1M validation samples. Test loss is decreasing and accuracy has reached 0.83 but never exceeded it. Training is now at epoch 55, so should I wait or it will…
-
### Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
### What would your feature do?
As CLIP models map both images and te…
-
I've spun up a gpu_1x_a10 instance on LambdaLabs and followed the instructions in the README (note: espeak package was not found). When trying to run `sudo cog run --debug exec` I get the following
…