-
Training the Stable Diffusion XL UNet with the accelerate library using FSDP: fsdp_offload_params: true; fsdp_sharding_strategy: SHARD_GRAD_OP
Environment:
accelerate-0.34.2
torch-2.4.1
CUDA Version: 12…
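For orientation, a minimal sketch of what those two config keys map to at the raw torch FSDP level (the `wrap_unet_for_fsdp` helper and its `unet` argument are illustrative; accelerate normally performs this wrapping itself via `accelerator.prepare`):
```python
import torch
from torch.distributed.fsdp import (
    CPUOffload,
    FullyShardedDataParallel as FSDP,
    ShardingStrategy,
)

def wrap_unet_for_fsdp(unet: torch.nn.Module) -> FSDP:
    """Mirror the accelerate config above at the raw torch level."""
    return FSDP(
        unet,
        # SHARD_GRAD_OP shards gradients and optimizer state across ranks
        # while keeping parameters unsharded between forward and backward.
        sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
        # fsdp_offload_params: true -> offload sharded params/grads to CPU.
        cpu_offload=CPUOffload(offload_params=True),
    )
```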
-
Goal: Align the implementation with GCP_OPT while retaining the cp_opt top-level interface as faithfully as possible.
Components:
- [x] cp_opt.m
- [x] ktensor/fg.m
- [x] tt_opt_lbfgsb.m
- [ ] tt_op…
-
Retraining from a checkpoint works perfectly with on-the-fly tokenization, but breaks when using nanoset: training restarts with a different lr, which does not match the one stored in lr_schedule.pt.
We also have…
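A minimal diagnostic sketch for the lr mismatch (the helper name and checkpoint path are assumptions, not the project's actual API): after resuming, compare the optimizer's live lr with the lr recorded in the scheduler checkpoint.
```python
import torch

def check_resumed_lr(optimizer: torch.optim.Optimizer,
                     sched_path: str = "lr_schedule.pt") -> None:
    # torch's LRScheduler.state_dict() stores the lr it last applied under
    # "_last_lr"; after a correct restart the live lr should match it.
    state = torch.load(sched_path, map_location="cpu")
    saved = state.get("_last_lr")
    live = [group["lr"] for group in optimizer.param_groups]
    print(f"saved last lr: {saved}, live lr after restart: {live}")
```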
-
The error message I get:
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: decay is deprecated in the new Keras optimizer, please check the docstring for valid arguments, or u…
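Two standard fixes for this Keras change, sketched on the Python side with illustrative values: replace `decay` with a learning-rate schedule, or (on TF 2.11-2.15) fall back to the legacy optimizer that still accepts `decay`.
```python
import tensorflow as tf

# Fix 1: replace the removed `decay` argument with a learning-rate schedule.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=10_000, decay_rate=0.96)
opt = tf.keras.optimizers.Adam(learning_rate=lr_schedule)

# Fix 2 (TF 2.11-2.15): the legacy optimizer still accepts `decay`.
opt_legacy = tf.keras.optimizers.legacy.Adam(learning_rate=1e-3, decay=1e-6)
```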
-
We can reproduce this problem using the following command: `torchrun --master_addr=127.0.0.1 --master_port=1234 --nnodes=1 --nproc-per-node=1 --node_rank=0 test_optimizer_state.py --sharding_type $SHA…
-
Hi,
I'm using Tiny YOLOv2 and I'm trying to use the Adam optimizer during training, so I added the following lines to the cfg:
![image](https://user-images.githubusercontent.com/33591581/65779761-abef6a80-e…
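For reference, darknet enables Adam through keys in the `[net]` section of the cfg (read in darknet's parser.c); a typical snippet, with illustrative values rather than the ones from the screenshot, looks like:
```
[net]
# switch from SGD to Adam
adam=1
B1=0.9
B2=0.999
eps=0.000001
learning_rate=0.001
```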
-
### 🐛 Describe the bug
In multiprocessing mode (i.e. FSDP/DDP), JSONDecodeErrors occur within torch._inductor.triton_heuristics.cached_autotune if the filesystem does not lock the file itself.…
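A torch-free illustration of the failure mode (the file name and contents are made up): if one rank reads the shared autotune cache while another rank is mid-write and the filesystem provides no locking, the reader sees truncated JSON.
```python
import json

path = "autotune_cache.json"
with open(path, "w") as f:
    f.write('{"best_config": {"BLOCK": 128')  # writer interrupted mid-write

with open(path) as f:
    json.load(f)  # raises json.decoder.JSONDecodeError on the partial file
```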
-
**Describe the bug**
I get a ValueError when running the DeepFM demo.
**To Reproduce**
```python
# fails with ValueError: Could not interpret optimizer identifier:
model.compile(optimiz…
```
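This ValueError typically means `compile()` received something Keras cannot resolve to an optimizer, commonly an optimizer class instead of an instance, or an optimizer imported from a mismatched `keras`/`tf.keras` install. A sketch of forms that do resolve (the toy model is illustrative, not the DeepFM one):
```python
import tensorflow as tf

# Toy stand-in for the DeepFM model (illustrative only).
model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="sigmoid")])

model.compile(optimizer="adam", loss="binary_crossentropy")   # string id: ok
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),       # instance: ok
              loss="binary_crossentropy")

# One common trigger of the error: passing the class, not an instance.
# model.compile(optimizer=tf.keras.optimizers.Adam, loss="binary_crossentropy")
```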
-
### Issue type
Bug
### Have you reproduced the bug with TensorFlow Nightly?
Yes
### Source
binary
### TensorFlow version
2.16.1
### Custom code
Yes
### OS platform and distribution
WSL Ubun…
-
I have tried to implement multithreaded sampling by changing:
```julia
function estimate_energy_with_samples(prob, samples)
#return mean(Base.Fix1(LogDensityProblems.logdensity, prob), eachsa…
```
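For comparison, a minimal sketch of one threaded approach (the function name and the Vector-of-samples assumption are illustrative, not the package's API): split the log-density evaluations across threads and average the partial results.
```julia
using Statistics: mean
using LogDensityProblems

function estimate_energy_threaded(prob, sample_vec::AbstractVector)
    partial = Vector{Float64}(undef, length(sample_vec))
    Threads.@threads for i in eachindex(sample_vec)
        # each thread writes to its own slot, so no locking is needed
        partial[i] = LogDensityProblems.logdensity(prob, sample_vec[i])
    end
    return mean(partial)
end
```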