-
### 🐛 Describe the bug
The new API for checkpointing, e.g. `set_optimizer_state_dict`, calls `_init_optim_state`.
https://github.com/pytorch/pytorch/blob/2a4304329be0ee592af0f1d8d0dd9428ed82a0c6/t…
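For context, a minimal sketch of how I understand the API is driven; this uses a plain, non-distributed setup for brevity, and the model/optimizer here are illustrative, not from the linked code:
```python
import torch
import torch.nn as nn
from torch.distributed.checkpoint.state_dict import (
    get_optimizer_state_dict,
    set_optimizer_state_dict,
)

# Illustrative stand-ins; the real report concerns a distributed job.
model = nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Capture a flattened, FQN-keyed optimizer state dict.
osd = get_optimizer_state_dict(model, optimizer)

# Restoring goes through set_optimizer_state_dict, which (per the report)
# calls the private _init_optim_state helper to materialize optimizer state
# before loading the saved values into it.
set_optimizer_state_dict(model, optimizer, optim_state_dict=osd)
```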
-
Hi, it's me again. The training is working great, but when it comes to saving the checkpoint, I get this error. Any ideas?
```
[rank0]: File "/workspace/train.py", line 230, in
[rank0]: train…
-
I took a quick look, and I don't know to what extent speed matters here, but a simple caching tactic would save you quite a bit of time on scalaskel. You also aren't obl…
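To make the caching suggestion concrete, here is a minimal sketch of memoization with `functools.lru_cache`; the solver and the denominations are hypothetical stand-ins for the actual puzzle code:
```python
from functools import lru_cache

COINS = (1, 7, 11, 21)  # illustrative denominations, not necessarily the puzzle's

@lru_cache(maxsize=None)  # memoize: each (amount, i) subproblem is solved once
def ways(amount: int, i: int = 0) -> int:
    """Count decompositions of `amount` using COINS[i:] (hypothetical solver)."""
    if amount == 0:
        return 1
    if i == len(COINS) or amount < 0:
        return 0
    # Either spend COINS[i] again, or move on to the next denomination.
    return ways(amount - COINS[i], i) + ways(amount, i + 1)
```
Without the cache, the recursion revisits the same (amount, i) pairs exponentially often; with it, each pair is computed once.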
-
### 🚀 The feature, motivation and pitch
`torch` ships with the `py.typed` marker, yet many modules lack a `__all__` declaration.
This leads to problems with type checkers:
```python
from torc…
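# Sketch of the requested fix, with a hypothetical export list: an explicit
# __all__ tells strict PEP 561 type checkers which re-imported names a
# module intentionally exports.
__all__ = ["nn", "optim", "Tensor"]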
-
run "python -m bitsandbytes"
Traceback (most recent call last):
File "/xxx/venv/lib/python3.10/site-packages/bitsandbytes/diagnostics/main.py", line 66, in main
sanity_check()
File "/xxx…
-
### 🐛 Describe the bug
I tried resuming training from a previous unsharded checkpoint from step 1k. Training resumed with no initial issue; however, when it tried to save the sharded checkpoint …
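For reference, a minimal sketch of the sharded save path as I understand it, using `torch.distributed.checkpoint`; the state-dict layout and checkpoint path are illustrative, not my actual training script:
```python
import torch.distributed.checkpoint as dcp
from torch.distributed.checkpoint.state_dict import get_state_dict

def save_sharded(model, optimizer, step: int) -> None:
    # Gather model and optimizer state in DCP's FQN-keyed format.
    model_sd, optim_sd = get_state_dict(model, optimizer)
    # Each rank writes only its local shards under the checkpoint directory.
    dcp.save(
        {"model": model_sd, "optim": optim_sd},
        checkpoint_id=f"checkpoints/step_{step}",  # illustrative layout
    )
```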
-
The following files do not call run_tests anywhere in the file, yet they are collected by test/run_tests.py, which indicates they probably aren't being run in CI.
```
distributed/_shard/test_sharder
…
-
Hi,
I have seen that this issue has come up before, but since that was some time ago, I thought it might be helpful to start another issue:
I am optimizing a function using the Optim package with u…
-
Hi!
Thanks for making this happen, it's a super useful resource!
I was wondering whether there is any reason to use bnb's 8-bit optimizers when doing QLoRA optimization. Is it better to just us…
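For concreteness, the choice usually comes down to which optimizer class gets constructed; here is a sketch assuming bitsandbytes' stock 8-bit optimizers (the model and hyperparameters are illustrative):
```python
import bitsandbytes as bnb
import torch.nn as nn

model = nn.Linear(16, 16)  # stand-in for the LoRA-augmented model

# 8-bit AdamW stores optimizer state quantized to 8 bits, cutting state
# memory roughly 4x versus fp32; the paged variant can additionally spill
# state to CPU when GPU memory runs short.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=2e-4)
# or: bnb.optim.PagedAdamW8bit(model.parameters(), lr=2e-4)
```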
-
We used our own data to fine-tune the pretrained DTU model; our data uses the same format as DTU.
But we got an error during training:
-- Process 0 terminated with the following error:
Traceback (mos…