Closed Nixellion closed 7 months ago
Can confirm, this bug was introduced with commit 641e6f7e. It seems to work fine on 6dc68a6.
I can't check out an older commit in the runpod Docker container:
git fetch
git checkout 6dc68a6
error: pathspec '6dc68a6' did not match any file(s) known to git
Any idea why this is?
EDIT: Had to do git fetch --unshallow to fix it.
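For anyone hitting the same pathspec error: the container's clone appears to be shallow, so older commits are not present locally until the history is fetched. A minimal sketch of a guard follows; `checkout_commit` is a hypothetical helper name, not part of axolotl, and it assumes a git version that supports `git rev-parse --is-shallow-repository`.

```shell
# Hypothetical helper: check out a commit that may be missing from a shallow clone.
checkout_commit() {
  commit="$1"
  # A shallow clone only carries recent history, so older commits are unknown to git.
  if [ "$(git rev-parse --is-shallow-repository)" = "true" ]; then
    # Download the full history; this fails on a complete repository, hence the check.
    git fetch --unshallow || return 1
  else
    git fetch --all || return 1
  fi
  git checkout "$commit"
}
```

Run from inside the repository, `checkout_commit 6dc68a6` would then replace the failing `git checkout` above.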
I believe the val_set_size is too small. Increase it from 0.01 to 0.05.
I'll try. Though as I said, this is in the example qlora.yml config. It should probably work out of the box.
Meanwhile, rolling back to an older commit, 6dc68a6, worked. A single-line fix:
pip install peft==0.6.0 && conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia && git fetch --all && git fetch --unshallow && git checkout 6dc68a6 && echo "PATCHED"
Closing this as stale. The likely cause is a validation dataset that is too small combined with sample_packing. Adjusting either one should fix it.
Please let us know if this re-occurs.
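As a rough illustration of the suggested adjustment, the sketch below edits the example config in place. `set_yaml_key` is a hypothetical helper, and it assumes the key appears as a top-level `key: value` line in examples/mistral/qlora.yml; the 0.05 value is the one suggested earlier in this thread.

```shell
# Hypothetical helper: overwrite a top-level scalar key in a YAML file in place.
set_yaml_key() {
  file="$1"; key="$2"; value="$3"
  # -i.bak works with both GNU and BSD sed and leaves a backup next to the file.
  sed -i.bak "s|^${key}:.*|${key}: ${value}|" "$file"
}

# Path taken from the issue; override CONFIG to point at a different file.
CONFIG="${CONFIG:-examples/mistral/qlora.yml}"
if [ -f "$CONFIG" ]; then
  # Enlarge the validation split, as suggested above.
  set_yaml_key "$CONFIG" val_set_size 0.05
fi
```

The other option mentioned, changing sample_packing, could be tried the same way with `set_yaml_key "$CONFIG" sample_packing false`.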
Please check that this issue hasn't been reported before.
Expected Behavior
QLoRA should train.
Current behaviour
I get the following error:
Steps to reproduce
I'm using the axolotl Docker image on runpod (the same issue happens on Windows with Docker Desktop).
To test whether things work, I'm trying to run one of the provided examples:
examples/mistral/qlora.yml
Out of the box it does not work at all, as discussed in this issue: https://github.com/OpenAccess-AI-Collective/axolotl/issues/835
After downgrading peft and reinstalling pytorch, however, I get a different error:
Config yaml
examples/mistral/qlora.yml
Possible solution
No response
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
runpod-main-latest