axolotl-ai-cloud / axolotl

Go ahead and axolotl questions
https://axolotl-ai-cloud.github.io/axolotl/
Apache License 2.0
7.47k stars 806 forks

RunPod template not working with network volumes, /workspace/axolotl empty #813

Open Palmik opened 10 months ago

Palmik commented 10 months ago

Please check that this issue hasn't been reported before.

Expected Behavior

Other users also encountered this: https://github.com/OpenAccess-AI-Collective/axolotl/issues/467

According to the RunPod Discord, the files need to be explicitly rsynced from the Docker image, as is done here: https://github.com/runpod/containers/blob/main/official-templates/stable-diffusion-webui/pre_start.sh
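As a rough illustration of that rsync idea: a volume mounted at /workspace hides whatever the image shipped at that path, so a start-up step can copy a second, unmounted copy of the repository onto the volume when the target is empty. The sketch below is only an assumption about how such a step might look. `seed_workspace` and the `/opt/axolotl` location are hypothetical names, not part of the axolotl image or the RunPod template.

```python
import shutil
from pathlib import Path


def seed_workspace(image_copy: Path, volume_dir: Path) -> bool:
    """Copy image_copy into volume_dir when volume_dir is missing or empty.

    Returns True if files were copied, False if the volume already had content.
    Both paths are supplied by the caller; nothing here is an axolotl default.
    """
    if volume_dir.is_dir() and any(volume_dir.iterdir()):
        # The volume already holds files (for example from an earlier pod),
        # so leave the user's data untouched.
        return False
    # dirs_exist_ok lets the copy proceed when the empty mount point exists.
    shutil.copytree(image_copy, volume_dir, dirs_exist_ok=True)
    return True
```

A start-up script could call `seed_workspace(Path("/opt/axolotl"), Path("/workspace/axolotl"))`, assuming the image keeps its copy of the repository at that hypothetical location outside /workspace.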

Current behaviour

The /workspace/axolotl directory is sometimes empty.

Steps to reproduce

Use the official axolotl main-latest image on RunPod.

Config yaml

No response

Possible solution

No response

Which Operating Systems are you using?

Python Version

N/A

axolotl branch-commit

N/A

Acknowledgements

mattflo commented 6 months ago

You can avoid this problem by changing the RunPod template's persistent volume mount point from /workspace to /workspace/axolotl/data. The RunPod template doesn't appear to live in the repository, otherwise I'd be happy to open a PR for the change. I chatted with @winglian on Discord, and they have access to change the template. @winglian also suggested changing the output_dir in the example yml files. I haven't tried axolotl with any other cloud GPU providers. Does changing the output_dir make sense for them too? If so, I can open a PR for that change.
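To make the output_dir point concrete: with the volume mounted at /workspace/axolotl/data, training output only persists if the configured output_dir resolves to a path under that mount. The helper below is a hypothetical check, not something axolotl provides, and the example values passed to it are assumptions chosen for illustration.

```python
import posixpath


def lands_on_volume(output_dir: str, workdir: str, mount_point: str) -> bool:
    """Return True if output_dir, resolved against workdir, is under mount_point."""
    # join() keeps an absolute output_dir as-is; normpath drops "." and "..".
    resolved = posixpath.normpath(posixpath.join(workdir, output_dir))
    mount = posixpath.normpath(mount_point)
    return resolved == mount or resolved.startswith(mount + "/")
```

For example, `lands_on_volume("./lora-out", "/workspace/axolotl", "/workspace/axolotl/data")` is false, while the same call with `"./data/lora-out"` is true. This is why the example configs would need a different output_dir once the mount point moves.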

NanoCode012 commented 5 months ago

One caveat: an older issue asked for the HF models and datasets to be downloaded to the volume. If the mount point is changed as described above, the user should override these values:

https://github.com/OpenAccess-AI-Collective/axolotl/blob/89134f2143cd3325802813eb97cd05c783932201/docker/Dockerfile-cloud#L4-L7
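One possible shape for such an override is sketched below. The linked Dockerfile lines are not reproduced in this thread, so this does not claim to mirror them. It only uses cache-related environment variables that the Hugging Face libraries read (`HF_HOME`, `HF_HUB_CACHE`, `HF_DATASETS_CACHE`). The `cache_env` helper and the `huggingface-cache` directory name are assumptions made for illustration.

```python
def cache_env(volume_root: str) -> dict[str, str]:
    """Build Hugging Face cache variables pointing under volume_root.

    volume_root is whatever path the persistent volume is mounted at;
    the "huggingface-cache" sub-directory name is an arbitrary choice.
    """
    base = volume_root.rstrip("/") + "/huggingface-cache"
    return {
        "HF_HOME": base,
        "HF_HUB_CACHE": base + "/hub",
        "HF_DATASETS_CACHE": base + "/datasets",
    }
```

The returned mapping could be applied with `os.environ.update(cache_env("/workspace/axolotl/data"))` before any Hugging Face library is imported, or set as environment variables in the pod template.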