Open disarmyouwitha opened 1 year ago
No, it's a bug
@KKcorps I see, thank you...
Since the adapter files weren't written properly during checkpoints, I'm guessing that would require retraining after the fix? =x
If you have pytorch_model.bin files in the checkpoint dirs then it won't, but otherwise it might.
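If it helps, here's a quick sketch for checking which checkpoint folders actually contain saved weights before deciding whether to retrain. This assumes the standard Hugging Face `checkpoint-*` directory layout and the usual `adapter_model.bin` / `pytorch_model.bin` filenames; adjust the names if your setup differs.

```python
from pathlib import Path

def find_recoverable_checkpoints(output_dir):
    """Return checkpoint folders under output_dir that contain saved
    weights (adapter_model.bin or pytorch_model.bin), i.e. checkpoints
    you might be able to resume or merge from without retraining."""
    recoverable = []
    for ckpt in sorted(Path(output_dir).glob("checkpoint-*")):
        names = {p.name for p in ckpt.iterdir() if p.is_file()}
        if {"adapter_model.bin", "pytorch_model.bin"} & names:
            recoverable.append(ckpt)
    return recoverable

# e.g. find_recoverable_checkpoints("output")  # path is just an example
```

If none of the folders show up as recoverable, the weights likely never made it to disk and a retrain after the fix is probably unavoidable.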
I had some luck with my port of alpaca-lora to QLoRA. You can try it from https://github.com/vihangd/alpaca-qlora, though I have only tested it on the Open LLaMA 3B model.
I started the training using:
It took 2.5 days but completed successfully. I checked the /output folder and found all of the checkpoint folders, but I don't think I have the final output (an adapter_model.bin of around ~3 GB).
Am I just being dumb? Thanks!