Chris113113 closed this 6 months ago
Mostly LGTM. In addition to my comments, two things:
- From this PR it looks like we are adjusting this example to run Llama-2-13B instead of Llama-2-70B. Just want to double-check that this is intentional.
- Could you attach a link (using short-gen) showing where you were able to run this workload?
Good catch on 13B; I was using it to experiment.
http://shortn/_klGl5LKuQm, added to description.
The PR's primary purpose is updating lit-gpt's commit to a PyTorch 2.2 commit. This also comes with a few other things:
- Logs from new image: http://shortn/_klGl5LKuQm
- Changes around `flash_attn`, resulting in more reliable lit-gpt image builds.