-
### Reference code
- Llama-recipes code
[https://github.com/meta-llama/llama-recipes/tree/b7fd81c71239c67345d897c0eb6529eba076e8b8](https://github.com/meta-llama/llama-recipes/tree/b7fd81c71239c…
-
In transformers, as a rule, we always load models as `float32` for stability, even if the weights are stored in `bfloat16`. As a result, loading `llama-3-8B` can't be done lazily via mmap, since we have to …
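A back-of-the-envelope calculation shows why the upcast matters for an 8B-parameter model (the parameter count here is a round approximation for illustration, not the exact checkpoint size):

```python
# Rough memory footprint of an ~8B-parameter model at different dtypes.
# 8e9 is an illustrative approximation of the parameter count.
params = 8_000_000_000

bytes_fp32 = params * 4  # float32: 4 bytes per parameter
bytes_bf16 = params * 2  # bfloat16: 2 bytes per parameter

print(f"float32:  {bytes_fp32 / 1e9:.0f} GB")   # float32:  32 GB
print(f"bfloat16: {bytes_bf16 / 1e9:.0f} GB")   # bfloat16: 16 GB
```

So upcasting roughly doubles the host memory needed, and because the cast produces new tensors, the mmapped `bfloat16` pages can't simply be reused.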
-
Hi, I am very happy to have found a repo that can be used to fine-tune BLIP-2 quickly.
While using it (with LLaVA instruction data), I ran into some issues.
I only have a V100, and the model raises the following …
-
### Feature request
Let's add a new quantization method to LoRA, namely [optimum-quanto](https://github.com/huggingface/optimum-quanto).
There is some more context in [this diffusers issue](https:…
-
Hi All,
I'm trying to build the docker images in https://github.com/IntelAI/he-transformer/tree/master/contrib/docker on an Ubuntu 16.04 machine. When I run the command **make check_gcc**, it gives…
-
```
epoch 1/200
/home/ubuntu/tools/sd-scripts/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
…
```
-
Thanks for releasing your great work. I was wondering if there is a way to run the fine-tuning and zero-shot inference code on a GPU rather than a TPU? What kind of adjustments would I need to make?
Thanks
-
Hello, I have a problem with bandwidth when using GPUs 0,1 versus GPUs 6,7: the measured bandwidth differs between the two pairs.

```
export CUDA_VISIBLE_DEVICES=0,1
./build/all_gather_perf -b 16M -e 1024M -i 16777216 -g 2 -d bfloa…
```
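For context on how nccl-tests reports these numbers: `all_gather_perf` derives the bus bandwidth from the algorithm bandwidth using a `(n-1)/n` scaling factor. A minimal sketch of that arithmetic, with made-up illustrative sizes and timings:

```python
def all_gather_bus_bw(size_bytes: float, time_sec: float, n_ranks: int) -> float:
    """Bus bandwidth in GB/s as nccl-tests computes it for all_gather:
    algBw = size / time, busBw = algBw * (n - 1) / n."""
    alg_bw = size_bytes / time_sec / 1e9
    return alg_bw * (n_ranks - 1) / n_ranks

# Illustrative values only: 1 GiB moved in 50 ms across 2 GPUs.
print(round(all_gather_bus_bw(1 << 30, 0.05, 2), 2))  # 10.74
```

Different GPU pairs often sit on different links (NVLink vs. a PCIe host bridge, or different NUMA nodes), which is the usual cause of pair-to-pair bandwidth differences; `nvidia-smi topo -m` shows which link each pair shares.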
-
### Describe the issue
Hi,
I am using IPEX to apply bf16 to the SpeechT5 model. I use both `ipex.optimize(model, dtype=torch.bfloat16)` and `with torch.cpu.amp.autocast(enabled=True, dtype=torch.…
-
Error occurred when executing PhotoMaker_Zho:

```
cutlassF: no kernel found to launch!
File "E:\ComfyUI\Blender_ComfyUI\ComfyUI\execution.py", line 155, in recursive_execute
output_data, outp…
```