-
This is a small initial list (feel free to suggest additions in the comments) of models that, IMHO, should at least be present:
- [x] stable-diffusion
- [ ] whisper
- [ ] wizardLM (no links, only configura…
-
Hi, thanks for making this project public.
I am trying to run training with fp16 and get the following error:
>RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.cuda.FloatTen…
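For context, this error usually means the inputs were cast to fp16 while the model weights stayed in fp32 (or vice versa). A minimal sketch of the two usual fixes, assuming plain PyTorch; the toy `nn.Linear` stands in for the real model:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)                      # weights default to float32
x = torch.randn(1, 4, dtype=torch.float16)   # half-precision input

# model(x) here would raise the "Input type ... and weight type" RuntimeError.

# Fix 1: cast the input to the weights' dtype.
out = model(x.to(next(model.parameters()).dtype))

# Fix 2: run the forward pass under autocast, which inserts casts automatically
# (use device_type="cuda" with dtype=torch.float16 on GPU).
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out2 = model(torch.randn(1, 4))
```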
-
### Describe the bug
Regardless of how many cores I use (I have 16 or 32 threads), `map` slows to a crawl at around 80% done, lingers extremely slowly until maybe 97%, and NEVER finishes the job. I…
-
Do you have plans to release the trained models in ONNX format?
I tried to manually export the model `mpt-1b-redpajama-200b` to ONNX. The export generated lots of warnings like this:
```
Trac…
-
Hey,
thank you in advance for your great work and sharing the data :)
I read the README and the Hugging Face details, and it was unclear to me whether fuzzy deduplication was actually done on this dataset.
I underst…
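For reference, fuzzy deduplication typically means near-duplicate detection, e.g. Jaccard similarity over character or word shingles (approximated with MinHash at scale). A toy illustration in pure Python, not this dataset's actual pipeline:

```python
def shingles(text: str, n: int = 3) -> set:
    """Character n-grams of a whitespace-normalized, lowercased string."""
    text = " ".join(text.lower().split())
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two strings' shingle sets."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

# Near-duplicates score high even when they are not byte-identical,
# which is exactly what exact-match dedup misses.
hi = jaccard("The quick brown fox", "The quick brown fox!")  # high
lo = jaccard("The quick brown fox", "Lorem ipsum dolor")     # low
```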
-
I have installed the llm-rs library with the CUDA version. However, even though I have set `use_gpu=True` in the `SessionConfig`, the GPU is not utilized when running the code. Instead, the CPU usage …
-
Thanks for your awesome work! I noticed that there are optimized domain weights called `configs/pile_doremi_r1_120M_ref:pile_baseline_50kvocab_nopack_120M.json`, as shown in the README. Can we consider this doma…
-
I'm trying to follow the instructions in https://github.com/mlfoundations/open_flamingo/issues/228 in order to run inference on a GPU.
My code looks as follows:
```
model, image_processor, tokeni…
-
**Description:**
Add MPT with Gradient Checkpointing and LoRA support into the OpenThaiGPT pretraining code. We will use MPT with LoRA for continued pretraining in task #179
**To Do:**
1. MPT Weight + MP…
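As background, LoRA freezes the pretrained weights and trains only a low-rank additive update. A minimal sketch of the idea in plain PyTorch, illustrative only and not the actual OpenThaiGPT/MPT integration:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (B @ A)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():       # freeze pretrained weights
            p.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # base output + scaled low-rank correction
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(16, 16))
y = layer(torch.randn(2, 16))
# lora_b starts at zero, so the layer initially matches the frozen base layer.
```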
-
1. How do we get the mean value of the massive activations, e.g. 2546.8/-1502.0 in hook.py?
2. The mean value is still large; what is the difference between using the mean value and using the original value?
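Regarding question 1, activation means like these are typically read out with a forward hook. A minimal sketch in plain PyTorch (illustrative only; the cited values come from the repo's own hook.py, not this code):

```python
import torch
import torch.nn as nn

stats = {}

def record_mean(name):
    def hook(module, inputs, output):
        # record the mean of this module's output activations per forward pass
        stats.setdefault(name, []).append(output.detach().mean().item())
    return hook

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU())
model[0].register_forward_hook(record_mean("linear0"))

with torch.no_grad():
    for _ in range(4):
        model(torch.randn(2, 8))

# average over all recorded forward passes
mean_activation = sum(stats["linear0"]) / len(stats["linear0"])
```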