-
Thank you for your great work!
In the stage 1 training mentioned in the paper, is the input to the LLM images and text? I ask because of the description: ‘After the pretraining stage, the model is capable of genera…
-
![微信图片_20240614110705](https://github.com/TinyLLaVA/TinyLLaVA_Factory/assets/138667911/2006b591-3bda-4bfe-882e-4710dc9d02b7)
-
The idea of a data exchange platform for regulatory complaints and issues, integrated with an LLM, relates to financial institutions in several significant ways:
### 1. **Regulatory Compliance Manage…
-
https://universitetslararen.se/2023/10/30/chat-gpt-hjalp-eller-hinder/
-
### Model description
Here is the model description:
> gte-Qwen1.5-7B-instruct is the latest addition to the gte embedding family. This model has been engineered starting from the [Qwen1.5-7B](https:…
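As a generic illustration of how dense embeddings produced by a model like this are typically scored against each other, here is a minimal cosine-similarity sketch (the vectors below are made-up placeholders, not actual gte-Qwen1.5-7B-instruct outputs):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the vectors divided by the
    # product of their L2 norms; 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Placeholder vectors standing in for an embedded query and document.
query = [0.1, 0.3, 0.5]
doc = [0.1, 0.29, 0.52]
print(round(cosine_similarity(query, doc), 4))
```

In practice the embeddings would come from the model itself (e.g. via `sentence-transformers`), and retrieval ranks documents by this score.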
-
According to original [discord message](https://discord.com/channels/902229215993282581/913488649734213672/1242600853882535946)
Hello everyone! I am fine-tuning a model in a non-English language. T…
-
Specs: RTX 3060 Ti w/ 8 GB VRAM, R7 5700X, 32 GB RAM

`main` says:

```
main: build = 2769 (8843a98c)
main: built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
```

`make` says:

```
GNU Make 4.3
Built…
```
-
### 🚀 The feature, motivation and pitch
There's a new DP shard strategy that is more flexible and general; see more details at https://arxiv.org/abs/2311.00257 (AMSP: Reducing Communication Overhead o…
-
Hi Team,
This is an amazing handbook. In the continued pre-training script (`run_cpt.py`), I saw that it does not use the "mlm" (Masked Language Model) parameter in the training process. I thought that the …
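Since the question is about the `mlm` flag, here is a minimal, self-contained sketch (illustrative only, not the actual `run_cpt.py` code; the token ids and mask id `103` are made up) of how a collator's labels differ with `mlm=False` (causal continued pre-training) versus `mlm=True` (masked LM):

```python
import random

def causal_labels(input_ids):
    # mlm=False: labels are simply a copy of the inputs; the loss is
    # next-token prediction (the one-position shift happens inside
    # the model, not in the collator).
    return list(input_ids)

def masked_lm_labels(input_ids, mask_token_id=103, mask_prob=0.15, seed=0):
    # mlm=True: randomly replace tokens with a [MASK] id; labels keep
    # the original token only at masked positions and use -100
    # (ignored by the loss) everywhere else.
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in input_ids:
        if rng.random() < mask_prob:
            masked.append(mask_token_id)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(-100)
    return masked, labels

ids = [5, 17, 42, 8, 99]
print(causal_labels(ids))            # a copy of the input ids
print(masked_lm_labels(ids))
```

With `mlm=False` the labels are just the input ids, which is why a causal continued pre-training script has no masking step at all; `mlm` only matters for encoder-style masked-LM objectives.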
-