-
I reinstalled flash-attn with `pip install flash-attn==2.6.1` in the NGC PyTorch Docker image 24.06.
When I run a training job, I get the following error:
```
Traceback (most recent call last):
File "/data1/nfs15/nfs/bigdata/zha…
-
The current repository provides comparisons of various AI language models. However, there are several recent models and unique architectures that are not included in the existing comparisons. Expandin…
-
### Feature request
Would it be possible to add support for training on the logits of a bigger model?
That is, training on the bigger model's logits instead of one-hot vectors derived from the dataset text.
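
To make the request concrete, here is a minimal sketch of the difference between the two training signals, in plain Python with no framework. The function names, the example logits, and the temperature value are all illustrative assumptions, not part of any existing library API or of how this project would necessarily implement it.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def hard_label_loss(student_logits, target_index):
    """Cross-entropy against a one-hot target (the usual next-token loss)."""
    probs = softmax(student_logits)
    return -math.log(probs[target_index])

def soft_label_loss(student_logits, bigger_model_logits, temperature=2.0):
    """KL(bigger model || student) on temperature-softened distributions."""
    p = softmax(bigger_model_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits over a 3-token vocabulary.
student = [2.0, 0.5, -1.0]
bigger_model = [3.0, 2.5, -2.0]

print(hard_label_loss(student, target_index=0))  # signal from the dataset token only
print(soft_label_loss(student, bigger_model))    # signal from the full distribution
```

The one-hot loss only tells the student which token appeared in the text, while the logit-based loss also carries how the bigger model ranks the alternatives.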
### Motivation
DistillKit slows…
-
Hey folks, here is my recommended game plan for our goal of taking arbitrarily formatted scripts and converting them all to a single format of our choosing. We will begin to integrate our solu…
-
### Community Note
* Please vote on this issue by adding a 👍 [reaction](https://blog.github.com/2016-03-10-add-reactions-to-pull-requests-issues-and-comments/) to the original issue to help the commu…
-
I am trying to fine-tune on my custom dataset, but the training progress stays at 0% and does not increase at all, even after 20 hours of running:
```
Train: 0%| …
-
**Describe the bug**
OpenAI API endpoint is "/v1/chat/completions", but OVMS endpoint is "/v3/chat/completions".
Most existing applications don't allow the user to change the prefix “**V1**” to "**…
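
As an illustration of the mismatch, here is a minimal sketch of the path rewrite that a client-side shim or reverse proxy would need to perform. The helper name `rewrite_prefix` and the host and port are hypothetical. Only the two endpoint paths come from the report above.

```python
from urllib.parse import urlsplit, urlunsplit

def rewrite_prefix(url, old="/v1/", new="/v3/"):
    """Swap the leading API version segment of the URL path, if present."""
    parts = urlsplit(url)
    path = parts.path
    if path.startswith(old):
        path = new + path[len(old):]
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))

# Hypothetical server address; the paths are the ones named in this issue.
print(rewrite_prefix("http://localhost:8000/v1/chat/completions"))
# prints "http://localhost:8000/v3/chat/completions"
```

Applications that hard-code the `/v1/` prefix cannot make this substitution themselves, which is why the difference matters.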
-
I'm using Longhorn v1.6.0,
and I created the volume with 2 replicas.
I am training an AI model by reading image files from a Longhorn volume, and recently, the training often hangs unexpectedly.
I …
-
I was trying to fine-tune Meta-Llama-3-8B-Instruct on 4 GPUs with the following command:
`torchrun --nproc_per_node 4 -m training.run --output_dir llama3test --model_name_or_path meta-llama/Met…
-
These comments on the sections related to black box models were made on the version that was live on Friday 8 November 2024. The sections outlined below are what was covered.
**Definitions**
- [ ] A…