-
> If agentic translations can generate better results than traditional architectures (such as an end-to-end transformer that takes a text as input and directly outputs a translation) -- which are often faster…
-
Training time ≈ (8 × training tokens × model parameters) / (GPU count × GPU peak FLOPS × GPU utilization)
Unfortunately, the actual time spent does not match this. Does anyone have the co…
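As a sanity check, here is a minimal sketch of the estimate above; the token count, model size, cluster size, and utilization below are made-up example values, not measurements:

```python
# Estimate training time from the compute approximation above.
# All inputs are hypothetical example values for illustration.

tokens = 1e12            # training tokens
params = 7e9             # model parameters
n_gpus = 64              # number of GPUs
peak_flops = 312e12      # peak FLOPS per GPU (e.g. ~312 TFLOPS BF16 on an A100)
utilization = 0.4        # realized utilization is often 30-50%, far below 100%

seconds = (8 * tokens * params) / (n_gpus * peak_flops * utilization)
print(f"~{seconds / 86400:.1f} days")  # ~81.1 days for these numbers
```

If measured wall-clock time is much higher than the estimate, the utilization term is usually the culprit: real utilization includes data loading, communication, and recomputation overhead.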
-
### Description
There seems to be a problem with the layout of the `block` in the Introduction section.
![image](https://github.com/arXiv/html_feedback/assets/83172530/287b9310-69b7-4a89-be41-071ee9…
-
![js](https://github.com/TheProdigyLeague/Voyix/assets/30985576/c6fdf2b5-db86-4855-b7ab-13ea871ee27b)
# Prefix
There is a heavy workload from the machine learning components in Kubernetes. Consumer usage a…
-
Hello, is there a way to evaluate an LLM reranker after I finetune it on my own training dataset? Also, how should the test set be structured? The same as the training data (e.g., toy_finetune_data.jsonl)? Th…
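One common approach, as a sketch rather than an official eval API: hold out query/positive/negative triples in the same jsonl layout as the training file, score every candidate with the finetuned model, and compute a ranking metric such as MRR. Here `score_fn` is a hypothetical placeholder for however your finetuned reranker scores a (query, passage) pair:

```python
import json
from typing import Callable, List

def mrr_at_k(eval_path: str, score_fn: Callable[[str, str], float], k: int = 10) -> float:
    """Mean Reciprocal Rank over a held-out jsonl file using the same
    {"query": ..., "pos": [...], "neg": [...]} layout as the training data."""
    reciprocal_ranks: List[float] = []
    with open(eval_path) as f:
        for line in f:
            ex = json.loads(line)
            query = ex["query"]
            # Score positives and negatives together, then rank by score.
            candidates = [(p, True) for p in ex["pos"]] + [(n, False) for n in ex["neg"]]
            ranked = sorted(candidates, key=lambda c: score_fn(query, c[0]), reverse=True)
            # Rank of the first relevant passage within the top k, if any.
            rank = next((i + 1 for i, (_, is_pos) in enumerate(ranked[:k]) if is_pos), None)
            reciprocal_ranks.append(1.0 / rank if rank else 0.0)
    return sum(reciprocal_ranks) / len(reciprocal_ranks)
```

The structural point is that the eval file should mirror the training format but contain queries held out from training, so the metric reflects generalization rather than memorization.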
-
probability tensor contains either `inf`, `nan` or element < 0
Training epoch 0: 1%|█▋ …
-
FP8 is very useful for both training and inference of LLMs. Does FlashAttention support FP8?
Thank you~
-
Some resources to consider including:
https://applied-llms.org/
DragonAI training
Ontogpt training
-
Hi :) I'm really interested in this topic and looking forward to the documentation.
Thanks for sharing! 🙏
-
When I try to run the following finetuning command on a GPU:
`nohup ../build/bin/finetune --model-base llama-3b-Q5_0.gguf --train-data "shakespeare.txt" --save-every 1 --adam-iter 2 --batch 4 --…`