-
More context: https://github.com/kubeflow/training-operator/pull/2031#discussion_r1526533371.
Currently, we apply [HuggingFace Data Collator](https://huggingface.co/docs/transformers/en/main_classes/…
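For reference, a minimal sketch of what wiring up a HuggingFace data collator for causal-LM batches can look like (the model/tokenizer name here is a placeholder, not taken from the linked PR):

```python
# Minimal sketch of a HuggingFace data collator for causal-LM fine-tuning.
# The tokenizer name is a placeholder, not taken from the linked PR.
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

# mlm=False -> causal LM: labels are the input ids, shifted inside the model.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

batch = collator([tokenizer("hello world"), tokenizer("a longer example sentence")])
print(batch["input_ids"].shape, batch["labels"].shape)  # padded to a common length
```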
-
In the demos I’ve seen of Leon AI, it appeared rather slow. I have no idea whether this was a limitation of the hardware or whether there were inefficiencies that might be improved upon. [GPT4All](https://github.c…
-
I am trying to fine-tune llama3-70B on trn1.32xlarge using distributed training. It failed with the following error:
Container image: f"763104351884.dkr.ecr.{region}.amazonaws.com/pytorch-training-neur…
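The image URI pattern suggests an AWS Deep Learning Container launched through SageMaker; a minimal sketch of such a launch follows (the entry point, repository tag, and instance count are assumptions for illustration, not taken from the original report):

```python
# Sketch of launching distributed training on Trainium via SageMaker.
# Entry point, image tag, and instance count are assumptions for illustration.
import sagemaker
from sagemaker.pytorch import PyTorch

region = sagemaker.Session().boto_region_name
# Repository name and tag assumed; the original report truncates the URI.
image_uri = f"763104351884.dkr.ecr.{region}.amazonaws.com/pytorch-training-neuronx:latest"

estimator = PyTorch(
    entry_point="train.py",               # hypothetical training script
    role=sagemaker.get_execution_role(),
    image_uri=image_uri,
    instance_type="ml.trn1.32xlarge",
    instance_count=1,
    distribution={"torch_distributed": {"enabled": True}},  # torchrun launcher
)
estimator.fit()
```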
-
I followed the guide here and used the same arguments:
https://epfllm.github.io/Megatron-LLM/guide/getting_started.html
When training, I set:
LOG_ARGS="--log_interval 1 --save_interval 100 --eval_interval 50"
…
-
I noticed the agent was working on the code and the file got bigger and bigger, say 3k, then 6k, then 9k.
...Then at some point the file was 2 or 3k again. **Basically the agent just deleted everyth…
-
### System Info
- GPU: L40S
- TensorRT-LLM: 0.11.0.dev2024060400
- CUDA: cuda_12.4.r12.4/compiler.34097967_0
- Driver: 535.129.03
- OS: Ubuntu 22.04.4 LTS (Docker)
-
### Who…
-
Self-supervised pre-training is the mechanism through which LLMs are pre-trained; the same objective can also be used for fine-tuning when no instruction datasets or human-preference data are available, only raw text documents.
…
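Concretely, the self-supervised objective is just next-token prediction over raw text; a minimal sketch with HuggingFace (the model name is a placeholder):

```python
# Next-token prediction on raw text: the self-supervised objective used for
# both pre-training and instruction-free fine-tuning. Model name is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Self-supervised training needs only raw documents, no labels."
inputs = tokenizer(text, return_tensors="pt")

# Passing labels=input_ids makes the model compute the shifted cross-entropy
# loss internally (predict token t+1 from tokens <= t).
out = model(**inputs, labels=inputs["input_ids"])
out.loss.backward()  # one self-supervised training step (before optimizer.step())
```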
-
Are there any success stories of LLM training using GGML?
-
ROCm/triton, ROCm/flash-attention, or the fmha Composable Kernel (CK) implementation?
-
Ollama - local models on your machine
https://youtu.be/Ox8hhpgrUi0?si=LxpAd1n29InncB78
Open-weight models
- Llama3
- Mistral 7B v0.3
Use cases:
- interactive vs non-interactive
- local RAG…
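
Following up on these notes, a minimal sketch of querying a locally running Ollama server over its REST API (assumes the llama3 model has already been pulled, e.g. via `ollama run llama3`, and that Ollama is listening on its default port 11434):

```python
# Query a local Ollama server; assumes the llama3 model is already pulled
# and Ollama is listening on its default port (11434).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why run models locally?", "stream": False},
)
print(resp.json()["response"])
```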